2021 | Original Paper | Chapter

Bregman Proximal Gradient Algorithms for Deep Matrix Factorization

Authors: Mahesh Chandra Mukkamala, Felix Westerkamp, Emanuel Laude, Daniel Cremers, Peter Ochs

Published in: Scale Space and Variational Methods in Computer Vision

Publisher: Springer International Publishing

Abstract

A typical assumption for the convergence of first-order optimization methods is the Lipschitz continuity of the gradient of the objective function. However, for many practical applications this assumption is violated. To overcome this issue, extensions based on generalized proximity measures, known as Bregman distances, were introduced. This initiated the development of Bregman Proximal Gradient (BPG) algorithms, which, however, rely on problem-dependent Bregman distances. In this paper, we develop Bregman distances for deep matrix factorization problems, which yield a BPG algorithm with theoretical convergence guarantees while allowing for a constant step size strategy. Moreover, we demonstrate that the algorithms based on the developed Bregman distances outperform their Euclidean counterparts as well as alternating minimization based approaches.
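To make the mechanics concrete, below is a minimal NumPy sketch of BPG for the classical two-factor special case min_{U,V} 0.5*||UV - A||_F^2, using a quartic kernel of the form h(x) = (a/4)*||x||^4 + (b/2)*||x||^2 on the stacked variable x = (U, V), in the spirit of the BPG matrix factorization literature. The function name, the constants a and b, and the step size tau are illustrative assumptions, not the paper's exact choices; the paper itself develops kernels for the deeper N-factor setting.

import numpy as np

def bpg_two_factor(A, rank, tau=0.9, iters=2000, seed=0):
    # Minimal BPG sketch for min_{U,V} f(U,V) = 0.5 * ||U V - A||_F^2,
    # with kernel h(x) = (a/4)*||x||^4 + (b/2)*||x||^2 on x = (U, V).
    # For this h the mirror map inverts via a scalar cubic, so every
    # step has a closed form. Constants are assumptions for illustration.
    a = 3.0
    b = np.linalg.norm(A)
    rng = np.random.default_rng(seed)
    m, n = A.shape
    U = 0.1 * rng.standard_normal((m, rank))
    V = 0.1 * rng.standard_normal((rank, n))

    for _ in range(iters):
        R = U @ V - A                            # residual U V - A
        gU, gV = R @ V.T, U.T @ R                # gradients of f w.r.t. U, V
        sq = np.sum(U * U) + np.sum(V * V)       # ||x||^2 of stacked variable
        s = a * sq + b                           # so that grad h(x) = s * x
        PU = s * U - tau * gU                    # P = grad h(x) - tau * grad f(x)
        PV = s * V - tau * gV
        p2 = np.sum(PU * PU) + np.sum(PV * PV)   # ||P||^2
        # Next iterate is x+ = t * P, where t >= 0 is the unique
        # nonnegative root of  a * ||P||^2 * t^3 + b * t - 1 = 0.
        roots = np.roots([a * p2, 0.0, b, -1.0])
        t = roots[np.argmin(np.abs(roots.imag))].real
        U, V = t * PU, t * PV
    return U, V

# Usage: factorize a rank-5 matrix.
A = np.random.default_rng(1).standard_normal((40, 5)) \
    @ np.random.default_rng(2).standard_normal((5, 30))
U, V = bpg_two_factor(A, rank=5)
print("relative error:", np.linalg.norm(U @ V - A) / np.linalg.norm(A))

The closed-form cubic makes the abstract's main point visible: no global Lipschitz constant of grad f is needed, because the quartic kernel h absorbs the fourth-order growth of f, which is what permits a constant step size.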

Metadata
Title
Bregman Proximal Gradient Algorithms for Deep Matrix Factorization
Authors
Mahesh Chandra Mukkamala
Felix Westerkamp
Emanuel Laude
Daniel Cremers
Peter Ochs
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-75549-2_17
