
2021 | Original Paper | Book Chapter

Bregman Proximal Gradient Algorithms for Deep Matrix Factorization

Authors: Mahesh Chandra Mukkamala, Felix Westerkamp, Emanuel Laude, Daniel Cremers, Peter Ochs

Published in: Scale Space and Variational Methods in Computer Vision

Publisher: Springer International Publishing


Abstract

A typical assumption for the convergence of first-order optimization methods is Lipschitz continuity of the gradient of the objective function. In many practical applications, however, this assumption is violated. To overcome this issue, extensions based on generalized proximity measures, known as Bregman distances, were introduced. This initiated the development of Bregman Proximal Gradient (BPG) algorithms, which, however, rely on problem-dependent Bregman distances. In this paper, we develop Bregman distances for deep matrix factorization problems, yielding a BPG algorithm with theoretical convergence guarantees that admits a constant step-size strategy. Moreover, we demonstrate that algorithms based on the developed Bregman distances outperform their Euclidean counterparts as well as approaches based on alternating minimization.
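To make the BPG update concrete, below is a minimal NumPy sketch of one-block BPG for the two-factor model f(U, V) = 0.5 * ||U V^T - A||_F^2, using a quartic kernel of the form that appears in the matrix-factorization BPG literature. The kernel constants c1 = 3 and c2 = ||A||_F, the step size tau, and the function name bpg_two_factor are illustrative assumptions; the paper's deeper (N-factor) kernels follow the same template but are not reproduced here.

```python
import numpy as np

def bpg_two_factor(A, rank, tau=0.9, iters=500, seed=0):
    """Sketch of one-block BPG for f(U, V) = 0.5 * ||U V^T - A||_F^2.

    Kernel: h(U, V) = (c1 / 4) * s**2 + (c2 / 2) * s, where
    s = ||U||_F^2 + ||V||_F^2, so grad h(U, V) = (c1 * s + c2) * (U, V).
    When f is smooth relative to h, a constant step size tau < 1 is valid.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    c1, c2 = 3.0, max(np.linalg.norm(A), 1e-12)  # illustrative constants
    U = 0.1 * rng.standard_normal((m, rank))
    V = 0.1 * rng.standard_normal((n, rank))

    for _ in range(iters):
        R = U @ V.T - A                     # residual U V^T - A
        gU, gV = R @ V, R.T @ U             # gradients of f in U and V
        s = np.sum(U * U) + np.sum(V * V)
        PU = (c1 * s + c2) * U - tau * gU   # P = grad h(x) - tau * grad f(x)
        PV = (c1 * s + c2) * V - tau * gV
        # The BPG step solves grad h(x_new) = P, i.e. x_new = t * P where
        # c1 * ||P||^2 * t^3 + c2 * t - 1 = 0. This cubic is strictly
        # increasing for t > 0, so Newton's method finds its unique root.
        a = c1 * (np.sum(PU * PU) + np.sum(PV * PV))
        t = 1.0 / c2                        # starting point above the root
        for _ in range(30):
            t -= (a * t**3 + c2 * t - 1.0) / (3.0 * a * t * t + c2)
        U, V = t * PU, t * PV
    return U, V

# Usage on synthetic data (illustrative):
A = np.random.default_rng(1).standard_normal((40, 30))
U, V = bpg_two_factor(A, rank=5)
print("objective:", 0.5 * np.linalg.norm(U @ V.T - A) ** 2)
```

Note the design point the sketch illustrates: because the kernel's mirror map has a closed algebraic form, each iteration costs only a few matrix products plus a scalar root-find, and no backtracking line search is needed.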


Metadata
Title
Bregman Proximal Gradient Algorithms for Deep Matrix Factorization
Authors
Mahesh Chandra Mukkamala
Felix Westerkamp
Emanuel Laude
Daniel Cremers
Peter Ochs
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-75549-2_17