2020 | Original Paper | Book Chapter

7. Gradient Descent Converges to Minimizers: Optimal and Adaptive Step-Size Rules

Abstract

As mentioned in Chap. 3, gradient descent (GD) and its variants provide the core optimization methodology in machine learning problems. Given a \(C^1\) or \(C^2\) function \(f: \mathbb {R}^{n} \rightarrow \mathbb {R}\) with unconstrained variable \(x \in \mathbb {R}^{n}\), GD uses the following update rule:
$$\displaystyle x_{t+1} = x_{t} - h_t \nabla f\left (x_t\right ), $$
where \(h_t\) is the step size, which may be either fixed or vary across iterations. When \(f\) is convex, \(h_t < \frac {2}{L}\) is a necessary and sufficient condition to guarantee the (worst-case) convergence of GD, where \(L\) is the Lipschitz constant of the gradient of \(f\). For non-convex problems, far less is understood: for general smooth non-convex problems, GD is only known to converge to a stationary point (i.e., a point with zero gradient).
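To make the update rule concrete, the following is a minimal sketch of fixed-step GD in Python. It is not the authors' code: the quadratic objective, the function names, and the choice \(h = 1/L\) are illustrative assumptions consistent with the abstract.

```python
import numpy as np

def gradient_descent(grad_f, x0, h, num_iters=1000, tol=1e-8):
    """Fixed-step gradient descent: x_{t+1} = x_t - h * grad_f(x_t).

    For a convex f with L-Lipschitz gradient, any fixed h < 2/L
    guarantees worst-case convergence (see the abstract above).
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:  # stop at an (approximately) stationary point
            break
        x = x - h * g
    return x

# Illustrative example (assumption): f(x) = 0.5 * x^T A x, so grad f(x) = A x
# and L equals the largest eigenvalue of A.
A = np.diag([1.0, 10.0])
grad_f = lambda x: A @ x
L = 10.0
x_min = gradient_descent(grad_f, x0=[5.0, -3.0], h=1.0 / L)
print(x_min)  # approaches the unique minimizer at the origin
```

For this convex example any fixed step below \(2/L\) would also converge; the non-convex case, where GD may only reach a stationary point, is the subject of this chapter.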

Footnotes
1
For the purpose of this paper, strict saddle points include local maximizers.
 
2
\(f^n(x)\) denotes the n-fold repeated application of \(f\) to \(x\).
 
Metadata
Title
Gradient Descent Converges to Minimizers: Optimal and Adaptive Step-Size Rules
Authors
Bin Shi
S. S. Iyengar
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-17076-9_7
