2022 | Original Paper | Book Chapter

Recent Theoretical Advances in Non-Convex Optimization

Authors: Marina Danilova, Pavel Dvurechensky, Alexander Gasnikov, Eduard Gorbunov, Sergey Guminov, Dmitry Kamzolov, Innokentiy Shibaev

Published in: High-Dimensional Optimization and Probability

Publisher: Springer International Publishing


Abstract

Motivated by the recent increased interest in optimization algorithms for non-convex optimization, in applications to training deep neural networks and other optimization problems in data analysis, we give an overview of recent theoretical results on global performance guarantees of optimization algorithms for non-convex optimization. We start with classical arguments showing that general non-convex problems cannot be solved efficiently in reasonable time. Then we list problems for which a global minimizer can be found efficiently by exploiting the structure of the problem as much as possible. Another way to deal with non-convexity is to relax the goal from finding the global minimum to finding a stationary point or a local minimum. For this setting, we first present known results on the convergence rates of deterministic first-order methods, followed by a general theoretical analysis of optimal stochastic and randomized gradient schemes and an overview of stochastic first-order methods. After that, we discuss quite general classes of non-convex problems, such as the minimization of α-weakly quasi-convex functions and of functions satisfying the Polyak–Łojasiewicz condition, for which theoretical convergence guarantees of first-order methods can still be obtained. Finally, we consider higher-order and zeroth-order/derivative-free methods and their convergence rates for non-convex optimization problems.
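As an illustration of the last-mentioned class, the following is a standard statement of the Polyak–Łojasiewicz (PL) condition and the linear rate it implies for gradient descent; this is a minimal sketch following, e.g., [141], assuming \(f\) is \(L\)-smooth and attains its minimum value \(f^*\). A function \(f\) satisfies the PL condition with parameter \(\mu > 0\) if
\[ \frac{1}{2}\,\|\nabla f(x)\|_2^2 \;\ge\; \mu\bigl(f(x) - f^*\bigr) \quad \text{for all } x, \]
and in that case gradient descent with step size \(1/L\), i.e., \(x^{k+1} = x^k - \tfrac{1}{L}\nabla f(x^k)\), satisfies
\[ f(x^k) - f^* \;\le\; \Bigl(1 - \frac{\mu}{L}\Bigr)^{k}\bigl(f(x^0) - f^*\bigr), \]
that is, it converges linearly even though \(f\) need not be convex.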
Footnotes
1
See also this webpage, which maintains an updated list of references: https://sunju.org/research/nonconvex/.
 
2
By \(\phi(a)\), where \(a = (a_1,\ldots ,a_n)^\top \in {\mathbb R}^n\) is a multidimensional vector, we mean the vector \((\phi(a_1), \ldots, \phi(a_n))\).
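For instance, choosing \(\phi(t) = e^t\) (an arbitrary function picked purely for illustration), this notation reads
\[ \phi(a) = \bigl(e^{a_1}, \ldots, e^{a_n}\bigr). \]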
 
3
Here \(\mathbb{E}_{\xi_k}[\cdot]\) is the mathematical expectation conditioned on everything except \(\xi_k\), i.e., the expectation is taken w.r.t. the randomness coming only from \(\xi_k\).
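For example, in the usual setting where the iterate \(x^k\) depends only on \(\xi_0,\ldots,\xi_{k-1}\) and the stochastic gradient is unbiased (a minimal illustration under these standard assumptions), this notation gives
\[ \mathbb{E}_{\xi_k}\bigl[\nabla f(x^k,\xi_k)\bigr] \;=\; \mathbb{E}\bigl[\nabla f(x^k,\xi_k)\,\big|\,\xi_0,\ldots,\xi_{k-1}\bigr] \;=\; \nabla f(x^k). \]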
 
4
In the original paper [160], the authors considered a more general situation in which the stochastic realizations f(x, ξ) have Hölder-continuous gradients.
 
5
This technique is applied in distributed optimization to reduce the overall communication cost (e.g., see [4, 27, 113]). However, methods for distributed optimization are beyond the scope of our survey.
 
6
For simplicity, we neglect all parameters except m and ε; see the details in Table 2.
 
7
To distinguish exponents from superscript indices, we enclose exponents in brackets (⋅).
 
8
In fact, most of the results from [118] do not rely on the finite-sum structure of f.
 
References
1.
N. Agarwal, Z. Allen-Zhu, B. Bullins, E. Hazan, T. Ma, Finding approximate local minima faster than gradient descent, in Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1195–1199 (2017)
2.
K. Ahn, C. Yun, S. Sra, SGD with shuffling: optimal rates without component convexity and large epoch requirements. Adv. Neural Inf. Process. Syst. 33 (2020)
3.
A. Ajalloeian, S.U. Stich, Analysis of SGD with biased gradient estimators. Preprint (2020). arXiv:2008.00051
4.
D. Alistarh, D. Grubic, J. Li, R. Tomioka, M. Vojnovic, QSGD: Communication-efficient SGD via gradient quantization and encoding. Adv. Neural Inf. Process. Syst. 1709–1720 (2017)
5.
Z. Allen-Zhu, Natasha: Faster non-convex stochastic optimization via strongly non-convex parameter, in International Conference on Machine Learning, pp. 89–97 (2017)
6.
Z. Allen-Zhu, How to make the gradients small stochastically: Even faster convex and nonconvex SGD, in Advances in Neural Information Processing Systems, pp. 1157–1167 (2018)
7.
Z. Allen-Zhu, Katyusha X: Simple momentum method for stochastic sum-of-nonconvex optimization, in International Conference on Machine Learning, pp. 179–185 (2018)
8.
Z. Allen-Zhu, Natasha 2: Faster non-convex optimization than SGD, in Advances in Neural Information Processing Systems, pp. 2675–2686 (2018)
9.
Z. Allen-Zhu, Y. Li, Neon2: Finding local minima via first-order oracles, in Advances in Neural Information Processing Systems, pp. 3716–3726 (2018)
10.
Z. Allen-Zhu, Y. Li, Can SGD learn recurrent neural networks with provable generalization? in Advances in Neural Information Processing Systems, pp. 10331–10341 (2019)
11.
Z. Allen-Zhu, Y. Li, Y. Liang, Learning and generalization in overparameterized neural networks, going beyond two layers, in Advances in Neural Information Processing Systems, pp. 6158–6169 (2019)
12.
Z. Allen-Zhu, Y. Li, Z. Song, A convergence theory for deep learning via over-parameterization, in International Conference on Machine Learning, pp. 242–252 (PMLR, 2019)
13.
Z. Allen-Zhu, Y. Li, Z. Song, On the convergence rate of training recurrent neural networks, in Advances in Neural Information Processing Systems, pp. 6676–6688 (2019)
14.
A. Anandkumar, R. Ge, Efficient approaches for escaping higher order saddle points in non-convex optimization, in Conference on Learning Theory, pp. 81–102 (PMLR, 2016)
15.
Y. Arjevani, Y. Carmon, J.C. Duchi, D.J. Foster, N. Srebro, B. Woodworth, Lower bounds for non-convex stochastic optimization. Preprint (2019). arXiv:1912.02365
16.
Y. Arjevani, Y. Carmon, J.C. Duchi, D.J. Foster, A. Sekhari, K. Sridharan, Second-order information in non-convex stochastic optimization: Power and limitations, in Conference on Learning Theory, pp. 242–299 (2020)
17.
S. Arora, N. Cohen, N. Golowich, W. Hu, A convergence analysis of gradient descent for deep linear neural networks. Preprint (2018). arXiv:1810.02281
18.
F. Bach, R. Jenatton, J. Mairal, G. Obozinski, et al., Optimization with sparsity-inducing penalties. Found. Trends Mach. Learn. 4(1), 1–106 (2012)
19.
R. Baraniuk, M. Davenport, R. DeVore, M. Wakin, A simple proof of the restricted isometry property for random matrices. Constructive Approximation 28(3), 253–263 (2008)
20.
A. Bazarova, A. Beznosikov, A. Gasnikov, Linearly convergent gradient-free methods for minimization of symmetric parabolic approximation. Preprint (2020). arXiv:2009.04906
21.
M. Belkin, Fit without fear: Remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numerica 30, 203–248 (2021)
22.
A. Ben-Tal, A. Nemirovski, Lectures on Modern Convex Optimization (Society for Industrial and Applied Mathematics, 2001)
23.
A.S. Berahas, L. Cao, K. Choromanski, K. Scheinberg, Linear interpolation gives better gradients than Gaussian smoothing in derivative-free optimization (2019)
24.
A.S. Berahas, L. Cao, K. Scheinberg, Global convergence rate analysis of a generic line search algorithm with noise (2019)
25.
A.S. Berahas, L. Cao, K. Choromanski, K. Scheinberg, A theoretical and empirical comparison of gradient approximations in derivative-free optimization (2020)
26.
E.H. Bergou, E. Gorbunov, P. Richtárik, Stochastic three points method for unconstrained smooth minimization (2019)
27.
A. Beznosikov, S. Horváth, P. Richtárik, M. Safaryan, On biased compression for distributed learning. Preprint (2020). arXiv:2002.12410
28.
S. Bhojanapalli, A. Kyrillidis, S. Sanghavi, Dropping convexity for faster semi-definite optimization, in Conference on Learning Theory, pp. 530–582 (2016)
29.
E.G. Birgin, J. Gardenghi, J.M. Martínez, S.A. Santos, P.L. Toint, Worst-case evaluation complexity for unconstrained nonlinear optimization using high-order regularized models. Mathematical Programming 163(1–2), 359–368 (2017)
30.
A. Blum, J. Hopcroft, R. Kannan, Foundations of Data Science (Cambridge University Press, 2016)
31.
A. Blum, R.L. Rivest, Training a 3-node neural network is NP-complete, in Advances in Neural Information Processing Systems, pp. 494–501 (1989)
32.
T. Blumensath, M.E. Davies, Iterative hard thresholding for compressed sensing. Appl. Comput. Harmonic Anal. 27(3), 265–274 (2009)
33.
L. Bogolubsky, P. Dvurechensky, A. Gasnikov, G. Gusev, Y. Nesterov, A.M. Raigorodskii, A. Tikhonov, M. Zhukovskii, Learning supervised PageRank with gradient-based and gradient-free optimization methods, in Advances in Neural Information Processing Systems 29, ed. by D.D. Lee, M. Sugiyama, U.V. Luxburg, I. Guyon, R. Garnett (Curran Associates, Inc., 2016), pp. 4914–4922. arXiv:1603.00717
34.
L. Bottou, Curiously fast convergence of some stochastic gradient descent algorithms, in Proceedings of the Symposium on Learning and Data Science, Paris (2009)
35.
L. Bottou, Large-scale machine learning with stochastic gradient descent, in Proceedings of COMPSTAT’2010 (Springer, 2010), pp. 177–186
36.
L. Bottou, Stochastic gradient descent tricks, in Neural Networks: Tricks of the Trade (Springer, 2012), pp. 421–436
37.
L. Bottou, F.E. Curtis, J. Nocedal, Optimization methods for large-scale machine learning. SIAM Review 60(2), 223–311 (2018)
38.
S. Boyd, L. Vandenberghe, Convex Optimization (Cambridge University Press, New York, 2004)
39.
J. Bu, M. Mesbahi, A note on Nesterov’s accelerated method in nonconvex optimization: a weak estimate sequence approach. Preprint (2020). arXiv:2006.08548
40.
S. Bubeck, Introduction to online optimization (2011)
41.
S. Bubeck, Convex optimization: Algorithms and complexity. Found. Trends Mach. Learn. 8(3–4), 231–357 (Nov 2015)
42.
E.J. Candès, B. Recht, Exact matrix completion via convex optimization. Found. Comput. Math. 9(6), 717 (2009)
44.
E.J. Candès, T. Tao, The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Inf. Theory 56(5), 2053–2080 (2010)
45.
E.J. Candès, M.B. Wakin, S.P. Boyd, Enhancing sparsity by reweighted ℓ1 minimization. J. Fourier Anal. Appl. 14(5–6), 877–905 (2008)
46.
E.J. Candès, X. Li, M. Soltanolkotabi, Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Trans. Inf. Theory 61(4), 1985–2007 (2015)
47.
Y. Carmon, J.C. Duchi, Gradient descent efficiently finds the cubic-regularized non-convex Newton step. Preprint (2016). arXiv:1612.00547
48.
Y. Carmon, J.C. Duchi, O. Hinder, A. Sidford, “Convex until proven guilty”: Dimension-free acceleration of gradient descent on non-convex functions, in Proceedings of Machine Learning Research, vol. 70 (International Convention Centre, Sydney, Australia, 06–11 Aug 2017), pp. 654–663. PMLR
49.
Y. Carmon, J.C. Duchi, O. Hinder, A. Sidford, Accelerated methods for nonconvex optimization. SIAM J. Optim. 28(2), 1751–1772 (2018)
50.
Y. Carmon, J.C. Duchi, O. Hinder, A. Sidford, Lower bounds for finding stationary points II: first-order methods. Mathematical Programming (Sep 2019)
51.
Y. Carmon, J.C. Duchi, O. Hinder, A. Sidford, Lower bounds for finding stationary points I. Mathematical Programming 184(1), 71–120 (Nov 2020)
52.
C. Cartis, N.I. Gould, P.L. Toint, Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Mathematical Programming 127(2), 245–295 (2011)
53.
C. Cartis, N.I.M. Gould, P.L. Toint, Adaptive cubic regularisation methods for unconstrained optimization. Part II: worst-case function- and derivative-evaluation complexity. Mathematical Programming 130(2), 295–319 (2011)
54.
C. Cartis, N.I.M. Gould, P.L. Toint, Improved second-order evaluation complexity for unconstrained nonlinear optimization using high-order regularized models (2018). arXiv:1708.04044
55.
C. Cartis, N.I. Gould, P.L. Toint, Universal regularization methods: Varying the power, the smoothness and the accuracy. SIAM J. Optim. 29(1), 595–615 (2019)
56.
V. Charisopoulos, A.R. Benson, A. Damle, Entrywise convergence of iterative methods for eigenproblems. Preprint (2020). arXiv:2002.08491
57.
Y. Chen, Y. Chi, Harnessing structures in big data via guaranteed low-rank matrix estimation. Preprint (2018). arXiv:1802.08397
58.
Z. Chen, T. Yang, A variance reduction method for non-convex optimization with improved convergence under large condition number. Preprint (2018). arXiv:1809.06754
59.
Z. Chen, Y. Zhou, Momentum with variance reduction for nonconvex composition optimization. Preprint (2020). arXiv:2005.07755
60.
X. Chen, S. Liu, R. Sun, M. Hong, On the convergence of a class of Adam-type algorithms for non-convex optimization. Preprint (2018). arXiv:1808.02941
61.
Y. Chen, Y. Chi, J. Fan, C. Ma, Gradient descent with random initialization: Fast global convergence for nonconvex phase retrieval. Mathematical Programming 176(1–2), 5–37 (2019)
62.
Y. Chi, Y.M. Lu, Y. Chen, Nonconvex optimization meets low-rank matrix factorization: An overview. Preprint (2018). arXiv:1809.09573
64.
P.L. Combettes, J.-C. Pesquet, Proximal splitting methods in signal processing, in Fixed-Point Algorithms for Inverse Problems in Science and Engineering (Springer, 2011), pp. 185–212
65.
A. Conn, N. Gould, P. Toint, Trust Region Methods (Society for Industrial and Applied Mathematics, 2000)
66.
A. Conn, K. Scheinberg, L. Vicente, Introduction to Derivative-Free Optimization (Society for Industrial and Applied Mathematics, 2009)
67.
F.E. Curtis, K. Scheinberg, Optimization methods for supervised machine learning: From linear models to deep learning. Preprint (2017). arXiv:1706.10207
68.
A. Cutkosky, F. Orabona, Momentum-based variance reduction in non-convex SGD, in Advances in Neural Information Processing Systems, pp. 15236–15245 (2019)
69.
C.D. Dang, G. Lan, Stochastic block mirror descent methods for nonsmooth and stochastic optimization. SIAM J. Optim. 25(2), 856–881 (2015)
70.
D. Davis, D. Drusvyatskiy, Stochastic model-based minimization of weakly convex functions. SIAM J. Optim. 29(1), 207–239 (2019)
71.
A. Defazio, Understanding the role of momentum in non-convex optimization: Practical insights from a Lyapunov analysis. Preprint (2020). arXiv:2010.00406
72.
A. Defazio, L. Bottou, On the ineffectiveness of variance reduced optimization for deep learning, in Advances in Neural Information Processing Systems, pp. 1753–1763 (2019)
73.
A. Defazio, F. Bach, S. Lacoste-Julien, SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives, in Proceedings of the 27th International Conference on Neural Information Processing Systems, NIPS’14 (MIT Press, Cambridge, MA, USA, 2014), pp. 1646–1654
74.
A. Defazio, J. Domke, et al., Finito: A faster, permutable incremental gradient method for big data problems, in International Conference on Machine Learning, pp. 1125–1133 (2014)
75.
A. Défossez, L. Bottou, F. Bach, N. Usunier, On the convergence of Adam and Adagrad. Preprint (2020). arXiv:2003.02395
76.
V. Demin, D. Nekhaev, I. Surazhevsky, K. Nikiruy, A. Emelyanov, S. Nikolaev, V. Rylkov, M. Kovalchuk, Necessary conditions for STDP-based pattern recognition learning in a memristive spiking neural network. Neural Networks 134, 64–75 (2021)
77.
J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding. Preprint (2018). arXiv:1810.04805
78.
J. Diakonikolas, M.I. Jordan, Generalized momentum-based methods: A Hamiltonian perspective. Preprint (2019). arXiv:1906.00436
79.
T. Ding, D. Li, R. Sun, Spurious local minima exist for almost all over-parameterized neural networks (2019)
80.
J. Duchi, E. Hazan, Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12(Jul.), 2121–2159 (2011)
81.
J. Duchi, M.I. Jordan, B. McMahan, Estimation, optimization, and parallelism when data is sparse, in Advances in Neural Information Processing Systems, pp. 2832–2840 (2013)
82.
D. Dvinskikh, A. Ogaltsov, A. Gasnikov, P. Dvurechensky, V. Spokoiny, On the line-search gradient methods for stochastic optimization. IFAC-PapersOnLine 53(2), 1715–1720 (2020). 21st IFAC World Congress. arXiv:1911.08380
83.
P. Dvurechensky, Gradient method with inexact oracle for composite non-convex optimization (2017). arXiv:1703.09180
84.
P. Dvurechensky, M. Staudigl, Hessian barrier algorithms for non-convex conic optimization (2021). arXiv:2111.00100
85.
P. Dvurechensky, M. Staudigl, C.A. Uribe, Generalized self-concordant Hessian-barrier algorithms (2019). arXiv:1911.01522. WIAS Preprint No. 2693
86.
P.E. Dvurechensky, A.V. Gasnikov, E.A. Nurminski, F.S. Stonyakin, Advances in Low-Memory Subgradient Optimization (Springer International Publishing, Cham, 2020), pp. 19–59. arXiv:1902.01572
87.
P. Dvurechensky, E. Gorbunov, A. Gasnikov, An accelerated directional derivative method for smooth stochastic convex optimization. Eur. J. Oper. Res. 290(2), 601–621 (2021)
88.
P. Dvurechensky, S. Shtern, M. Staudigl, First-order methods for convex optimization. EURO J. Comput. Optim. 9, 100015 (2021). arXiv:2101.00935
89.
N. Emmenegger, R. Kyng, A.N. Zehmakan, On the oracle complexity of higher-order smooth non-convex finite-sum optimization. Preprint (2021). arXiv:2103.05138
90.
Y.G. Evtushenko, Numerical methods for finding global extrema (case of a non-uniform mesh). USSR Comput. Math. Math. Phys. 11(6), 38–54 (1971)
91.
C. Fang, C.J. Li, Z. Lin, T. Zhang, SPIDER: Near-optimal non-convex optimization via stochastic path-integrated differential estimator, in Advances in Neural Information Processing Systems, pp. 689–699 (2018)
92.
C. Fang, Z. Lin, T. Zhang, Sharp analysis for nonconvex SGD escaping from saddle points, in Conference on Learning Theory, pp. 1192–1234 (2019)
93.
I. Fatkhullin, B. Polyak, Optimizing static linear feedback: Gradient method. Preprint (2020). arXiv:2004.09875
94.
M. Fazel, R. Ge, S.M. Kakade, M. Mesbahi, Global convergence of policy gradient methods for the linear quadratic regulator (2019)
95.
S. Feizi, H. Javadi, J. Zhang, D. Tse, Porcupine neural networks: (almost) all local optima are global. Preprint (2017). arXiv:1710.02196
96.
A.D. Flaxman, A.T. Kalai, H.B. McMahan, Online convex optimization in the bandit setting: Gradient descent without a gradient, in Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’05 (Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2005), pp. 385–394
97.
C.A. Floudas, P.M. Pardalos, Encyclopedia of Optimization (Springer Science & Business Media, 2008)
98.
A. Gasnikov, Universal Gradient Descent (MCCME, Moscow, 2021)
99.
A. Gasnikov, P. Dvurechensky, M. Zhukovskii, S. Kim, S. Plaunov, D. Smirnov, F. Noskov, About the power law of the PageRank vector component distribution. Part 2. The Buckley–Osthus model, verification of the power law for this model, and setup of real search engines. Numer. Anal. Appl. 11(1), 16–32 (2018)
100.
R. Ge, J. Zou, Intersecting faces: Non-negative matrix factorization with new guarantees, in International Conference on Machine Learning, pp. 2295–2303 (PMLR, 2015)
101.
R. Ge, F. Huang, C. Jin, Y. Yuan, Escaping from saddle points–online stochastic gradient for tensor decomposition, in Conference on Learning Theory, pp. 797–842 (2015)
102.
S. Ghadimi, G. Lan, Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4), 2341–2368 (2013). arXiv:1309.5549
103.
S. Ghadimi, G. Lan, Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Mathematical Programming 156(1), 59–99 (2016)
104.
S. Ghadimi, G. Lan, H. Zhang, Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming 155(1), 267–305 (2016). arXiv:1308.6594
105.
S. Ghadimi, G. Lan, H. Zhang, Generalized uniformly optimal methods for nonlinear programming. J. Sci. Comput. 79(3), 1854–1881 (2019)
106.
M.X. Goemans, D.P. Williamson, Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM (JACM) 42(6), 1115–1145 (1995)
108.
I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, vol. 1 (MIT Press, Cambridge, 2016)
109.
E. Gorbunov, P. Dvurechensky, A. Gasnikov, An accelerated method for derivative-free smooth stochastic convex optimization. Preprint (2018). arXiv:1802.09022 (accepted to SIOPT)
110.
E. Gorbunov, M. Danilova, A. Gasnikov, Stochastic optimization with heavy-tailed noise via accelerated gradient clipping, in Advances in Neural Information Processing Systems, vol. 33, ed. by H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, H. Lin (Curran Associates, Inc., 2020), pp. 15042–15053
111.
E. Gorbunov, F. Hanzely, P. Richtárik, A unified theory of SGD: Variance reduction, sampling, quantization and coordinate descent, in International Conference on Artificial Intelligence and Statistics, pp. 680–690 (2020)
112.
E.A. Gorbunov, A. Bibi, O. Sener, E.H. Bergou, P. Richtárik, A stochastic derivative free optimization method with momentum, in ICLR (2020)
113.
E. Gorbunov, K.P. Burlachenko, Z. Li, P. Richtárik, MARINA: Faster non-convex distributed learning with compression, in Proceedings of the 38th International Conference on Machine Learning, vol. 139 of Proceedings of Machine Learning Research, ed. by M. Meila, T. Zhang (PMLR, 18–24 Jul 2021), pp. 3788–3798
114.
E. Gorbunov, M. Danilova, I. Shibaev, P. Dvurechensky, A. Gasnikov, Near-optimal high probability complexity bounds for non-smooth stochastic optimization with heavy-tailed noise (2021). arXiv:2106.05958
115.
A. Gorodetskiy, A. Shlychkova, A.I. Panov, Delta schema network in model-based reinforcement learning, in Artificial General Intelligence, ed. by B. Goertzel, A.I. Panov, A. Potapov, R. Yampolskiy (Springer International Publishing, Cham, 2020), pp. 172–182
116.
A. Gotmare, N.S. Keskar, C. Xiong, R. Socher, A closer look at deep learning heuristics: Learning rate restarts, warmup and distillation. Preprint (2018). arXiv:1810.13243
117.
R.M. Gower, N. Loizou, X. Qian, A. Sailanbayev, E. Shulgin, P. Richtárik, SGD: General analysis and improved rates, in International Conference on Machine Learning, pp. 5200–5209 (2019)
118.
R. Gower, O. Sebbouh, N. Loizou, SGD for structured nonconvex functions: Learning rates, minibatching and interpolation, in International Conference on Artificial Intelligence and Statistics (PMLR, 2021), pp. 1315–1323
119.
P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, K. He, Accurate, large minibatch SGD: Training ImageNet in 1 hour. Preprint (2017). arXiv:1706.02677
121.
S. Guminov, A. Gasnikov, Accelerated methods for alpha-weakly-quasi-convex problems. Preprint (2017). arXiv:1710.00797
122.
S.V. Guminov, Y.E. Nesterov, P.E. Dvurechensky, A.V. Gasnikov, Accelerated primal-dual gradient descent with linesearch for convex, nonconvex, and nonsmooth optimization problems. Doklady Mathematics 99(2), 125–128 (2019)
123.
S. Guminov, P. Dvurechensky, N. Tupitsa, A. Gasnikov, On a combination of alternating minimization and Nesterov’s momentum, in Proceedings of the 38th International Conference on Machine Learning, vol. 145 of Proceedings of Machine Learning Research, Virtual (PMLR, 18–24 Jul 2021). arXiv:1906.03622. WIAS Preprint No. 2695
124.
B.D. Haeffele, R. Vidal, Global optimality in neural network training, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7331–7339 (2017)
125.
G. Haeser, H. Liu, Y. Ye, Optimality condition and complexity analysis for linearly-constrained optimization without differentiability on the boundary. Mathematical Programming 178(1), 263–299 (2019)
126.
J.Z. HaoChen, S. Sra, Random shuffling beats SGD after finite epochs. Preprint (2018). arXiv:1806.10077
127.
E. Hazan, K. Levy, S. Shalev-Shwartz, Beyond convexity: Stochastic quasi-convex optimization, in Advances in Neural Information Processing Systems, pp. 1594–1602 (2015)
128.
K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
129.
O. Hinder, A. Sidford, N. Sohoni, Near-optimal methods for minimizing star-convex functions and beyond, in Conference on Learning Theory (PMLR, 2020), pp. 1894–1938
130.
T. Hofmann, A. Lucchi, S. Lacoste-Julien, B. McWilliams, Variance reduced stochastic gradient descent with neighbors, in Advances in Neural Information Processing Systems, pp. 2305–2313 (2015)
131.
S. Horváth, D. Kovalev, K. Mishchenko, S. Stich, P. Richtárik, Stochastic distributed learning with gradient quantization and variance reduction. Preprint (2019). arXiv:1904.05115
132.
S.A. Ilyuhin, A.V. Sheshkus, V.L. Arlazarov, Recognition of images of Korean characters using embedded networks, in Twelfth International Conference on Machine Vision (ICMV 2019), vol. 11433, ed. by W. Osten, D.P. Nikolaev (International Society for Optics and Photonics, SPIE, 2020), pp. 273–279
133.
P. Jain, P. Kar, Non-convex optimization for machine learning. Found. Trends Mach. Learn. 10(3–4), 142–336 (2017)
134.
P. Jain, P. Netrapalli, S. Sanghavi, Low-rank matrix completion using alternating minimization, in Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, pp. 665–674 (2013)
135.
Z. Ji, M.J. Telgarsky, Gradient descent aligns the layers of deep linear networks, in 7th International Conference on Learning Representations, ICLR 2019 (2019)
136.
K. Ji, Z. Wang, Y. Zhou, Y. Liang, Improved zeroth-order variance reduced algorithms and analysis for nonconvex optimization (2019)
137.
C. Jin, R. Ge, P. Netrapalli, S.M. Kakade, M.I. Jordan, How to escape saddle points efficiently, in Proceedings of Machine Learning Research, vol. 70 (International Convention Centre, Sydney, Australia, 06–11 Aug 2017), pp. 1724–1732. PMLR
138.
C. Jin, P. Netrapalli, M.I. Jordan, Accelerated gradient descent escapes saddle points faster than gradient descent, in Conference on Learning Theory (PMLR, 2018), pp. 1042–1085
139.
C. Jin, P. Netrapalli, R. Ge, S.M. Kakade, M.I. Jordan, On nonconvex optimization for machine learning: Gradients, stochasticity, and saddle points. J. ACM (JACM) 68(2), 1–29 (2021)
140.
R. Johnson, T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, in Advances in Neural Information Processing Systems, pp. 315–323 (2013)
141.
H. Karimi, J. Nutini, M. Schmidt, Linear convergence of gradient and proximal-gradient methods under the Polyak–Łojasiewicz condition, in Joint European Conference on Machine Learning and Knowledge Discovery in Databases (Springer, 2016), pp. 795–811
142.
A. Khaled, P. Richtárik, Better theory for SGD in the nonconvex world. Preprint (2020). arXiv:2002.03329
143.
S. Khot, G. Kindler, E. Mossel, R. O’Donnell, Optimal inapproximability results for MAX-CUT and other 2-variable CSPs? SIAM J. Comput. 37(1), 319–357 (2007)
144.
A. Khritankov, Hidden feedback loops in machine learning systems: A simulation model and preliminary results, in Software Quality: Future Perspectives on Software Engineering Quality, ed. by D. Winkler, S. Biffl, D. Mendez, M. Wimmer, J. Bergsmann (Springer International Publishing, Cham, 2021), pp. 54–65
145.
R. Kidambi, P. Netrapalli, P. Jain, S. Kakade, On the insufficiency of existing momentum schemes for stochastic optimization, in 2018 Information Theory and Applications Workshop (ITA) (IEEE, 2018), pp. 1–9
146.
L. Kiefer, M. Storath, A. Weinmann, Iterative Potts minimization for the recovery of signals with discontinuities from indirect measurements: The multivariate case. Found. Comput. Math. 1–46 (2020)
147.
D.P. Kingma, J. Ba, Adam: A method for stochastic optimization. Preprint (2014). arXiv:1412.6980
148.
V.V. Kniaz, S.Y. Zheltov, F. Remondino, V.A. Knyaz, A. Gruen, Wire structure image-based 3D reconstruction aided by deep learning, in XXIV ISPRS Congress 2020 (virtual, August 31–September 2, 2020), vol. XLIII-B2-2020 (Copernicus, Göttingen, 2020), pp. 435–441
149.
V.V. Kniaz, V.A. Knyaz, V. Mizginov, A. Papazyan, N. Fomin, L. Grodzitsky, Adversarial dataset augmentation using reinforcement learning and 3D modeling, in Advances in Neural Computation, Machine Learning, and Cognitive Research IV, ed. by B. Kryzhanovsky, W. Dunin-Barkowski, V. Redko, Y. Tiumentsev (Springer International Publishing, Cham, 2021), pp. 316–329
150.
J.M. Kohler, A. Lucchi, Sub-sampled cubic regularization for non-convex optimization, in International Conference on Machine Learning, pp. 1895–1904 (2017)
151.
Zurück zum Zitat G. Kornowski, O. Shamir, Oracle complexity in nonsmooth nonconvex optimization. Preprint (2021). arXiv:2104.06763 G. Kornowski, O. Shamir, Oracle complexity in nonsmooth nonconvex optimization. Preprint (2021). arXiv:2104.06763
152.
Zurück zum Zitat D. Kovalev, S. Horváth, P. Richtárik, Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop, in Algorithmic Learning Theory, pp. 451–467 (2020) D. Kovalev, S. Horváth, P. Richtárik, Don’t jump through hoops and remove those loops: Svrg and katyusha are better without the outer loop, in Algorithmic Learning Theory, pp. 451–467 (2020)
153.
Zurück zum Zitat A. Krizhevsky, G. Hinton, et al., Learning multiple layers of features from tiny images (2009) A. Krizhevsky, G. Hinton, et al., Learning multiple layers of features from tiny images (2009)
154.
Zurück zum Zitat P. Kuderov, A. Panov, Planning with hierarchical temporal memory for deterministic markov decision problem, in Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART (INSTICC, SciTePress, 2021), pp. 1073–1081 P. Kuderov, A. Panov, Planning with hierarchical temporal memory for deterministic markov decision problem, in Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART (INSTICC, SciTePress, 2021), pp. 1073–1081
155.
T. Lacroix, N. Usunier, G. Obozinski, Canonical tensor decomposition for knowledge base completion, in International Conference on Machine Learning, pp. 2863–2872 (2018)
156.
G. Lan, First-Order and Stochastic Optimization Methods for Machine Learning (Springer, 2020)
157.
G. Lan, Y. Yang, Accelerated stochastic algorithms for nonconvex finite-sum and multiblock optimization. SIAM J. Optim. 29(4), 2753–2784 (2019)
158.
159.
J. C. H. Lee, P. Valiant, Optimizing star-convex functions, in 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pp. 603–614 (2016)
160.
Y. Lei, T. Hu, G. Li, K. Tang, Stochastic gradient descent for nonconvex learning without bounded gradient assumptions. IEEE Trans. Neural Networks Learn. Syst. (2019)
161.
K. Y. Levy, The power of normalization: Faster evasion of saddle points. Preprint (2016). arXiv:1611.04831
162.
Y. Li, K. Lee, Y. Bresler, Identifiability in blind deconvolution with subspace or sparsity constraints. IEEE Trans. Inf. Theory 62(7), 4266–4275 (2016)
163.
D. Li, T. Ding, R. Sun, Over-parameterized deep neural networks have no strict local minima for any continuous activations. Preprint (2018). arXiv:1812.11039
164.
Z. Li, H. Bao, X. Zhang, P. Richtárik, PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization. Preprint (2020). arXiv:2008.10898
165.
Z. Li, P. Richtárik, A unified analysis of stochastic gradient methods for nonconvex federated optimization. Preprint (2020). arXiv:2006.07013
166.
S. Liang, R. Sun, Y. Li, R. Srikant, Understanding the loss surface of neural networks for binary classification, in International Conference on Machine Learning, pp. 2835–2843 (2018)
167.
S. Liu, B. Kailkhura, P.-Y. Chen, P. Ting, S. Chang, L. Amini, Zeroth-order stochastic variance reduction for nonconvex optimization (2018)
168.
R. Livni, S. Shalev-Shwartz, O. Shamir, On the computational efficiency of training neural networks, in Advances in Neural Information Processing Systems, pp. 855–863 (2014)
169.
N. Loizou, S. Vaswani, I. H. Laradji, S. Lacoste-Julien, Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence, in International Conference on Artificial Intelligence and Statistics (PMLR, 2021), pp. 1306–1314
170.
S. Lojasiewicz, A topological property of real analytic subsets. Coll. du CNRS, Les équations aux dérivées partielles 117, 87–89 (1963)
171.
I. Loshchilov, F. Hutter, SGDR: Stochastic gradient descent with warm restarts. Preprint (2016). arXiv:1608.03983
172.
A. Lucchi, J. Kohler, A stochastic tensor method for non-convex optimization. Preprint (2019). arXiv:1911.10367
173.
C. Ma, K. Wang, Y. Chi, Y. Chen, Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval and matrix completion, in International Conference on Machine Learning (PMLR, 2018), pp. 3345–3354
174.
S. Ma, R. Bassily, M. Belkin, The power of interpolation: Understanding the effectiveness of SGD in modern over-parametrized learning, in International Conference on Machine Learning (PMLR, 2018), pp. 3325–3334
175.
J. Mairal, Incremental majorization-minimization optimization with application to large-scale machine learning. SIAM J. Optim. 25(2), 829–855 (2015)
176.
D. Malik, A. Pananjady, K. Bhatia, K. Khamaru, P. L. Bartlett, M. J. Wainwright, Derivative-free methods for policy optimization: Guarantees for linear quadratic systems (2020)
177.
J. Martens, Deep learning via Hessian-free optimization, in International Conference on Machine Learning, vol. 27, pp. 735–742 (2010)
178.
T. Mikolov, Statistical language models based on neural networks, Presentation at Google, Mountain View, 2nd April, 80 (2012)
179.
K. Mishchenko, E. Gorbunov, M. Takáč, P. Richtárik, Distributed learning with compressed gradient differences. Preprint (2019). arXiv:1901.09269
180.
K. Mishchenko, A. Khaled, P. Richtárik, Random reshuffling: Simple analysis with vast improvements. Preprint (2020). arXiv:2006.05988
181.
K. G. Murty, S. N. Kabadi, Some NP-complete problems in quadratic and nonlinear programming. Mathematical Programming 39(2), 117–129 (1987)
183.
A. Nemirovski, Orth-method for smooth convex optimization. Izvestia AN SSSR Transl. Eng. Cybern. Soviet J. Comput. Syst. Sci. 2, 937–947 (1982)
184.
Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k^2). Soviet Math. Doklady 27(2), 372–376 (1983)
185.
Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course (Kluwer Academic Publishers, Massachusetts, 2004)
186.
Y. Nesterov, How to make the gradients small. Optima 88, 10–11 (2012)
187.
Y. Nesterov, Lectures on Convex Optimization, vol. 137 (Springer, 2018)
188.
Y. Nesterov, B. Polyak, Cubic regularization of Newton method and its global performance. Mathematical Programming 108(1), 177–205 (2006)
189.
Y. Nesterov, V. Spokoiny, Random gradient-free minimization of convex functions. Found. Comput. Math. 17(2), 527–566 (2017). First appeared in 2011 as CORE discussion paper 2011/16
190.
Y. Nesterov, A. Gasnikov, S. Guminov, P. Dvurechensky, Primal-dual accelerated gradient methods with small-dimensional relaxation oracle. Optim. Methods Softw., 1–28 (2020). arXiv:1809.05895
191.
B. Neyshabur, S. Bhojanapalli, D. McAllester, N. Srebro, Exploring generalization in deep learning, in Advances in Neural Information Processing Systems, pp. 5947–5956 (2017)
192.
L. M. Nguyen, J. Liu, K. Scheinberg, M. Takáč, SARAH: A novel method for machine learning problems using stochastic recursive gradient, in International Conference on Machine Learning, pp. 2613–2621 (2017)
193.
L. M. Nguyen, J. Liu, K. Scheinberg, M. Takáč, Stochastic recursive gradient algorithm for nonconvex optimization. Preprint (2017). arXiv:1705.07261
194.
Q. Nguyen, M. C. Mukkamala, M. Hein, On the loss landscape of a class of deep neural networks with no bad local valleys. Preprint (2018). arXiv:1809.10749
195.
L. M. Nguyen, Q. Tran-Dinh, D. T. Phan, P. H. Nguyen, M. van Dijk, A unified convergence analysis for shuffling-type gradient methods. Preprint (2020). arXiv:2002.08246
196.
J. Nocedal, S. Wright, Numerical Optimization (Springer Science & Business Media, 2006)
197.
K. Osawa, Y. Tsuji, Y. Ueno, A. Naruse, R. Yokota, S. Matsuoka, Second-order optimization method for large mini-batch: Training ResNet-50 on ImageNet in 35 epochs. Preprint (2018). arXiv:1811.12019
198.
N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, A. Swami, Practical black-box attacks against machine learning (2017)
199.
V. Papyan, Y. Romano, J. Sulam, M. Elad, Convolutional dictionary learning via local processing, in Proceedings of the IEEE International Conference on Computer Vision, pp. 5296–5304 (2017)
200.
S. Park, S. H. Jung, P. M. Pardalos, Combining stochastic adaptive cubic regularization with negative curvature for nonconvex optimization. J. Optim. Theory Appl. 184(3), 953–971 (2020)
201.
R. Pascanu, T. Mikolov, Y. Bengio, On the difficulty of training recurrent neural networks, in International Conference on Machine Learning, pp. 1310–1318 (2013)
202.
B. Polyak, Gradient methods for the minimisation of functionals. USSR Comput. Math. Math. Phys. 3(4), 864–878 (1963)
203.
B. T. Polyak, Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5), 1–17 (1964)
204.
B. Polyak, Introduction to Optimization (Optimization Software, New York, 1987)
205.
Q. Qu, X. Li, Z. Zhu, A nonconvex approach for exact and efficient multichannel sparse blind deconvolution, in Advances in Neural Information Processing Systems, pp. 4015–4026 (2019)
206.
S. Rajput, A. Gupta, D. Papailiopoulos, Closing the convergence gap of SGD without replacement. Preprint (2020). arXiv:2002.10400
207.
S. J. Reddi, A. Hefny, S. Sra, B. Poczos, A. Smola, Stochastic variance reduction for nonconvex optimization, in International Conference on Machine Learning, pp. 314–323 (2016)
208.
S. J. Reddi, S. Sra, B. Poczos, A. J. Smola, Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization, in Advances in Neural Information Processing Systems, pp. 1145–1153 (2016)
209.
S. J. Reddi, S. Kale, S. Kumar, On the convergence of Adam and beyond. Preprint (2019). arXiv:1904.09237
210.
A. Rezanov, D. Yudin, Deep neural networks for ortophoto-based vehicle localization, in Advances in Neural Computation, Machine Learning, and Cognitive Research IV, ed. by B. Kryzhanovsky, W. Dunin-Barkowski, V. Redko, Y. Tiumentsev (Springer International Publishing, Cham, 2021), pp. 167–174
211.
A. Risteski, Y. Li, Algorithms and matching lower bounds for approximately-convex optimization, in NIPS (2016)
212.
A. Roy, K. Balasubramanian, S. Ghadimi, P. Mohapatra, Escaping saddle-point faster under interpolation-like conditions, in Advances in Neural Information Processing Systems, vol. 33 (2020)
213.
C. W. Royer, S. J. Wright, Complexity analysis of second-order line-search algorithms for smooth nonconvex optimization. SIAM J. Optim. 28(2), 1448–1477 (2018)
214.
I. Safran, O. Shamir, Spurious local minima are common in two-layer ReLU neural networks, in International Conference on Machine Learning (PMLR, 2018), pp. 4433–4441
215.
K. A. Sankararaman, S. De, Z. Xu, W. R. Huang, T. Goldstein, The impact of neural network overparameterization on gradient confusion and stochastic gradient descent. Preprint (2019). arXiv:1904.06963
216.
M. Schmidt, N. L. Roux, Fast convergence of stochastic gradient descent under a strong growth condition. Preprint (2013). arXiv:1308.6370
217.
M. Schmidt, N. Le Roux, F. Bach, Minimizing finite sums with the stochastic average gradient. Mathematical Programming 162(1–2), 83–112 (2017)
218.
M. Schumer, K. Steiglitz, Adaptive step size random search. IEEE Trans. Automatic Control 13(3), 270–276 (1968)
219.
O. Sebbouh, R. M. Gower, A. Defazio, On the convergence of the stochastic heavy ball method. Preprint (2020). arXiv:2006.07867
220.
O. Sener, V. Koltun, Learning to guide random search, in International Conference on Learning Representations (2020)
221.
S. Shalev-Shwartz, SDCA without duality, regularization, and individual convexity, in International Conference on Machine Learning, pp. 747–754 (2016)
222.
Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao, M. Segev, Phase retrieval with application to optical imaging: a contemporary overview. IEEE Signal Process. Mag. 32(3), 87–109 (2015)
223.
Z. Shen, P. Zhou, C. Fang, A. Ribeiro, A stochastic trust region method for non-convex minimization. Preprint (2019). arXiv:1903.01540
224.
L. Shi, Y. Chi, Manifold gradient descent solves multi-channel sparse blind deconvolution provably and efficiently, in ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (IEEE, 2020), pp. 5730–5734
225.
B. Shi, W. J. Su, M. I. Jordan, On learning rates and Schrödinger operators. Preprint (2020). arXiv:2004.06977
226.
N. Shi, D. Li, M. Hong, R. Sun, RMSProp converges with proper hyper-parameter, in International Conference on Learning Representations (2021)
228.
Y. Shin, Effects of depth, width, and initialization: A convergence analysis of layer-wise training for deep linear neural networks. Preprint (2019). arXiv:1910.05874
229.
N. Z. Shor, Generalized gradient descent with application to block programming. Kibernetika 3(3), 53–55 (1967)
230.
A. Skrynnik, A. Staroverov, E. Aitygulov, K. Aksenov, V. Davydov, A. I. Panov, Forgetful experience replay in hierarchical reinforcement learning from expert demonstrations. Knowl. Based Syst. 218, 106844 (2021)
231.
L. N. Smith, Cyclical learning rates for training neural networks, in 2017 IEEE Winter Conference on Applications of Computer Vision (WACV) (IEEE, 2017), pp. 464–472
232.
M. V. Solodov, Incremental gradient algorithms with stepsizes bounded away from zero. Comput. Optim. Appl. 11(1), 23–35 (1998)
233.
M. Soltanolkotabi, A. Javanmard, J. D. Lee, Theoretical insights into the optimization landscape of over-parameterized shallow neural networks. IEEE Trans. Inf. Theory 65(2), 742–769 (2018)
234.
V. Spokoiny et al., Parametric estimation. Finite sample theory. Ann. Stat. 40(6), 2877–2909 (2012)
235.
F. S. Stonyakin, D. Dvinskikh, P. Dvurechensky, A. Kroshnin, O. Kuznetsova, A. Agafonov, A. Gasnikov, A. Tyurin, C. A. Uribe, D. Pasechnyuk, S. Artamonov, Gradient methods for problems with inexact model of the objective, in Mathematical Optimization Theory and Operations Research, ed. by M. Khachay, Y. Kochetov, P. Pardalos (Springer International Publishing, Cham, 2019), pp. 97–114. arXiv:1902.09001
236.
F. Stonyakin, A. Tyurin, A. Gasnikov, P. Dvurechensky, A. Agafonov, D. Dvinskikh, M. Alkousa, D. Pasechnyuk, S. Artamonov, V. Piskunova, Inexact model: A framework for optimization and variational inequalities. Optim. Methods Softw. (2021). (accepted), WIAS Preprint No. 2709, arXiv:2001.09013, arXiv:1902.00990. https://doi.org/10.1080/10556788.2021.1924714
237.
R. Sun, Optimization for deep learning: theory and algorithms. Preprint (2019). arXiv:1912.08957
238.
I. Surazhevsky, V. Demin, A. Ilyasov, A. Emelyanov, K. Nikiruy, V. Rylkov, S. Shchanikov, I. Bordanov, S. Gerasimova, D. Guseinov, N. Malekhonova, D. Pavlov, A. Belov, A. Mikhaylov, V. Kazantsev, D. Valenti, B. Spagnolo, M. Kovalchuk, Noise-assisted persistence and recovery of memory state in a memristive spiking neuromorphic network. Chaos Solitons Fractals 146, 110890 (2021)
239.
I. Sutskever, J. Martens, G. Dahl, G. Hinton, On the importance of initialization and momentum in deep learning, in International Conference on Machine Learning, pp. 1139–1147 (2013)
240.
G. Swirszcz, W. M. Czarnecki, R. Pascanu, Local minima in training of deep networks (2016)
241.
Y. S. Tan, R. Vershynin, Online stochastic gradient descent with arbitrary initialization solves non-smooth, non-convex phase retrieval. Preprint (2019). arXiv:1910.12837
242.
W. Tao, Z. Pan, G. Wu, Q. Tao, Primal averaging: A new gradient evaluation step to attain the optimal individual convergence. IEEE Trans. Cybern. 50(2), 835–845 (2018)
243.
A. Taylor, F. Bach, Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions, in Conference on Learning Theory, pp. 2934–2992 (2019)
244.
T. Tieleman, G. Hinton, Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA Neural Networks Mach. Learn. 4(2), 26–31 (2012)
245.
N. Tripuraneni, M. Stern, C. Jin, J. Regier, M. I. Jordan, Stochastic cubic regularization for fast nonconvex optimization, in Advances in Neural Information Processing Systems, pp. 2899–2908 (2018)
246.
P. Tseng, An incremental gradient (-projection) method with momentum term and adaptive stepsize rule. SIAM J. Optim. 8(2), 506–531 (1998)
247.
I. Usmanova, Robust solutions to stochastic optimization problems. Master Thesis (MSIAM); Institut Polytechnique de Grenoble ENSIMAG, Laboratoire Jean Kuntzmann (2017)
248.
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
249.
S. Vaswani, F. Bach, M. Schmidt, Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron, in The 22nd International Conference on Artificial Intelligence and Statistics (PMLR, 2019), pp. 1195–1204
250.
S. Vaswani, A. Mishkin, I. Laradji, M. Schmidt, G. Gidel, S. Lacoste-Julien, Painless stochastic gradient: Interpolation, line-search, and convergence rates, in Advances in Neural Information Processing Systems, pp. 3732–3745 (2019)
252.
R. Vidal, J. Bruna, R. Giryes, S. Soatto, Mathematics of deep learning. Preprint (2017). arXiv:1712.04741
253.
Z. Wang, K. Ji, Y. Zhou, Y. Liang, V. Tarokh, SpiderBoost: A class of faster variance-reduced algorithms for nonconvex optimization. Preprint (2018). arXiv:1810.10690
254.
Z. Wang, K. Ji, Y. Zhou, Y. Liang, V. Tarokh, SpiderBoost and momentum: Faster variance reduction algorithms, in Advances in Neural Information Processing Systems, pp. 2403–2413 (2019)
255.
Z. Wang, Y. Zhou, Y. Liang, G. Lan, Stochastic variance-reduced cubic regularization for nonconvex optimization, in The 22nd International Conference on Artificial Intelligence and Statistics (PMLR, 2019), pp. 2731–2740
256.
Z. Wang, Y. Zhou, Y. Liang, G. Lan, Cubic regularization with momentum for nonconvex optimization, in Uncertainty in Artificial Intelligence (PMLR, 2020), pp. 313–322
257.
R. Ward, X. Wu, L. Bottou, AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, in International Conference on Machine Learning (PMLR, 2019), pp. 6677–6686
258.
A. C. Wilson, R. Roelofs, M. Stern, N. Srebro, B. Recht, The marginal value of adaptive gradient methods in machine learning, in Advances in Neural Information Processing Systems, pp. 4148–4158 (2017)
260.
F. Wu, P. Rebeschini, Hadamard Wirtinger flow for sparse phase retrieval. Preprint (2020). arXiv:2006.01065
261.
G. Xie, L. Luo, Z. Zhang, A general analysis framework of lower complexity bounds for finite-sum optimization. Preprint (2019). arXiv:1908.08394
262.
Y. Xu, Momentum-based variance-reduced proximal stochastic gradient method for composite nonconvex stochastic optimization. Preprint (2020). arXiv:2006.00425
263.
P. Xu, J. Chen, D. Zou, Q. Gu, Global convergence of Langevin dynamics based algorithms for nonconvex optimization, in Advances in Neural Information Processing Systems, pp. 3122–3133 (2018)
264.
Y. Xu, R. Jin, T. Yang, First-order stochastic algorithms for escaping from saddle points in almost linear time, in Advances in Neural Information Processing Systems, pp. 5530–5540 (2018)
265.
P. Xu, F. Roosta, M. W. Mahoney, Newton-type methods for non-convex optimization under inexact Hessian information. Mathematical Programming 184(1), 35–70 (2020)
266.
P. Xu, F. Roosta, M. W. Mahoney, Second-order optimization for non-convex machine learning: An empirical study, in Proceedings of the 2020 SIAM International Conference on Data Mining (SIAM, 2020), pp. 199–207
267.
Y. Yan, T. Yang, Z. Li, Q. Lin, Y. Yang, A unified analysis of stochastic momentum methods for deep learning, in Proceedings of the 27th International Joint Conference on Artificial Intelligence, pp. 2955–2961 (2018)
268.
Z. Yang, L. F. Yang, E. X. Fang, T. Zhao, Z. Wang, M. Neykov, Misspecified nonconvex statistical optimization for sparse phase retrieval. Mathematical Programming 176(1–2), 545–571 (2019)
269.
C. Yun, S. Sra, A. Jadbabaie, Small nonlinearities in activation functions create bad local minima in neural networks. Preprint (2018). arXiv:1802.03487
270.
J. Yun, A. C. Lozano, E. Yang, A general family of stochastic proximal gradient methods for deep learning. Preprint (2020). arXiv:2007.07484
271.
M. Zaheer, S. Reddi, D. Sachan, S. Kale, S. Kumar, Adaptive methods for nonconvex optimization, in Advances in Neural Information Processing Systems, pp. 9793–9803 (2018)
272.
R. Y. Zhang, Sharp global guarantees for nonconvex low-rank matrix recovery in the overparameterized regime. Preprint (2021). arXiv:2104.10790
273.
J. Zhang, L. Xiao, Stochastic variance-reduced prox-linear algorithms for nonconvex composite optimization. Preprint (2020). arXiv:2004.04357
274.
J. Zhang, L. Xiao, S. Zhang, Adaptive stochastic variance reduction for subsampled Newton method with cubic regularization. Preprint (2018). arXiv:1811.11637
275.
B. Zhang, J. Jin, C. Fang, L. Wang, Improved analysis of clipping algorithms for non-convex optimization, in Advances in Neural Information Processing Systems, vol. 33 (2020)
276.
J. Zhang, T. He, S. Sra, A. Jadbabaie, Why gradient clipping accelerates training: A theoretical justification for adaptivity, in International Conference on Learning Representations (2020)
277.
J. Zhang, S. P. Karimireddy, A. Veit, S. Kim, S. Reddi, S. Kumar, S. Sra, Why are adaptive methods good for attention models? in Advances in Neural Information Processing Systems, vol. 33 (2020)
278.
Y. Zhang, Q. Qu, J. Wright, From symmetry to geometry: Tractable nonconvex problems. Preprint (2020). arXiv:2007.06753
279.
Y. Zhang, Q. Qu, J. Wright, From symmetry to geometry: Tractable nonconvex problems. Preprint (2020). arXiv:2007.06753
280.
Y. Zhang, Y. Zhou, K. Ji, M. M. Zavlanos, Boosting one-point derivative-free online optimization via residual feedback (2020)
281.
C. Zhang, S. Bengio, M. Hardt, B. Recht, O. Vinyals, Understanding deep learning (still) requires rethinking generalization. Commun. ACM 64(3), 107–115 (2021)
282.
H. Zhang, Y. Bi, J. Lavaei, General low-rank matrix optimization: Geometric analysis and sharper bounds. Preprint (2021). arXiv:2104.10356
283.
A. Zhigljavsky, A. Zilinskas, Stochastic Global Optimization, vol. 9 (Springer Science & Business Media, 2007)
284.
D. Zhou, Q. Gu, Lower bounds for smooth nonconvex finite-sum optimization, in International Conference on Machine Learning, pp. 7574–7583 (2019)
285.
D. Zhou, Y. Tang, Z. Yang, Y. Cao, Q. Gu, On the convergence of adaptive gradient methods for nonconvex optimization. Preprint (2018). arXiv:1808.05671
286.
D. Zhou, P. Xu, Q. Gu, Stochastic nested variance reduced gradient descent for nonconvex optimization, in Advances in Neural Information Processing Systems (2018)
287.
D. Zhou, P. Xu, Q. Gu, Stochastic variance-reduced cubic regularization methods. J. Mach. Learn. Res. 20(134), 1–47 (2019)
288.
D. Zhou, P. Xu, Q. Gu, Stochastic variance-reduced cubic regularization methods. J. Mach. Learn. Res. 20(134), 1–47 (2019)
289.
D. Zhou, Q. Gu, Stochastic recursive variance-reduced cubic regularization methods, in International Conference on Artificial Intelligence and Statistics (PMLR, 2020), pp. 3980–3990
290.
X. Zhu, J. Han, B. Jiang, An adaptive high order method for finding third-order critical points of nonconvex optimization. Preprint (2020). arXiv:2008.04191
Metadata
Title: Recent Theoretical Advances in Non-Convex Optimization
Authors: Marina Danilova, Pavel Dvurechensky, Alexander Gasnikov, Eduard Gorbunov, Sergey Guminov, Dmitry Kamzolov, Innokentiy Shibaev
Copyright year: 2022
DOI: https://doi.org/10.1007/978-3-031-00832-0_3
