Published in: Cognitive Computation 2/2022

11.01.2022

LightAdam: Towards a Fast and Accurate Adaptive Momentum Online Algorithm

Authors: Yangfan Zhou, Kaizhu Huang, Cheng Cheng, Xuguang Wang, Xin Liu

Abstract

Adaptive optimization algorithms enjoy fast convergence and have been widely exploited in pattern recognition and cognitively inspired machine learning. These algorithms may, however, incur high computational cost and low generalization ability due to their projection steps. Such limitations make them difficult to apply in big data analytics, as typically encountered in cognitively inspired learning, e.g., deep learning tasks. In this paper, we propose a fast and accurate adaptive momentum online algorithm, called LightAdam, to alleviate the drawbacks of the projection steps in adaptive algorithms. The proposed algorithm substantially reduces the computational cost of each iteration by replacing high-order projection operators with one-dimensional linear searches. Moreover, we introduce a novel second-order momentum and engage dynamic learning rate bounds, thereby obtaining higher generalization ability than other adaptive algorithms. We theoretically show that the proposed algorithm has a guaranteed convergence bound, and prove that it has better generalization capability than Adam. We conduct extensive experiments on three public datasets for image pattern classification, and validate the computational benefit and accuracy of the proposed algorithm in comparison with other state-of-the-art adaptive optimization algorithms.
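The abstract names three ingredients of the update: Adam-style first- and second-order momentum estimates, dynamic learning rate bounds, and a projection-free step driven by a one-dimensional linear search. The following Python/NumPy sketch shows how one such iteration could be composed under stated assumptions; the function name lightadam_like_step, the l2-ball feasible set, the bound schedules, and the 2/(t+2) step size are illustrative choices, not the paper's actual update rule.

import numpy as np

def lightadam_like_step(x, grad, m, v, t, radius=10.0,
                        alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One illustrative LightAdam-style iteration (hypothetical sketch)."""
    # Adam-style running moments of the gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)          # bias-corrected second moment

    # Dynamic learning-rate bounds: clip the per-coordinate rate into a band
    # that tightens as t grows (an AdaBound-style schedule, assumed here).
    lr = alpha / (np.sqrt(v_hat) + eps)
    lower = 0.1 - 0.1 / ((1.0 - beta2) * t + 1.0)
    upper = 0.1 + 0.1 / ((1.0 - beta2) * t)
    lr = np.clip(lr, lower, upper)

    # Projection-free step: instead of projecting onto the feasible set, take a
    # Frank-Wolfe-style linear minimizer over an l2 ball of the given radius and
    # move toward it with a simple one-dimensional step size gamma, standing in
    # for the line search mentioned in the abstract.
    direction = lr * m_hat
    s = -radius * direction / (np.linalg.norm(direction) + eps)
    gamma = 2.0 / (t + 2.0)
    x = (1.0 - gamma) * x + gamma * s
    return x, m, v

In use, the caller would keep x, m, and v across iterations, pass the stochastic gradient of the current mini-batch as grad, and increment t at each step; the point of the sketch is that the only per-iteration geometry computation is a norm and a scalar combination rather than a high-order projection.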

Metadata
Title
LightAdam: Towards a Fast and Accurate Adaptive Momentum Online Algorithm
Authors
Yangfan Zhou
Kaizhu Huang
Cheng Cheng
Xuguang Wang
Xin Liu
Publication date
11.01.2022
Publisher
Springer US
Published in
Cognitive Computation / Issue 2/2022
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-021-09985-9
