Published in: Cognitive Computation 2/2022

11.01.2022

LightAdam: Towards a Fast and Accurate Adaptive Momentum Online Algorithm

Authors: Yangfan Zhou, Kaizhu Huang, Cheng Cheng, Xuguang Wang, Xin Liu

Abstract

Adaptive optimization algorithms enjoy fast convergence and have been widely exploited in pattern recognition and cognitively inspired machine learning. These algorithms may, however, incur high computational cost and low generalization ability due to their projection steps. Such limitations make them difficult to apply in big data analytics, as typically encountered in cognitively inspired learning, e.g., deep learning tasks. In this paper, we propose a fast and accurate adaptive momentum online algorithm, called LightAdam, to alleviate the drawbacks of the projection steps in adaptive algorithms. The proposed algorithm substantially reduces the computational cost of each iteration by replacing high-order projection operators with one-dimensional linear searches. Moreover, we introduce a novel second-order momentum and engage dynamic learning rate bounds, thereby obtaining higher generalization ability than other adaptive algorithms. We theoretically show that the proposed algorithm has a guaranteed convergence bound, and prove that it has better generalization capability than Adam. We conduct extensive experiments on three public datasets for image pattern classification, and validate the computational benefit and accuracy of the proposed algorithm in comparison with other state-of-the-art adaptive optimization algorithms.
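The abstract names three ingredients of the update: Adam-style first- and second-order momentum estimates, dynamic learning rate bounds, and a projection-free step driven by a one-dimensional linear search. The following Python/NumPy sketch shows how one such iteration could be composed under stated assumptions; the function name lightadam_like_step, the l2-ball feasible set, the bound schedules, and the 2/(t+2) step size are illustrative choices, not the paper's actual update rule.

import numpy as np

def lightadam_like_step(x, grad, m, v, t, radius=10.0,
                        alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One illustrative LightAdam-style iteration (hypothetical sketch)."""
    # Adam-style running moments of the gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)          # bias-corrected second moment

    # Dynamic learning-rate bounds: clip the per-coordinate rate into a band
    # that tightens as t grows (an AdaBound-style schedule, assumed here).
    lr = alpha / (np.sqrt(v_hat) + eps)
    lower = 0.1 - 0.1 / ((1.0 - beta2) * t + 1.0)
    upper = 0.1 + 0.1 / ((1.0 - beta2) * t)
    lr = np.clip(lr, lower, upper)

    # Projection-free step: instead of projecting onto the feasible set, take a
    # Frank-Wolfe-style linear minimizer over an l2 ball of the given radius and
    # move toward it with a simple one-dimensional step size gamma, standing in
    # for the line search mentioned in the abstract.
    direction = lr * m_hat
    s = -radius * direction / (np.linalg.norm(direction) + eps)
    gamma = 2.0 / (t + 2.0)
    x = (1.0 - gamma) * x + gamma * s
    return x, m, v

In use, the caller would keep x, m, and v across iterations, pass the stochastic gradient of the current mini-batch as grad, and increment t at each step; the point of the sketch is that the only per-iteration geometry computation is a norm and a scalar combination rather than a high-order projection.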

Metadata
Title
LightAdam: Towards a Fast and Accurate Adaptive Momentum Online Algorithm
Authors
Yangfan Zhou
Kaizhu Huang
Cheng Cheng
Xuguang Wang
Xin Liu
Publication date
11.01.2022
Publisher
Springer US
Published in
Cognitive Computation / Issue 2/2022
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-021-09985-9
