
09-05-2023

AdaXod: a new adaptive and momental bound algorithm for training deep neural networks

Authors: Yuanxuan Liu, Dequan Li

Published in: The Journal of Supercomputing | Issue 15/2023


Abstract

Adaptive algorithms are widely used in deep learning because of their fast convergence; among them, Adam is the most widely used. However, studies have shown that Adam's generalization ability is weak. AdaX, a variant of Adam, introduces a novel long-term second-order momentum that modifies Adam's second moment and achieves better generalization. Even so, these algorithms may fail to converge because of instability and extreme learning rates during training. In this paper, we propose a new adaptive and momental bound algorithm, called AdaXod, which exponentially averages the learning rate and is particularly useful for training deep neural networks. By imposing an adaptively bounded learning rate on the AdaX algorithm, AdaXod effectively eliminates excessively large learning rates in the later stages of training and thus trains stably. We conduct extensive experiments on different datasets and verify the advantages of AdaXod by comparing it with other state-of-the-art adaptive optimization algorithms. AdaXod eliminates large learning rates during training and outperforms other optimizers, especially on neural networks with complex structures, such as DenseNet.
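The abstract describes the recipe only at a high level: take AdaX's long-term second moment and bound the resulting per-parameter learning rate by its own exponential moving average. What follows is a minimal NumPy sketch of how such an update could look, assuming AdaX's second-moment recursion (v_t = (1 + beta2) v_{t-1} + beta2 g_t^2, normalized by (1 + beta2)^t - 1) and an AdaMod-style momental bound; the class name, hyperparameter names, and defaults are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

class AdaXodSketch:
    """Illustrative sketch: AdaX's long-term second moment combined with
    an AdaMod-style momental bound on the per-parameter learning rate.
    Hyperparameter names and defaults are assumptions, not the paper's."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=1e-4, beta3=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.b3, self.eps = lr, beta1, beta2, beta3, eps
        self.m = self.v = self.s = None  # first moment, second moment, rate EMA
        self.t = 0

    def step(self, w, grad):
        if self.m is None:
            self.m = np.zeros_like(w)
            self.v = np.zeros_like(w)
            self.s = np.zeros_like(w)
        self.t += 1
        # Adam-style first moment with bias correction.
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        m_hat = self.m / (1 - self.b1 ** self.t)
        # AdaX second moment: exponential long-term memory of g^2,
        # v_t = (1 + beta2) v_{t-1} + beta2 g_t^2, normalized by (1 + beta2)^t - 1.
        self.v = (1 + self.b2) * self.v + self.b2 * grad * grad
        v_hat = self.v / ((1 + self.b2) ** self.t - 1)
        # Raw per-parameter step size, as in Adam/AdaX.
        eta = self.lr / (np.sqrt(v_hat) + self.eps)
        # Momental bound: cap eta by its exponential moving average so that
        # extreme learning rates late in training are suppressed.
        self.s = self.b3 * self.s + (1 - self.b3) * eta
        eta = np.minimum(eta, self.s)
        return w - eta * m_hat

# Usage: minimize f(w) = ||w||^2, whose gradient is 2w.
w = np.array([5.0, -3.0])
opt = AdaXodSketch(lr=0.1)
for _ in range(500):
    w = opt.step(w, 2 * w)
print(w)  # w moves toward the origin
```

Note the design point the abstract emphasizes: clipping each step size against its own running average (rather than against fixed bounds) adapts the bound to the training trajectory, which is what stabilizes the late-training phase.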


Metadata
Title
AdaXod: a new adaptive and momental bound algorithm for training deep neural networks
Authors
Yuanxuan Liu
Dequan Li
Publication date
09-05-2023
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 15/2023
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-023-05338-5
