Published in: Neural Computing and Applications 16/2020

18-01-2020 | Original Article

A Randomized Block-Coordinate Adam online learning optimization algorithm

Authors: Yangfan Zhou, Mingchuan Zhang, Junlong Zhu, Ruijuan Zheng, Qingtao Wu


Abstract

In recent years, stochastic gradient descent (SGD) has become one of the most important optimization algorithms in many fields, such as deep learning and reinforcement learning. However, computing the full gradient is prohibitive when the parameter vector is high-dimensional. For this reason, we propose a randomized block-coordinate Adam (RBC-Adam) online learning optimization algorithm. At each round, RBC-Adam randomly chooses a block of coordinates of the parameter vector, computes the gradient with respect to that block, and updates the corresponding parameters along the negative gradient direction. Moreover, this paper analyzes the convergence of RBC-Adam and obtains a regret bound of \(O(\sqrt{T})\), where \(T\) is the time horizon. The theoretical results are verified by simulated experiments on four public datasets, which also show that the computational cost of RBC-Adam is lower than that of existing variants of Adam.
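
To make the update rule concrete, here is a minimal NumPy sketch of one possible randomized block-coordinate Adam loop, based only on the description above: at each round a block of coordinates is sampled uniformly at random, the partial gradient over that block is computed, and the block's parameters receive an Adam-style update along the negative direction. The function name, block scheme, and hyperparameter defaults are illustrative assumptions, not the paper's exact algorithm.

```python
# Illustrative sketch of a randomized block-coordinate Adam update.
# Assumptions (not from the paper): coordinates are split into equal
# contiguous blocks, one block is sampled uniformly per round, and
# Adam's bias correction uses a per-block update counter.
import numpy as np

def rbc_adam(partial_grad, x0, num_blocks=4, lr=1e-2,
             beta1=0.9, beta2=0.999, eps=1e-8, rounds=20000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)                      # first-moment estimate
    v = np.zeros_like(x)                      # second-moment estimate
    blocks = np.array_split(np.arange(x.size), num_blocks)
    counts = np.zeros(num_blocks, dtype=int)  # per-block update counts
    for _ in range(rounds):
        i = rng.integers(num_blocks)          # sample one block uniformly
        b = blocks[i]
        counts[i] += 1
        g = partial_grad(x, b)                # gradient over block b only
        m[b] = beta1 * m[b] + (1 - beta1) * g
        v[b] = beta2 * v[b] + (1 - beta2) * g * g
        m_hat = m[b] / (1 - beta1 ** counts[i])  # Adam bias correction
        v_hat = v[b] / (1 - beta2 ** counts[i])
        x[b] -= lr * m_hat / (np.sqrt(v_hat) + eps)  # negative direction
    return x

# Usage on a toy quadratic f(x) = ||x - 1||^2, whose partial gradient
# over a coordinate block b is simply 2 * (x[b] - 1).
x_star = rbc_adam(lambda x, b: 2.0 * (x[b] - 1.0), x0=np.zeros(8))
print(np.round(x_star, 3))  # entries approach 1.0
```

Because each round touches only about x.size / num_blocks coordinates, the per-round cost of the gradient and moment updates scales with the block size rather than the full dimension, which is the intuition behind the lower computational cost claimed in the abstract.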


Metadata
Title: A Randomized Block-Coordinate Adam online learning optimization algorithm
Authors: Yangfan Zhou, Mingchuan Zhang, Junlong Zhu, Ruijuan Zheng, Qingtao Wu
Publication date: 18-01-2020
Publisher: Springer London
Published in: Neural Computing and Applications / Issue 16/2020
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-020-04718-9
