Published in: Neural Computing and Applications 16/2020

18-01-2020 | Original Article

A Randomized Block-Coordinate Adam online learning optimization algorithm

Authors: Yangfan Zhou, Mingchuan Zhang, Junlong Zhu, Ruijuan Zheng, Qingtao Wu


Abstract

In recent years, stochastic gradient descent (SGD) has become one of the most important optimization algorithms in many fields, such as deep learning and reinforcement learning. However, computing the full gradient is prohibitive when the parameter vector is high-dimensional. For this reason, we propose a randomized block-coordinate Adam (RBC-Adam) online learning optimization algorithm. At each round, RBC-Adam randomly chooses a block of coordinates of the parameter vector, computes the gradient with respect to that block, and updates the corresponding parameters along the negative gradient direction. Moreover, this paper analyzes the convergence of RBC-Adam and obtains a regret bound of \(O(\sqrt{T})\), where \(T\) is the time horizon. The theoretical results are verified by simulated experiments on four public datasets, which also show that the computational cost of RBC-Adam is lower than that of existing variants of Adam.
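
To make the update rule concrete, here is a minimal NumPy sketch of one possible randomized block-coordinate Adam loop, based only on the description above: at each round a block of coordinates is sampled uniformly at random, the partial gradient over that block is computed, and the block's parameters receive an Adam-style update along the negative direction. The function name, block scheme, and hyperparameter defaults are illustrative assumptions, not the paper's exact algorithm.

```python
# Illustrative sketch of a randomized block-coordinate Adam update.
# Assumptions (not from the paper): coordinates are split into equal
# contiguous blocks, one block is sampled uniformly per round, and
# Adam's bias correction uses a per-block update counter.
import numpy as np

def rbc_adam(partial_grad, x0, num_blocks=4, lr=1e-2,
             beta1=0.9, beta2=0.999, eps=1e-8, rounds=20000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)                      # first-moment estimate
    v = np.zeros_like(x)                      # second-moment estimate
    blocks = np.array_split(np.arange(x.size), num_blocks)
    counts = np.zeros(num_blocks, dtype=int)  # per-block update counts
    for _ in range(rounds):
        i = rng.integers(num_blocks)          # sample one block uniformly
        b = blocks[i]
        counts[i] += 1
        g = partial_grad(x, b)                # gradient over block b only
        m[b] = beta1 * m[b] + (1 - beta1) * g
        v[b] = beta2 * v[b] + (1 - beta2) * g * g
        m_hat = m[b] / (1 - beta1 ** counts[i])  # Adam bias correction
        v_hat = v[b] / (1 - beta2 ** counts[i])
        x[b] -= lr * m_hat / (np.sqrt(v_hat) + eps)  # negative direction
    return x

# Usage on a toy quadratic f(x) = ||x - 1||^2, whose partial gradient
# over a coordinate block b is simply 2 * (x[b] - 1).
x_star = rbc_adam(lambda x, b: 2.0 * (x[b] - 1.0), x0=np.zeros(8))
print(np.round(x_star, 3))  # entries approach 1.0
```

Because each round touches only about x.size / num_blocks coordinates, the per-round cost of the gradient and moment updates scales with the block size rather than the full dimension, which is the intuition behind the lower computational cost claimed in the abstract.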


Metadata
Title: A Randomized Block-Coordinate Adam online learning optimization algorithm
Authors: Yangfan Zhou, Mingchuan Zhang, Junlong Zhu, Ruijuan Zheng, Qingtao Wu
Publication date: 18-01-2020
Publisher: Springer London
Published in: Neural Computing and Applications / Issue 16/2020
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-020-04718-9
