
12.07.2021 | Regular paper

Sharp characterization of optimal minibatch size for stochastic finite sum convex optimization

Authors: Atsushi Nitanda, Tomoya Murata, Taiji Suzuki

Published in: Knowledge and Information Systems | Issue 9/2021

Abstract

Minibatching has been extensively adopted in stochastic first-order methods because minibatches can be processed efficiently in parallel, which is crucial for large-scale machine learning and data mining. Indeed, increasing the minibatch size decreases the iteration complexity (the number of minibatch queries) required for convergence, and hence the running time when each minibatch is processed in parallel. However, this gain usually saturates once the minibatch size becomes too large, while the total computational complexity (the number of accesses to individual examples) deteriorates. Hence, choosing an appropriate minibatch size, which controls the trade-off between the iteration and total computational complexities, is important for maximizing the performance of the method with as few computational resources as possible. In this study, we define the optimal minibatch size as the minimum minibatch size for which there exists a stochastic first-order method that achieves the optimal iteration complexity, and we call such a method an optimal minibatch method. Moreover, we show that Katyusha (in: Proceedings of annual ACM SIGACT symposium on theory of computing, vol 49, pp 1200–1205, ACM, 2017), DASVRDA (Murata and Suzuki, in: Advances in neural information processing systems, vol 30, pp 608–617, 2017), and the proposed method, a combination of Acc-SVRG (Nitanda, in: Advances in neural information processing systems, vol 27, pp 1574–1582, 2014) with APPA (Cotter et al., in: Advances in neural information processing systems, vol 27, pp 3059–3067, 2014), are optimal minibatch methods. In experiments, we compare optimal minibatch methods with several competitors on \(L_1\)- and \(L_2\)-regularized logistic regression problems and observe that the iteration complexities of optimal minibatch methods decrease linearly as the minibatch size increases up to a reasonable size, finally attaining the best iteration complexities. This confirms the computational efficiency of optimal minibatch methods suggested by the theory.
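
To make the trade-off described in the abstract concrete, here is a minimal sketch, not one of the paper's optimal minibatch methods: it runs plain minibatch SVRG (Johnson and Zhang, reference 15 below) on an \(L_2\)-regularized logistic regression problem and reports both the iteration complexity (number of minibatch queries) and the total computational complexity (number of example accesses) for several minibatch sizes. The data, step size, epoch length, and stopping rule are illustrative assumptions.

# Minimal sketch (not one of the paper's optimal minibatch methods): minibatch
# SVRG on L2-regularized logistic regression, counting the iteration complexity
# (minibatch queries) and the total computational complexity (example accesses).
import numpy as np

def logistic_grad(w, X, y, lam2):
    # Gradient of (1/m) sum_i log(1 + exp(-y_i <x_i, w>)) + (lam2/2) ||w||^2.
    sigma = 1.0 / (1.0 + np.exp(y * (X @ w)))       # sigmoid(-y_i <x_i, w>)
    return X.T @ (-y * sigma) / X.shape[0] + lam2 * w

def minibatch_svrg(X, y, lam2, b, step, n_outer=20, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w_snap = np.zeros(d)
    iters = accesses = 0
    for _ in range(n_outer):
        full_grad = logistic_grad(w_snap, X, y, lam2)   # one full pass
        accesses += n
        if np.linalg.norm(full_grad) <= tol:
            break
        w = w_snap.copy()
        for _ in range(max(n // b, 1)):                 # one inner "epoch"
            idx = rng.choice(n, size=b, replace=False)
            g = (logistic_grad(w, X[idx], y[idx], lam2)
                 - logistic_grad(w_snap, X[idx], y[idx], lam2) + full_grad)
            w -= step * g
            iters += 1         # one minibatch query
            accesses += 2 * b  # the same minibatch is touched at w and w_snap
        w_snap = w
    return w_snap, iters, accesses

# Toy data: compare the two complexity measures for a few minibatch sizes.
rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 50)) / np.sqrt(50)
y = np.sign(X @ rng.standard_normal(50) + 0.1 * rng.standard_normal(2000))
for b in (8, 64, 512):
    _, iters, accesses = minibatch_svrg(X, y, lam2=1e-3, b=b, step=0.5)
    print(f"b={b:4d}  minibatch queries={iters:6d}  example accesses={accesses}")

Inspecting both counters for a range of b is exactly the comparison the paper formalizes: the optimal minibatch size is the largest b for which the iteration complexity still improves without inflating the total computational complexity.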

Appendices
Available only with authorization
Footnotes
1
In this case, all the methods converged within one or two epochs and performed very similarly to each other.
 
2
When \(\lambda_2 = 0\), we used a dummy L2-regularization with \(\lambda_2 = 10^{-8}\), which was roughly \(\varepsilon \Vert x_*\Vert^2\), where \(\varepsilon\) is the desired accuracy (set to \(10^{-4}\)) and \(x_*\) is the optimal solution.
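
As a small illustration of this footnote's setup (the helper name and the optional data-driven estimate are assumptions for demonstration, not code from the paper):

import numpy as np

def effective_lam2(lam2, eps=1e-4, x_star=None):
    # Substitute a dummy L2 coefficient when none is given, so that solvers
    # requiring strong convexity still apply; 1e-8 is the value used in the
    # footnote, roughly eps * ||x_*||^2 for the problems considered.
    if lam2 > 0:
        return lam2
    if x_star is not None:
        return eps * float(x_star @ x_star)  # hypothetical data-driven variant
    return 1e-8

print(effective_lam2(0.0))  # -> 1e-08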
 
References
1. Allen-Zhu Z (2017) Katyusha: the first direct acceleration of stochastic gradient methods. In: Proceedings of annual ACM SIGACT symposium on theory of computing, vol 49, pp 1200–1205. ACM
2. Murata T, Suzuki T (2017) Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Adv Neural Inf Process Syst 30:608–617
3. Nitanda A (2014) Stochastic proximal gradient descent with acceleration techniques. Adv Neural Inf Process Syst 27:1574–1582
4. Lin Q, Lu Z, Xiao L (2014) An accelerated proximal coordinate gradient method. Adv Neural Inf Process Syst 27:3059–3067
5. Cotter A, Shamir O, Srebro N, Sridharan K (2011) Better mini-batch algorithms via accelerated gradient methods. Adv Neural Inf Process Syst 24:1647–1655
6. Shalev-Shwartz S, Zhang T (2013) Accelerated mini-batch stochastic dual coordinate ascent. Adv Neural Inf Process Syst 26:378–385
7. Kai Z, Liang H (2013) Minibatch and parallelization for online large margin structured learning. In: Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: human language technologies, pp 370–379
8. Martin T, Singh BA, Peter R, Nati S (2013) Mini-batch primal and dual methods for SVMs. In: ICML (3), pp 1022–1030
9. Mu L, Tong Z, Yuqiang C, Alexander JS (2014) Efficient mini-batch training for stochastic optimization. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, pp 661–670. ACM
10. Peilin Z, Tong Z (2014) Accelerating minibatch stochastic gradient descent using stratified sampling. arXiv preprint arXiv:1405.3080
11. Konečný J, Liu J, Richtárik P, Takáč M (2016) Mini-batch semi-stochastic gradient descent in the proximal setting. IEEE J Sel Top Signal Process 10(2):242–255
13. Prateek J, Sham KM, Rahul K, Praneeth N, Aaron S (2016) Parallelizing stochastic approximation through mini-batching and tail-averaging. Stat 1050:12
14. Nicolas LR, Mark S, Francis RB (2012) A stochastic gradient method with an exponential convergence rate for finite training sets. In: Advances in neural information processing systems, pp 2663–2671
15. Johnson R, Zhang T (2013) Accelerating stochastic gradient descent using predictive variance reduction. Adv Neural Inf Process Syst 26:315–323
16. Defazio A, Bach F, Lacoste-Julien S (2014) SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. Adv Neural Inf Process Syst 27:1646–1654
17. Lin H, Mairal J, Harchaoui Z (2015) A universal catalyst for first-order optimization. Adv Neural Inf Process Syst 28:3384–3392
18. Frostig R, Ge R, Kakade S, Sidford A (2015) Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization. In: Proceedings of international conference on machine learning, vol 32, pp 2540–2548
19. Blake EW, Srebro N (2016) Tight complexity bounds for optimizing composite objectives. Adv Neural Inf Process Syst 29:3639–3647
20. Arjevani Y, Shamir O (2016) Dimension-free iteration complexity of finite sum optimization problems. Adv Neural Inf Process Syst 29:3540–3548
21. Mark S, Le RN (2013) Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370
22. Lam MN, Liu J, Scheinberg K, Takáč M (2017) SARAH: a novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of international conference on machine learning, vol 34, pp 2613–2621
23. Ofer D, Ran G-B, Ohad S, Lin X (2012) Optimal distributed online prediction using mini-batches. J Mach Learn Res 13:165–202
25. Yurii N (2004) Introductory lectures on convex optimization: a basic course. Kluwer Academic Publishers, New York
26. Panda B, Joshua SH, Basu S, Roberto JB (2009) PLANET: massively parallel learning of tree ensembles with MapReduce. Proc VLDB Endow 2(2):1426–1437
27. Chi Z, Feifei L, Jeffrey J (2012) Efficient parallel kNN joins for large data in MapReduce. In: Proceedings of the 15th international conference on extending database technology, pp 38–49
28. Tianyang S, Chengchun S, Feng L, Yu H, Lili M, Yitong F (2009) An efficient hierarchical clustering method for large datasets with map-reduce. In: 2009 international conference on parallel and distributed computing, applications and technologies, pp 494–499
29. Yaobin H, Haoyu T, Wuman L, Huajian M, Di M, Shengzhong F, Jianping F (2011) MR-DBSCAN: an efficient parallel density-based clustering algorithm using MapReduce. In: 2011 IEEE 17th international conference on parallel and distributed systems, pp 473–480
30. Weizhong Z, Huifang M, Qing H (2009) Parallel k-means clustering based on MapReduce. In: IEEE international conference on cloud computing, pp 674–679
31. Xin J, Wang Z, Chen C, Ding L, Wang G, Zhao Y (2014) ELM: distributed extreme learning machine with MapReduce. World Wide Web 17(5):1189–1204
32. Wang B, Huang S, Qiu J, Yu L, Wang G (2015) Parallel online sequential extreme learning machine based on MapReduce. Neurocomputing 149:224–232
33. Bi X, Zhao X, Wang G, Zhang P, Wang C (2015) Distributed extreme learning machine with kernels based on MapReduce. Neurocomputing 149:456–463
34. Çatak FO (2017) Classification with boosting of extreme learning machine over arbitrarily partitioned data. Soft Comput 21(9):2269–2281
35. Xi C, Qihang L, Javier P (2012) Optimal regularized dual averaging methods for stochastic optimization. In: Advances in neural information processing systems, pp 395–403
36. Hongzhou L, Julien M, Zaid H (2017) Catalyst acceleration for first-order convex optimization: from theory to practice. arXiv preprint arXiv:1712.05654
Metadata
Title
Sharp characterization of optimal minibatch size for stochastic finite sum convex optimization
Authors
Atsushi Nitanda
Tomoya Murata
Taiji Suzuki
Publication date
12.07.2021
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 9/2021
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-021-01593-1
