
12.07.2021 | Regular paper

Sharp characterization of optimal minibatch size for stochastic finite sum convex optimization

Authors: Atsushi Nitanda, Tomoya Murata, Taiji Suzuki

Published in: Knowledge and Information Systems | Issue 9/2021

Abstract

Minibatching has been extensively adopted in stochastic first-order methods because minibatches can be processed efficiently in parallel, which is crucial for large-scale machine learning and data mining. Indeed, increasing the minibatch size decreases the iteration complexity (the number of minibatch queries) required for convergence, and hence the running time when each minibatch is processed in parallel. However, this gain usually saturates once the minibatch size becomes too large, while the total computational complexity (the number of accesses to individual examples) deteriorates. Hence, choosing an appropriate minibatch size, which controls the trade-off between the iteration and total computational complexities, is important for maximizing the performance of the method with as few computational resources as possible. In this study, we define the optimal minibatch size as the minimum minibatch size for which there exists a stochastic first-order method that achieves the optimal iteration complexity, and we call such a method an optimal minibatch method. Moreover, we show that Katyusha (in: Proceedings of annual ACM SIGACT symposium on theory of computing, vol 49, pp 1200–1205, ACM, 2017), DASVRDA (Murata and Suzuki, in: Advances in neural information processing systems, vol 30, pp 608–617, 2017), and the proposed method, a combination of Acc-SVRG (Nitanda, in: Advances in neural information processing systems, vol 27, pp 1574–1582, 2014) with APPA (Cotter et al., in: Advances in neural information processing systems, vol 27, pp 3059–3067, 2014), are optimal minibatch methods. In experiments, we compare optimal minibatch methods with several competitors on \(L_1\)- and \(L_2\)-regularized logistic regression problems and observe that the iteration complexities of optimal minibatch methods decrease linearly as the minibatch size increases up to a reasonable size, finally attaining the best iteration complexities. This confirms the computational efficiency of optimal minibatch methods suggested by the theory.
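
To make the trade-off described in the abstract concrete, here is a minimal sketch, not one of the paper's optimal minibatch methods: it runs plain minibatch SVRG (Johnson and Zhang, reference 15 below) on an \(L_2\)-regularized logistic regression problem and reports both the iteration complexity (number of minibatch queries) and the total computational complexity (number of example accesses) for several minibatch sizes. The data, step size, epoch length, and stopping rule are illustrative assumptions.

# Minimal sketch (not one of the paper's optimal minibatch methods): minibatch
# SVRG on L2-regularized logistic regression, counting the iteration complexity
# (minibatch queries) and the total computational complexity (example accesses).
import numpy as np

def logistic_grad(w, X, y, lam2):
    # Gradient of (1/m) sum_i log(1 + exp(-y_i <x_i, w>)) + (lam2/2) ||w||^2.
    sigma = 1.0 / (1.0 + np.exp(y * (X @ w)))       # sigmoid(-y_i <x_i, w>)
    return X.T @ (-y * sigma) / X.shape[0] + lam2 * w

def minibatch_svrg(X, y, lam2, b, step, n_outer=20, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w_snap = np.zeros(d)
    iters = accesses = 0
    for _ in range(n_outer):
        full_grad = logistic_grad(w_snap, X, y, lam2)   # one full pass
        accesses += n
        if np.linalg.norm(full_grad) <= tol:
            break
        w = w_snap.copy()
        for _ in range(max(n // b, 1)):                 # one inner "epoch"
            idx = rng.choice(n, size=b, replace=False)
            g = (logistic_grad(w, X[idx], y[idx], lam2)
                 - logistic_grad(w_snap, X[idx], y[idx], lam2) + full_grad)
            w -= step * g
            iters += 1         # one minibatch query
            accesses += 2 * b  # the same minibatch is touched at w and w_snap
        w_snap = w
    return w_snap, iters, accesses

# Toy data: compare the two complexity measures for a few minibatch sizes.
rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 50)) / np.sqrt(50)
y = np.sign(X @ rng.standard_normal(50) + 0.1 * rng.standard_normal(2000))
for b in (8, 64, 512):
    _, iters, accesses = minibatch_svrg(X, y, lam2=1e-3, b=b, step=0.5)
    print(f"b={b:4d}  minibatch queries={iters:6d}  example accesses={accesses}")

Inspecting both counters for a range of b is exactly the comparison the paper formalizes: the optimal minibatch size is the largest b for which the iteration complexity still improves without inflating the total computational complexity.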

Appendices
Available only with authorization
Footnotes
1
In this case, all the methods converged within one or two epochs and performed very similarly to each other.
 
2
When \(\lambda_2 = 0\), we used a dummy L2-regularization with \(\lambda_2 = 10^{-8}\), which was roughly \(\varepsilon \Vert x_*\Vert^2\), where \(\varepsilon\) is the desired accuracy (set to \(10^{-4}\)) and \(x_*\) is the optimal solution.
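
As a small illustration of this footnote's setup (the helper name and the optional data-driven estimate are assumptions for demonstration, not code from the paper):

import numpy as np

def effective_lam2(lam2, eps=1e-4, x_star=None):
    # Substitute a dummy L2 coefficient when none is given, so that solvers
    # requiring strong convexity still apply; 1e-8 is the value used in the
    # footnote, roughly eps * ||x_*||^2 for the problems considered.
    if lam2 > 0:
        return lam2
    if x_star is not None:
        return eps * float(x_star @ x_star)  # hypothetical data-driven variant
    return 1e-8

print(effective_lam2(0.0))  # -> 1e-08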
 
References
1. Allen-Zhu Z (2017) Katyusha: the first direct acceleration of stochastic gradient methods. In: Proceedings of annual ACM SIGACT symposium on theory of computing, vol 49, pp 1200–1205. ACM
2. Murata T, Suzuki T (2017) Doubly accelerated stochastic variance reduced dual averaging method for regularized empirical risk minimization. Adv Neural Inf Process Syst 30:608–617
3. Nitanda A (2014) Stochastic proximal gradient descent with acceleration techniques. Adv Neural Inf Process Syst 27:1574–1582
4. Lin Q, Lu Z, Xiao L (2014) An accelerated proximal coordinate gradient method. Adv Neural Inf Process Syst 27:3059–3067
5. Cotter A, Shamir O, Srebro N, Sridharan K (2011) Better mini-batch algorithms via accelerated gradient methods. Adv Neural Inf Process Syst 24:1647–1655
6. Shalev-Shwartz S, Zhang T (2013) Accelerated mini-batch stochastic dual coordinate ascent. Adv Neural Inf Process Syst 26:378–385
7. Kai Z, Liang H (2013) Minibatch and parallelization for online large margin structured learning. In: Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: human language technologies, pp 370–379
8. Martin T, Singh BA, Peter R, Nati S (2013) Mini-batch primal and dual methods for SVMs. In: ICML (3), pp 1022–1030
9. Mu L, Tong Z, Yuqiang C, Alexander JS (2014) Efficient mini-batch training for stochastic optimization. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, pp 661–670. ACM
10. Peilin Z, Tong Z (2014) Accelerating minibatch stochastic gradient descent using stratified sampling. arXiv preprint arXiv:1405.3080
11. Konečný J, Liu J, Richtárik P, Takáč M (2016) Mini-batch semi-stochastic gradient descent in the proximal setting. IEEE J Sel Top Signal Process 10(2):242–255
13. Prateek J, Sham KM, Rahul K, Praneeth N, Aaron S (2016) Parallelizing stochastic approximation through mini-batching and tail-averaging. Stat 1050:12
14. Nicolas LR, Mark S, Francis RB (2012) A stochastic gradient method with an exponential convergence rate for finite training sets. In: Advances in neural information processing systems, pp 2663–2671
15. Johnson R, Zhang T (2013) Accelerating stochastic gradient descent using predictive variance reduction. Adv Neural Inf Process Syst 26:315–323
16. Defazio A, Bach F, Lacoste-Julien S (2014) SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. Adv Neural Inf Process Syst 27:1646–1654
17. Lin H, Mairal J, Harchaoui Z (2015) A universal catalyst for first-order optimization. Adv Neural Inf Process Syst 28:3384–3392
18. Frostig R, Ge R, Kakade S, Sidford A (2015) Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization. In: Proceedings of international conference on machine learning, vol 32, pp 2540–2548
19. Blake EW, Srebro N (2016) Tight complexity bounds for optimizing composite objectives. Adv Neural Inf Process Syst 29:3639–3647
20. Arjevani Y, Shamir O (2016) Dimension-free iteration complexity of finite sum optimization problems. Adv Neural Inf Process Syst 29:3540–3548
21. Mark S, Le RN (2013) Fast convergence of stochastic gradient descent under a strong growth condition. arXiv preprint arXiv:1308.6370
22. Lam MN, Liu J, Scheinberg K, Takáč M (2017) SARAH: a novel method for machine learning problems using stochastic recursive gradient. In: Proceedings of international conference on machine learning, vol 34, pp 2613–2621
23. Ofer D, Ran G-B, Ohad S, Lin X (2012) Optimal distributed online prediction using mini-batches. J Mach Learn Res 13:165–202
25. Yurii N (2004) Introductory lectures on convex optimization: a basic course. Kluwer Academic Publishers, New York
26. Panda B, Joshua SH, Basu S, Roberto JB (2009) PLANET: massively parallel learning of tree ensembles with MapReduce. Proc VLDB Endow 2(2):1426–1437
27. Chi Z, Feifei L, Jeffrey J (2012) Efficient parallel kNN joins for large data in MapReduce. In: Proceedings of the 15th international conference on extending database technology, pp 38–49
28. Tianyang S, Chengchun S, Feng L, Yu H, Lili M, Yitong F (2009) An efficient hierarchical clustering method for large datasets with map-reduce. In: 2009 international conference on parallel and distributed computing, applications and technologies, pp 494–499
29. Yaobin H, Haoyu T, Wuman L, Huajian M, Di M, Shengzhong F, Jianping F (2011) MR-DBSCAN: an efficient parallel density-based clustering algorithm using MapReduce. In: 2011 IEEE 17th international conference on parallel and distributed systems, pp 473–480
30. Weizhong Z, Huifang M, Qing H (2009) Parallel k-means clustering based on MapReduce. In: IEEE international conference on cloud computing, pp 674–679
31. Xin J, Wang Z, Chen C, Ding L, Wang G, Zhao Y (2014) ELM: distributed extreme learning machine with MapReduce. World Wide Web 17(5):1189–1204
32. Wang B, Huang S, Qiu J, Yu L, Wang G (2015) Parallel online sequential extreme learning machine based on MapReduce. Neurocomputing 149:224–232
33. Bi X, Zhao X, Wang G, Zhang P, Wang C (2015) Distributed extreme learning machine with kernels based on MapReduce. Neurocomputing 149:456–463
34. Çatak FO (2017) Classification with boosting of extreme learning machine over arbitrarily partitioned data. Soft Comput 21(9):2269–2281
35. Xi C, Qihang L, Javier P (2012) Optimal regularized dual averaging methods for stochastic optimization. In: Advances in neural information processing systems, pp 395–403
36. Hongzhou L, Julien M, Zaid H (2017) Catalyst acceleration for first-order convex optimization: from theory to practice. arXiv preprint arXiv:1712.05654
Metadata
Title
Sharp characterization of optimal minibatch size for stochastic finite sum convex optimization
Authors
Atsushi Nitanda
Tomoya Murata
Taiji Suzuki
Publication date
12.07.2021
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 9/2021
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-021-01593-1
