
2017 | OriginalPaper | Book Chapter

Projected Semi-Stochastic Gradient Descent Method with Mini-Batch Scheme Under Weak Strong Convexity Assumption

Authors: Jie Liu, Martin Takáč

Published in: Modeling and Optimization: Theory and Applications

Publisher: Springer International Publishing


Abstract

We propose a projected semi-stochastic gradient descent method with a mini-batch scheme (PS2GD) for improving both the theoretical complexity and the practical performance of the general stochastic gradient descent method (SGD). We prove linear convergence under a weak strong convexity assumption, which requires no strong convexity for minimizing the sum of smooth convex functions subject to a compact polyhedral set, a problem class that remains popular across the machine learning community. PS2GD preserves the low cost per iteration of SGD while attaining high optimization accuracy via a stochastic gradient variance-reduction technique, and it admits a simple parallel implementation with mini-batches. Moreover, PS2GD is also applicable to the dual problem of SVM with hinge loss.
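
To make the ingredients named in the abstract concrete, here is a minimal Python sketch of a generic projected, variance-reduced, mini-batch SGD loop: an outer snapshot with a full gradient, an inner loop of mini-batch corrections, and a projection back onto the feasible set after every step. It is an illustration under assumptions, not the PS2GD algorithm as specified in the chapter; the names grad_i and project, the step size h, the mini-batch size b, and the inner-loop length m are hypothetical choices.

import numpy as np

def projected_vr_sgd(x0, n, grad_i, project, h=0.05, b=8, m=100, epochs=10, seed=0):
    # Generic sketch: outer snapshot x with a full gradient, inner mini-batch
    # variance-reduced steps on y, each followed by a projection onto the feasible set.
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(epochs):
        full_grad = np.mean([grad_i(i, x) for i in range(n)], axis=0)  # gradient at the snapshot
        y = x.copy()
        for _ in range(m):
            batch = rng.choice(n, size=b, replace=False)               # mini-batch of size b
            corr = np.mean([grad_i(i, y) - grad_i(i, x) for i in batch], axis=0)
            g = full_grad + corr        # unbiased estimate of the gradient at y, with reduced variance
            y = project(y - h * g)      # projected step keeps the iterate feasible
        x = y
    return x

# Toy usage: least squares restricted to the box [0, 1]^5, a simple polyhedral set.
A = np.random.default_rng(1).normal(size=(50, 5))
t = np.ones(50)
grad_i = lambda i, x: A[i] * (A[i] @ x - t[i])   # gradient of the i-th component (1/2)(a_i^T x - t_i)^2
project = lambda z: np.clip(z, 0.0, 1.0)         # Euclidean projection onto the box
x_hat = projected_vr_sgd(np.zeros(5), n=50, grad_i=grad_i, project=project)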


Footnotes
1
Even though the concept was first proposed by Liu and Wright in [15] as optimal strong convexity, to emphasize that it is an extended version of strong convexity, we use the term weak strong convexity, as in [6], throughout our paper.
 
2
It is possible to finish each iteration with only b evaluations of component gradients, namely \(\{\nabla f_{i}(y_{k,t})\}_{i\in A_{kt}}\), at the cost of having to store \(\{\nabla f_{i}(x_{k})\}_{i\in [n]}\), which is exactly the way that SAG [14] works (see the sketch following these footnotes). This speeds up the algorithm; nevertheless, it is impractical for large n.
 
3
We only need to prove the existence of β and do not need to evaluate its value in practice. Lemma 4 provides the existence of β.
 
6
In practice, it is impossible to ensure that evaluating different component gradients takes the same time; however, Fig. 2 suggests the potential advantage of applying the mini-batch scheme with parallelism.
 
7
Note that this quantity is never computed during the algorithm. We can use it in the analysis nevertheless.
 
8
For simplicity, we omit the \(\mathbb{E}[\cdot \mid y_{k,t}]\) notation in further analysis.
 
9
\(\bar{y}_{k,t+1}\) is constant, conditioned on \(y_{k,t}\).
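
To make the trade-off in footnote 2 concrete, the sketch below keeps a table of the most recently evaluated component gradients together with their running average, in the spirit of SAG [14]: each step evaluates only b fresh component gradients, but the table requires storing n gradient vectors, which is what becomes impractical for large n. The names grad_i, G, and avg, as well as the step rule, are illustrative assumptions rather than the exact SAG or PS2GD update.

import numpy as np

def stored_gradient_step(x, G, avg, batch, grad_i, h=0.05):
    # One SAG-style step using a table G of stored component gradients.
    # G   : (n, d) array; G[i] holds the last gradient evaluated for component i
    # avg : (d,)  array; always equal to G.mean(axis=0), maintained incrementally
    n = G.shape[0]
    for i in batch:
        g_new = grad_i(i, x)              # only b fresh gradient evaluations per step
        avg = avg + (g_new - G[i]) / n    # O(d) update of the stored average
        G[i] = g_new                      # total memory cost: n * d floats
    return x - h * avg, G, avg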
 
References
1.
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2(1), 183–202 (2009)
2.
3.
Defazio, A., Bach, F., Lacoste-Julien, S.: SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. In: NIPS (2014)
4.
Fercoq, O., Richtárik, P.: Accelerated, parallel and proximal coordinate descent. arXiv:1312.5799 (2013)
5.
Fercoq, O., Qu, Z., Richtárik, P., Takáč, M.: Fast distributed coordinate descent for non-strongly convex losses. In: IEEE Workshop on Machine Learning for Signal Processing (2014)
6.
Gong, P., Ye, J.: Linear convergence of variance-reduced projected stochastic gradient without strong convexity. arXiv:1406.1102 (2014)
7.
Hoffman, A.J.: On approximate solutions of systems of linear inequalities. J. Res. Natl. Bur. Stand. 49(4), 263–265 (1952)
8.
Jaggi, M., Smith, V., Takáč, M., Terhorst, J., Hofmann, T., Jordan, M.I.: Communication-efficient distributed dual coordinate ascent. In: NIPS, pp. 3068–3076 (2014)
9.
Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013)
10.
Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak–Łojasiewicz condition. In: ECML PKDD, pp. 795–811 (2016)
11.
Kloft, M., Brefeld, U., Laskov, P., Müller, K.-R., Zien, A., Sonnenburg, S.: Efficient and accurate lp-norm multiple kernel learning. In: NIPS, pp. 997–1005 (2009)
12.
Konečný, J., Liu, J., Richtárik, P., Takáč, M.: Mini-batch semi-stochastic gradient descent in the proximal setting. IEEE J. Sel. Top. Sign. Proces. 10, 242–255 (2016)
13.
Konečný, J., Richtárik, P.: Semi-stochastic gradient descent methods. arXiv:1312.1666 (2013)
14.
Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2672–2680 (2012)
15.
Liu, J., Wright, S.J.: Asynchronous stochastic coordinate descent: parallelism and convergence properties. SIAM J. Optim. 25(1), 351–376 (2015)
16.
Mareček, J., Richtárik, P., Takáč, M.: Distributed block coordinate descent for minimizing partially separable functions. In: Numerical Analysis and Optimization 2014, Springer Proceedings in Mathematics and Statistics, pp. 261–286 (2014)
17.
Necoara, I., Clipici, D.: Parallel random coordinate descent method for composite minimization: convergence analysis and error bounds. SIAM J. Optim. 26(1), 197–226 (2016)
18.
Necoara, I., Patrascu, A.: A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints. Comput. Optim. Appl. 57(2), 307–337 (2014)
19.
Necoara, I., Nesterov, Y., Glineur, F.: Linear convergence of first order methods for non-strongly convex optimization. arXiv:1504.06298 (2015)
20.
Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)
21.
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Boston (2004)
22.
Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22, 341–362 (2012)
24.
Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: SARAH: a novel method for machine learning problems using stochastic recursive gradient. arXiv:1703.00102 (2017)
25.
Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1–2), 1–38 (2014)
26.
Richtárik, P., Takáč, M.: Distributed coordinate descent method for learning with big data. J. Mach. Learn. Res. 17, 1–25 (2016)
27.
Richtárik, P., Takáč, M.: Parallel coordinate descent methods for big data optimization. Math. Program. Ser. A 156, 1–52 (2016)
28.
Shalev-Shwartz, S., Zhang, T.: Accelerated mini-batch stochastic dual coordinate ascent. In: NIPS, pp. 378–385 (2013)
29.
Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss. J. Mach. Learn. Res. 14(1), 567–599 (2013)
30.
Shalev-Shwartz, S., Singer, Y., Srebro, N., Cotter, A.: Pegasos: primal estimated sub-gradient solver for SVM. Math. Program. Ser. A, B Spec. Issue Optim. Mach. Learn. 127, 3–30 (2011)
31.
Shamir, O., Zhang, T.: Stochastic gradient descent for non-smooth optimization: convergence results and optimal averaging schemes. In: ICML, pp. 71–79. Springer, New York (2013)
32.
Takáč, M., Bijral, A.S., Richtárik, P., Srebro, N.: Mini-batch primal and dual methods for SVMs. In: ICML, pp. 537–552. Springer (2013)
33.
Wang, P.-W., Lin, C.-J.: Iteration complexity of feasible descent methods for convex optimization. J. Mach. Learn. Res. 15, 1523–1548 (2014)
34.
Xiao, L., Zhang, T.: A proximal stochastic gradient method with progressive variance reduction. SIAM J. Optim. 24(4), 2057–2075 (2014)
35.
Zhang, T.: Solving large scale linear prediction using stochastic gradient descent algorithms. In: ICML, pp. 919–926. Springer (2004)
36.
Zhang, H.: The restricted strong convexity revisited: analysis of equivalence to error bound and quadratic growth. Optim. Lett. 11(4), 817–833 (2016)
37.
Zhang, L., Mahdavi, M., Jin, R.: Linear convergence with condition number independent access of full gradients. In: NIPS, pp. 980–988 (2013)
Metadata
Title
Projected Semi-Stochastic Gradient Descent Method with Mini-Batch Scheme Under Weak Strong Convexity Assumption
Authors
Jie Liu
Martin Takáč
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-66616-7_7