
2017 | OriginalPaper | Book Chapter

Projected Semi-Stochastic Gradient Descent Method with Mini-Batch Scheme Under Weak Strong Convexity Assumption

Authors: Jie Liu, Martin Takáč

Published in: Modeling and Optimization: Theory and Applications

Publisher: Springer International Publishing


Abstract

We propose a projected semi-stochastic gradient descent method with a mini-batch scheme (PS2GD) for improving both the theoretical complexity and the practical performance of the general stochastic gradient descent method (SGD). We prove linear convergence under a weak strong convexity assumption, which requires no strong convexity for minimizing the sum of smooth convex functions subject to a compact polyhedral set, a problem class that remains popular across the machine learning community. PS2GD preserves the low cost per iteration of SGD while attaining high optimization accuracy via a stochastic gradient variance-reduction technique, and it admits a simple parallel implementation with mini-batches. Moreover, PS2GD is also applicable to the dual problem of SVM with hinge loss.
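
To make the ingredients named in the abstract concrete, here is a minimal Python sketch of a generic projected, variance-reduced, mini-batch SGD loop: an outer snapshot with a full gradient, an inner loop of mini-batch corrections, and a projection back onto the feasible set after every step. It is an illustration under assumptions, not the PS2GD algorithm as specified in the chapter; the names grad_i and project, the step size h, the mini-batch size b, and the inner-loop length m are hypothetical choices.

import numpy as np

def projected_vr_sgd(x0, n, grad_i, project, h=0.05, b=8, m=100, epochs=10, seed=0):
    # Generic sketch: outer snapshot x with a full gradient, inner mini-batch
    # variance-reduced steps on y, each followed by a projection onto the feasible set.
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(epochs):
        full_grad = np.mean([grad_i(i, x) for i in range(n)], axis=0)  # gradient at the snapshot
        y = x.copy()
        for _ in range(m):
            batch = rng.choice(n, size=b, replace=False)               # mini-batch of size b
            corr = np.mean([grad_i(i, y) - grad_i(i, x) for i in batch], axis=0)
            g = full_grad + corr        # unbiased estimate of the gradient at y, with reduced variance
            y = project(y - h * g)      # projected step keeps the iterate feasible
        x = y
    return x

# Toy usage: least squares restricted to the box [0, 1]^5, a simple polyhedral set.
A = np.random.default_rng(1).normal(size=(50, 5))
t = np.ones(50)
grad_i = lambda i, x: A[i] * (A[i] @ x - t[i])   # gradient of the i-th component (1/2)(a_i^T x - t_i)^2
project = lambda z: np.clip(z, 0.0, 1.0)         # Euclidean projection onto the box
x_hat = projected_vr_sgd(np.zeros(5), n=50, grad_i=grad_i, project=project)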


Footnotes
1
Even though the concept was first proposed by Liu and Wright in [15] as optimal strong convexity, to emphasize that it is an extended version of strong convexity, we use the term weak strong convexity, as in [6], throughout our paper.
 
2
It is possible to finish each iteration with only b evaluations of component gradients, namely \(\{\nabla f_{i}(y_{k,t})\}_{i\in A_{kt}}\), at the cost of having to store \(\{\nabla f_{i}(x_{k})\}_{i\in [n]}\), which is exactly the way that SAG [14] works (see the sketch following these footnotes). This speeds up the algorithm; nevertheless, it is impractical for large n.
 
3
We only need to prove the existence of β and do not need to evaluate its value in practice. Lemma 4 provides the existence of β.
 
6
In practice, it is impossible to ensure that evaluating different component gradients takes the same time; however, Fig. 2 suggests the potential advantage of applying the mini-batch scheme with parallelism.
 
7
Note that this quantity is never computed during the algorithm. We can use it in the analysis nevertheless.
 
8
For simplicity, we omit the \(\mathbb{E}[\cdot \mid y_{k,t}]\) notation in further analysis.
 
9
\(\bar{y}_{k,t+1}\) is constant, conditioned on \(y_{k,t}\).
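
To make the trade-off in footnote 2 concrete, the sketch below keeps a table of the most recently evaluated component gradients together with their running average, in the spirit of SAG [14]: each step evaluates only b fresh component gradients, but the table requires storing n gradient vectors, which is what becomes impractical for large n. The names grad_i, G, and avg, as well as the step rule, are illustrative assumptions rather than the exact SAG or PS2GD update.

import numpy as np

def stored_gradient_step(x, G, avg, batch, grad_i, h=0.05):
    # One SAG-style step using a table G of stored component gradients.
    # G   : (n, d) array; G[i] holds the last gradient evaluated for component i
    # avg : (d,)  array; always equal to G.mean(axis=0), maintained incrementally
    n = G.shape[0]
    for i in batch:
        g_new = grad_i(i, x)              # only b fresh gradient evaluations per step
        avg = avg + (g_new - G[i]) / n    # O(d) update of the stored average
        G[i] = g_new                      # total memory cost: n * d floats
    return x - h * avg, G, avg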
 
References
1.
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2(1), 183–202 (2009)
2.
3.
Defazio, A., Bach, F., Lacoste-Julien, S.: SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. In: NIPS (2014)
4.
Fercoq, O., Richtárik, P.: Accelerated, parallel and proximal coordinate descent. arXiv:1312.5799 (2013)
5.
Fercoq, O., Qu, Z., Richtárik, P., Takáč, M.: Fast distributed coordinate descent for non-strongly convex losses. In: IEEE Workshop on Machine Learning for Signal Processing (2014)
6.
Gong, P., Ye, J.: Linear convergence of variance-reduced projected stochastic gradient without strong convexity. arXiv:1406.1102 (2014)
7.
Hoffman, A.J.: On approximate solutions of systems of linear inequalities. J. Res. Natl. Bur. Stand. 49(4), 263–265 (1952)
8.
Jaggi, M., Smith, V., Takáč, M., Terhorst, J., Hofmann, T., Jordan, M.I.: Communication-efficient distributed dual coordinate ascent. In: NIPS, pp. 3068–3076 (2014)
9.
Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: NIPS, pp. 315–323 (2013)
10.
Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak–Łojasiewicz condition. In: ECML PKDD, pp. 795–811 (2016)
11.
Kloft, M., Brefeld, U., Laskov, P., Müller, K.-R., Zien, A., Sonnenburg, S.: Efficient and accurate lp-norm multiple kernel learning. In: NIPS, pp. 997–1005 (2009)
12.
Konečný, J., Liu, J., Richtárik, P., Takáč, M.: Mini-batch semi-stochastic gradient descent in the proximal setting. IEEE J. Sel. Top. Sign. Proces. 10, 242–255 (2016)
13.
Konečný, J., Richtárik, P.: Semi-stochastic gradient descent methods. arXiv:1312.1666 (2013)
14.
Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for finite training sets. In: NIPS, pp. 2672–2680 (2012)
15.
Liu, J., Wright, S.J.: Asynchronous stochastic coordinate descent: parallelism and convergence properties. SIAM J. Optim. 25(1), 351–376 (2015)
16.
Mareček, J., Richtárik, P., Takáč, M.: Distributed block coordinate descent for minimizing partially separable functions. In: Numerical Analysis and Optimization 2014, Springer Proceedings in Mathematics and Statistics, pp. 261–286 (2014)
17.
Necoara, I., Clipici, D.: Parallel random coordinate descent method for composite minimization: convergence analysis and error bounds. SIAM J. Optim. 26(1), 197–226 (2016)
18.
Necoara, I., Patrascu, A.: A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints. Comput. Optim. Appl. 57(2), 307–337 (2014)
19.
Necoara, I., Nesterov, Y., Glineur, F.: Linear convergence of first order methods for non-strongly convex optimization. arXiv:1504.06298 (2015)
20.
Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)
21.
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Boston (2004)
22.
Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22, 341–362 (2012)
24.
Nguyen, L.M., Liu, J., Scheinberg, K., Takáč, M.: SARAH: a novel method for machine learning problems using stochastic recursive gradient. arXiv:1703.00102 (2017)
25.
Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1–2), 1–38 (2014)
26.
Richtárik, P., Takáč, M.: Distributed coordinate descent method for learning with big data. J. Mach. Learn. Res. 17, 1–25 (2016)
27.
Richtárik, P., Takáč, M.: Parallel coordinate descent methods for big data optimization. Math. Program. Ser. A 156, 1–52 (2016)
28.
Shalev-Shwartz, S., Zhang, T.: Accelerated mini-batch stochastic dual coordinate ascent. In: NIPS, pp. 378–385 (2013)
29.
Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss. J. Mach. Learn. Res. 14(1), 567–599 (2013)
30.
Shalev-Shwartz, S., Singer, Y., Srebro, N., Cotter, A.: Pegasos: primal estimated sub-gradient solver for SVM. Math. Program. Ser. A, B Spec. Issue Optim. Mach. Learn. 127, 3–30 (2011)
31.
Shamir, O., Zhang, T.: Stochastic gradient descent for non-smooth optimization: convergence results and optimal averaging schemes. In: ICML, pp. 71–79. Springer, New York (2013)
32.
Takáč, M., Bijral, A.S., Richtárik, P., Srebro, N.: Mini-batch primal and dual methods for SVMs. In: ICML, pp. 537–552. Springer (2013)
33.
Wang, P.-W., Lin, C.-J.: Iteration complexity of feasible descent methods for convex optimization. J. Mach. Learn. Res. 15, 1523–1548 (2014)
34.
Xiao, L., Zhang, T.: A proximal stochastic gradient method with progressive variance reduction. SIAM J. Optim. 24(4), 2057–2075 (2014)
35.
Zhang, T.: Solving large scale linear prediction using stochastic gradient descent algorithms. In: ICML, pp. 919–926. Springer (2004)
36.
Zhang, H.: The restricted strong convexity revisited: analysis of equivalence to error bound and quadratic growth. Optim. Lett. 11(4), 817–833 (2016)
37.
Zhang, L., Mahdavi, M., Jin, R.: Linear convergence with condition number independent access of full gradients. In: NIPS, pp. 980–988 (2013)
Metadata
Title
Projected Semi-Stochastic Gradient Descent Method with Mini-Batch Scheme Under Weak Strong Convexity Assumption
Authors
Jie Liu
Martin Takáč
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-66616-7_7