Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization

  • Full Length Paper
  • Series A
  • Mathematical Programming

Abstract

This paper considers a class of constrained stochastic composite optimization problems whose objective function is given by the sum of a differentiable (possibly nonconvex) component and a non-differentiable (but convex) component. To solve these problems, we propose a randomized stochastic projected gradient (RSPG) algorithm, in which a properly sized mini-batch of samples is taken at each iteration, depending on the total budget of stochastic samples allowed. The RSPG algorithm also employs a general distance function, which allows it to take advantage of the geometry of the feasible region. The complexity of this algorithm is established in a unified setting, which shows that the algorithm is nearly optimal for convex stochastic programming. A post-optimization phase is also proposed to significantly reduce the variance of the solutions returned by the algorithm. In addition, based on the RSPG algorithm, we also discuss a stochastic gradient-free algorithm that uses only stochastic zeroth-order information. Some preliminary numerical results are also provided.
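The following is a minimal Euclidean sketch, in Python, of the RSPG iteration and the two-phase (post-optimization) variant described above. All names here (`rspg`, `stoch_grad`, `prox`, the constant step size `gamma`, and the candidate-selection rule) are illustrative assumptions rather than the paper's notation: the paper allows a general prox-function in place of the squared Euclidean distance and prescribes specific batch sizes and step sizes to obtain its complexity bounds.

```python
import numpy as np

def rspg(stoch_grad, prox, x0, n_iters, gamma, batch_size, rng):
    """One run of a randomized stochastic projected gradient (RSPG) sketch.

    stoch_grad(x, m): average of m stochastic gradients of the smooth
        (possibly nonconvex) component f at x.
    prox(x, g, gamma): argmin_u <g, u> + ||u - x||^2 / (2*gamma) + h(u)
        over the feasible region, i.e. a Euclidean prox-mapping for the
        convex non-differentiable component h.
    Returns the iterate reached after a uniformly random number of steps;
    this random termination is the 'randomized' part of the method.
    """
    R = int(rng.integers(1, n_iters + 1))  # random termination index
    x = x0
    for _ in range(R):
        g = stoch_grad(x, batch_size)      # mini-batch gradient estimate
        x = prox(x, g, gamma)              # composite prox step
    return x

def two_phase_rspg(stoch_grad, prox, x0, n_iters, gamma, batch_size,
                   n_runs, val_batch_size, rng):
    """Post-optimization phase (sketch): run RSPG several times
    independently, then return the candidate whose estimated gradient
    mapping, (x - prox(x, g, gamma)) / gamma, has the smallest norm when
    g is re-estimated with a larger validation sample."""
    candidates = [rspg(stoch_grad, prox, x0, n_iters, gamma,
                       batch_size, rng) for _ in range(n_runs)]

    def mapping_norm(x):
        g = stoch_grad(x, val_batch_size)  # fresh, larger sample
        return float(np.linalg.norm((x - prox(x, g, gamma)) / gamma))

    return min(candidates, key=mapping_norm)
```

Selecting the run with the smallest estimated gradient-mapping norm is what drives the variance reduction of the post-optimization phase: a single randomized run only yields a guarantee in expectation, while the selection step sharpens this into a high-probability statement about the returned solution.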
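For the gradient-free variant, the stochastic gradient oracle can be replaced by a zeroth-order estimator built from Gaussian smoothing, using two noisy function evaluations per sampled direction. The sketch below is again an illustration under assumptions: the name `zo_grad`, the smoothing parameter `mu`, and the exact form of the averaging are choices of this sketch, not details fixed by the abstract.

```python
import numpy as np

def zo_grad(f_noisy, x, mu, batch_size, rng):
    """Mini-batch zeroth-order gradient estimator via Gaussian smoothing:
    average over the batch of  (F(x + mu*u) - F(x)) / mu * u  with
    u ~ N(0, I).  f_noisy(x) returns one noisy function value F(x, xi);
    mu > 0 is the smoothing parameter (smaller mu means less smoothing
    bias but a noisier difference quotient)."""
    g = np.zeros_like(x)
    for _ in range(batch_size):
        u = rng.standard_normal(x.shape)
        g += (f_noisy(x + mu * u) - f_noisy(x)) / mu * u
    return g / batch_size
```

An estimator of this shape can be passed to `rspg` above in place of `stoch_grad`, e.g. via `lambda x, m: zo_grad(f_noisy, x, mu, m, rng)`, yielding a method that touches the problem only through noisy function evaluations.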



Author information


Corresponding author

Correspondence to Guanghui Lan.

Additional information

This research was partially supported by NSF grants CMMI-1000347, CMMI-1254446, DMS-1319050, DMS-1016204 and ONR grant N00014-13-1-0036.


About this article


Cite this article

Ghadimi, S., Lan, G. & Zhang, H. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Math. Program. 155, 267–305 (2016). https://doi.org/10.1007/s10107-014-0846-1


