Subgradient Methods for Sharp Weakly Convex Functions

Journal of Optimization Theory and Applications

Abstract

Subgradient methods converge linearly on a convex function that grows sharply away from its solution set. In this work, we show that the same is true for sharp functions that are only weakly convex, provided that the subgradient methods are initialized within a fixed tube around the solution set. A variety of statistical and signal processing tasks come equipped with good initialization and provably lead to formulations that are both weakly convex and sharp. Therefore, in such settings, subgradient methods can serve as inexpensive local search procedures. We illustrate the proposed techniques on phase retrieval and covariance estimation problems.
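For a concrete sense of the local search procedure the abstract describes, the snippet below is a minimal sketch, not the authors' implementation, of a subgradient method with the Polyak step size applied to the robust phase retrieval objective \(f(x)=\frac{1}{m}\sum_i |\langle a_i,x\rangle^2-b_i|\), started near the solution set. The problem sizes, Gaussian data model, initialization radius, and iteration budget are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not the authors' code) of a Polyak-type subgradient method
# for the robust phase retrieval objective
#     f(x) = (1/m) * sum_i | <a_i, x>^2 - b_i |,
# which is sharp and weakly convex under generic Gaussian measurements.
# Problem sizes, data model, initialization radius, and iteration count
# below are illustrative assumptions, not values from the paper.

rng = np.random.default_rng(0)
d, m = 50, 400                        # signal dimension, number of measurements
x_true = rng.standard_normal(d)       # ground-truth signal
A = rng.standard_normal((m, d))       # rows are the measurement vectors a_i
b = (A @ x_true) ** 2                 # noiseless measurements b_i = <a_i, x>^2

def objective(x):
    return np.mean(np.abs((A @ x) ** 2 - b))

def subgradient(x):
    Ax = A @ x
    return (2.0 / m) * (A.T @ (np.sign(Ax ** 2 - b) * Ax))

# Initialize inside a small tube around the solution set {x_true, -x_true},
# mimicking the "good initialization" the abstract refers to.
x = x_true + 0.1 * rng.standard_normal(d)

for _ in range(200):
    g = subgradient(x)
    gnorm2 = np.dot(g, g)
    if gnorm2 == 0.0:
        break
    # Polyak step: uses the known minimal value f* = 0 of the noiseless model.
    x = x - (objective(x) / gnorm2) * g

dist = min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true))
print(f"final objective: {objective(x):.3e}, distance to solution set: {dist:.3e}")
```

Under sharpness and weak convexity, an initialization inside the tube keeps the iterates there, and the Polyak step contracts the distance to the solution set at a linear rate.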


Notes

  1. To the best of our knowledge, the class of weakly convex functions was introduced in 1972 in a local publication of the Institute of Cybernetics in Kiev and then used in [19]. These are the functions that are lower bounded by a linear function up to a little-o term that is uniform in the base point. The term \(\rho \)-weak convexity, as we use it here, corresponds to the setting when the error term is a quadratic. A number of other names are used in the literature instead of \(\rho \)-weak convexity, such as semi-convex, prox-regular, and lower-\(C^2\).

  2. Note that this potential function differs from the direct generalization of the one used in phase retrieval, \(\min_{X}\sum_i \left| \Vert X^Ta_i\Vert^2-b_i\right|\). The reason for the more exotic formulation is that the differences \(a_{2i}a_{2i}^T-a_{2i-1}a_{2i-1}^T\) in (5) allow the authors of [38] to prove sharpness of the objective function by appealing to an \(\ell_2/\ell_1\) restricted isometry property of the resulting linear map. See [38, Section III.B] for details. It is not clear whether the naive objective function is also sharp with high probability under reasonable statistical assumptions.
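For context, the two properties in the title admit standard formulations; the displays below, involving the weak convexity modulus \(\rho\), the sharpness constant \(\mu\), and the solution set \(X^*\), are supplied for the reader's convenience and are not quoted from the article text.

```latex
% \rho-weak convexity: every subgradient furnishes a uniform quadratic lower bound
f(y) \;\ge\; f(x) + \langle v,\, y - x \rangle - \frac{\rho}{2}\,\|y - x\|^{2}
\quad \text{for all } x,\, y \text{ and all } v \in \partial f(x);

% \mu-sharpness: linear growth away from the solution set X^{*}
f(x) - \min f \;\ge\; \mu \, \operatorname{dist}\bigl(x,\, X^{*}\bigr)
\quad \text{for all } x \text{ (at least near } X^{*}\text{)}.
```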

References

  1. Jain, P., Netrapalli, P.: Fast exact matrix completion with finite samples. In: Grünwald, P., Hazan, E., Kale, S. (eds.) Proceedings of The 28th Conference on Learning Theory, Proceedings of Machine Learning Research, vol. 40, pp. 1007–1034. PMLR, Paris, France (2015). http://proceedings.mlr.press/v40/Jain15.html

  2. Tu, S., Boczar, R., Simchowitz, M., Soltanolkotabi, M., Recht, B.: Low-rank solutions of linear matrix equations via Procrustes flow. In: Proceedings of the 33rd International Conference on Machine Learning—Volume 48, ICML’16, pp. 964–973. JMLR.org (2016). http://dl.acm.org/citation.cfm?id=3045390.3045493

  3. Candès, E., Li, X., Soltanolkotabi, M.: Phase retrieval via Wirtinger flow: theory and algorithms. IEEE Trans. Inf. Theory 61(4), 1985–2007 (2015). https://doi.org/10.1109/TIT.2015.2399924

  4. Jain, P., Jin, C., Kakade, S., Netrapalli, P.: Global convergence of non-convex gradient descent for computing matrix squareroot. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 479–488 (2017). http://proceedings.mlr.press/v54/jain17a.html

  5. Chen, Y., Wainwright, M.: Fast low-rank estimation by projected gradient descent: general statistical and algorithmic guarantees (2015). arXiv:1509.03025

  6. Meka, R., Jain, P., Dhillon, I.: Guaranteed rank minimization via singular value projection (2009). arXiv:0909.5457

  7. Boumal, N.: Nonconvex phase synchronization. SIAM J. Optim. 26(4), 2355–2377 (2016)

  8. Brutzkus, A., Globerson, A.: Globally optimal gradient descent for a ConvNet with Gaussian inputs (2017). arXiv:1702.07966

  9. Eremin, I.: The relaxation method of solving systems of inequalities with convex functions on the left-hand side. Dokl. Akad. Nauk SSSR 160, 994–996 (1965)

  10. Poljak, B.: Minimization of unsmooth functionals. USSR Comput. Math. Math. Phys. 9, 14–29 (1969)

  11. Shor, N.: The rate of convergence of the method of the generalized gradient descent with expansion of space. Kibernetika (Kiev) 2, 80–85 (1970)

  12. Goffin, J.: On convergence rates of subgradient optimization methods. Math. Program. 13(3), 329–347 (1977). https://doi.org/10.1007/BF01584346

  13. Supittayapornpong, S., Neely, M.: Staggered time average algorithm for stochastic non-smooth optimization with \({{O}}(1/t)\) convergence (2016). arXiv:1607.02842

  14. Yang, T., Lin, Q.: RSG: beating subgradient method without smoothness and strong convexity (2016). arXiv:1512.03107

  15. Johnstone, P., Moulin, P.: Faster subgradient methods for functions with Hölderian growth (2017). arXiv:1704.00196

  16. Polyak, B.: Sharp minima. Institute of Control Sciences Lecture Notes, Moscow, USSR; Presented at the IIASA Workshop on Generalized Lagrangians and Their Applications, IIASA, Laxenburg, Austria, vol. 3, pp. 369–380 (1979)

  17. Burke, J., Ferris, M.: Weak sharp minima in mathematical programming. SIAM J. Control Optim. 31(5), 1340–1359 (1993). https://doi.org/10.1137/0331063

  18. Poljak, B.: Subgradient methods: a survey of Soviet research. In: Nonsmooth Optimization (Proc. IIASA Workshop, Laxenburg, 1977), IIASA Proc. Ser., vol. 3, pp. 5–29. Pergamon, Oxford (1978)

  19. Nurminskii, E.A.: The quasigradient method for the solving of the nonlinear programming problems. Cybernetics 9(1), 145–150 (1973). https://doi.org/10.1007/BF01068677

  20. Nurminskii, E.A.: Minimization of nondifferentiable functions in the presence of noise. Cybernetics 10(4), 619–621 (1974). https://doi.org/10.1007/BF01071541

  21. Davis, D., Drusvyatskiy, D.: Stochastic subgradient method converges at the rate \(O(k^{-1/4})\) on weakly convex functions (2018). arXiv:1802.02988

  22. Drusvyatskiy, D., Lewis, A.: Error bounds, quadratic growth, and linear convergence of proximal methods. To appear in Mathematics of Operations Research (2016). arXiv:1602.06661

  23. Duchi, J., Ruan, F.: Stochastic methods for composite optimization problems (2017). Preprint arXiv:1703.08570

  24. Duchi, J., Ruan, F.: Solving (most) of a set of quadratic equalities: composite optimization for robust phase retrieval (2017). arXiv:1705.02356

  25. Drusvyatskiy, D., Paquette, C.: Efficiency of minimizing compositions of convex functions and smooth maps (2016). Preprint arXiv:1605.00125

  26. Davis, D., Grimmer, B.: Proximally guided stochastic method for nonsmooth, nonconvex problems (2017). Preprint arXiv:1707.03505

  27. Drusvyatskiy, D.: The proximal point method revisited. To appear in SIAG/OPT Views and News (2018). arXiv:1712.06038

  28. Rockafellar, R., Wets, R.J.-B.: Variational Analysis. Grundlehren der mathematischen Wissenschaften, vol. 317. Springer, Berlin (1998)

  29. Mordukhovich, B.: Variational Analysis and Generalized Differentiation I: Basic Theory. Grundlehren der mathematischen Wissenschaften, vol. 330. Springer, Berlin (2006)

  30. Penot, J.P.: Calculus Without Derivatives, Graduate Texts in Mathematics, vol. 266. Springer, New York (2013). https://doi.org/10.1007/978-1-4614-4538-8

  31. Ioffe, A.: Variational Analysis of Regular Mappings: Theory and Applications. Springer Monographs in Mathematics. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-64277-2

  32. Poliquin, R., Rockafellar, R.: Prox-regular functions in variational analysis. Trans. Am. Math. Soc. 348, 1805–1838 (1996)

  33. Eldar, Y., Mendelson, S.: Phase retrieval: stability and recovery guarantees. Appl. Comput. Harmon. Anal. 36(3), 473–494 (2014). https://doi.org/10.1016/j.acha.2013.08.003

  34. Davis, D., Drusvyatskiy, D., Paquette, C.: The nonsmooth landscape of phase retrieval (2017). arXiv:1711.03247

  35. Chen, Y., Candès, E.: Solving random quadratic systems of equations is nearly as easy as solving linear systems. Commun. Pure Appl. Math. 70(5), 822–883 (2017). https://doi.org/10.1002/cpa.21638

  36. Sun, J., Qu, Q., Wright, J.: A geometric analysis of phase retrieval. To appear in Foundations of Computational Mathematics (2017). arXiv:1602.06664

  37. Tan, Y., Vershynin, R.: Phase retrieval via randomized Kaczmarz: theoretical guarantees (2017). arXiv:1605.08285

  38. Chen, Y., Chi, Y., Goldsmith, A.J.: Exact and stable covariance estimation from quadratic sampling via convex programming. IEEE Trans. Inf. Theory 61(7), 4034–4059 (2015)

  39. Clarke, F.: Optimization and Nonsmooth Analysis. Wiley, New York (1983)

Acknowledgements

Research of Drusvyatskiy and MacPhee was partially supported by the AFOSR YIP Award FA9550-15-1-0237 and by the NSF DMS 1651851 and CCF 1740551 Awards. Research of Paquette was supported by NSF CCF 1740796.

Author information

Corresponding author

Correspondence to Dmitriy Drusvyatskiy.

Additional information

Communicated by Evgeni A. Nurminski.

About this article

Cite this article

Davis, D., Drusvyatskiy, D., MacPhee, K.J. et al. Subgradient Methods for Sharp Weakly Convex Functions. J Optim Theory Appl 179, 962–982 (2018). https://doi.org/10.1007/s10957-018-1372-8
