
06-08-2021 | Review

Accelerated proximal stochastic variance reduction for DC optimization

Authors: Lulu He, Jimin Ye, Jianwei E

Published in: Neural Computing and Applications | Issue 20/2021

Abstract

In this article, we study an important class of stochastic difference-of-convex (SDC) programming problems whose objective is the sum of a continuously differentiable convex function, a simple convex function, and a continuous concave function. A proximal stochastic variance reduction difference-of-convex algorithm (Prox-SVRDCA) (Xu et al., 2019) was recently developed for this problem. Prox-SVRDCA reduces to the proximal stochastic variance-reduced gradient method (Prox-SVRG) (Xiao and Zhang, 2014) when the concave part vanishes, and is therefore potentially slow in practice. Inspired by recently proposed acceleration techniques, we propose an accelerated proximal stochastic variance reduction difference-of-convex algorithm (AProx-SVRDCA). Unlike Prox-SVRDCA, AProx-SVRDCA incorporates an extrapolation step that involves the two most recent iterates. Experimental results show that, for a fairly general choice of the extrapolation parameter, AProx-SVRDCA achieves acceleration. We then present a rigorous theoretical analysis. We first show that any accumulation point of the generated iterate sequence is a stationary point of the objective function. Furthermore, in contrast to the traditional convergence analysis in existing nonconvex stochastic optimization, we establish global convergence of the generated sequence under the Kurdyka-Łojasiewicz property together with the continuity and differentiability of the concave part of the objective. To the best of our knowledge, this is the first time an acceleration technique has been incorporated into nonconvex nonsmooth SDC programming. Finally, extensive experimental results demonstrate the superiority of the proposed algorithm.
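To make the extrapolation idea concrete, the following is a minimal sketch of how an SVRG-style variance-reduced gradient, a linearization of the concave part, and an extrapolation step built from the two most recent iterates could fit together. This is not the authors' implementation: the function handles grad_f_i, subgrad_h, and prox_r, the parameter names step and beta, and the fixed extrapolation parameter are all illustrative assumptions.

```python
import numpy as np

def aprox_svrdca_sketch(grad_f_i, subgrad_h, prox_r, x0, n,
                        step=0.1, beta=0.5, n_epochs=10, m=None, seed=0):
    """Illustrative sketch: minimize F(x) = (1/n) sum_i f_i(x) + r(x) - h(x).

    Assumed (hypothetical) interfaces:
      grad_f_i(i, x) -> gradient of the i-th smooth convex component f_i
      subgrad_h(x)   -> a subgradient of the convex function h
      prox_r(v, t)   -> proximal operator of r with parameter t
    """
    rng = np.random.default_rng(seed)
    m = m if m is not None else n                 # inner-loop length per epoch
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(n_epochs):
        x_tilde = x.copy()                        # snapshot point
        # Full gradient at the snapshot: the variance-reduction anchor.
        full_grad = np.mean([grad_f_i(i, x_tilde) for i in range(n)], axis=0)
        for _ in range(m):
            # Extrapolation step involving the two most recent iterates.
            y = x + beta * (x - x_prev)
            i = int(rng.integers(n))
            # Variance-reduced stochastic gradient at the extrapolated point,
            # with the concave part -h linearized via a subgradient of h.
            v = grad_f_i(i, y) - grad_f_i(i, x_tilde) + full_grad - subgrad_h(y)
            x_prev, x = x, prox_r(y - step * v, step)
    return x

# Toy usage: least squares with an l1 prox (soft-thresholding) and h = 0.
A = np.random.default_rng(1).standard_normal((50, 10))
b = A @ np.ones(10)
sol = aprox_svrdca_sketch(
    grad_f_i=lambda i, x: (A[i] @ x - b[i]) * A[i],
    subgrad_h=lambda x: np.zeros_like(x),
    prox_r=lambda v, t: np.sign(v) * np.maximum(np.abs(v) - 0.01 * t, 0.0),
    x0=np.zeros(10), n=50, step=0.01)
```

In this sketch the DC structure enters only through subgrad_h: the concave part is replaced by its linearization at the extrapolated point, so each inner step needs just one proximal mapping of the simple convex term, while the snapshot full gradient keeps the variance of the stochastic gradient under control. The paper admits a fairly general choice of the extrapolation parameter; the fixed beta used here is purely illustrative.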


Appendix (available only for authorised users)
Literature
1. Gasso G, Rakotomamonjy A, Canu S (2009) Recovering sparse signals with a certain family of nonconvex penalties and DC programming. IEEE Trans Signal Process 57(12):4686–4698
2. Zhang S, Xin J (2014) Minimization of transformed \(l_1\) penalty: theory, difference of convex function algorithm, and robust application in compressed sensing. Math Program 169(3):1–30
3. Le Thi HA, Le HM, Pham Dinh T (2015) Feature selection in machine learning: an exact penalty approach using a difference of convex function algorithm. Mach Learn 101(1–3):163–186
4. Pham Dinh T, Le Thi HA (1997) Convex analysis approach to DC programming: theory, algorithms and applications. Acta Math Vietnam 22(1):289–355
5. Wen B, Chen X, Pong T (2018) A proximal difference-of-convex algorithm with extrapolation. Comput Optim Appl 69(2):297–324
6. Pham Dinh T, Le Thi HA (1998) A D.C. optimization algorithm for solving the trust-region subproblem. SIAM J Optim 8(2):476–505
7. Ahn M, Pang J, Xin J (2017) Difference-of-convex learning: directional stationarity, optimality, and sparsity. SIAM J Optim 27(3):1637–1665
8.
9. Phan DN (2016) DCA based algorithms for learning with sparsity in high dimensional setting and stochastical learning. Ph.D. thesis, University of Lorraine
10. Le Thi HA, Le HM, Phan DN et al (2017) Stochastic DCA for sparse multiclass logistic regression. In: Advances in Intelligent Systems and Computing
11. Le Thi HA, Le HM, Phan DN et al (2017) Stochastic DCA for the large-sum of non-convex functions problem and its application to group variable selection in classification. In: International Conference on Machine Learning
12. Le Thi HA, Huynh VN, Pham Dinh T (2019) Stochastic difference-of-convex algorithms for solving nonconvex optimization problems. arXiv:1911.04334v1
13. Nemirovski A, Juditsky A, Lan G et al (2009) Robust stochastic approximation approach to stochastic programming. SIAM J Optim 19(4):1574–1609
14. Roux N, Schmidt M, Bach F (2013) A stochastic gradient method with an exponential convergence rate for finite training sets. Adv Neural Inform Process Syst 4:2663–2671
15. Johnson R, Zhang T (2013) Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in Neural Information Processing Systems, pp 315–323
16. Defazio A, Bach F, Lacoste-Julien S (2014) SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. Adv Neural Inform Process Syst 2:1646–1654
17. Xiao L, Zhang T (2014) A proximal stochastic gradient method with progressive variance reduction. SIAM J Optim 24(4):2057–2075
18. Xu Y, Qi Q, Lin Q et al (2019) Stochastic optimization for DC functions and non-smooth non-convex regularizers with non-asymptotic convergence. arXiv:1811.11829v2
19. Nguyen L, Liu J, Scheinberg K et al (2017) Stochastic recursive gradient algorithm for nonconvex optimization. arXiv:1705.07261v1
20. Nguyen L, Scheinberg K, Takáč M (2021) Inexact SARAH algorithm for stochastic optimization. Optim Methods Softw 36(1):237–258
21. Lei L, Jordan M (2017) Less than a single pass: stochastically controlled stochastic gradient. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, vol 54, pp 148–156
22. Lu Q, Liao X, Li H, Huang T (2020) A computation-efficient decentralized algorithm for composite constrained optimization. IEEE Trans Signal Inform Process over Netw 6:774–789
23. Lin Y, Jin X, Chen J et al (2019) An analytic computation-driven algorithm for decentralized multicore systems. Future Gener Comput Syst 96:101–110
24. Sodhro AH, Pirbhulal S, de Albuquerque VHC (2019) Artificial intelligence-driven mechanism for edge computing-based industrial applications. IEEE Trans Ind Inform 15(7):4235–4243
25. Beck A, Teboulle M (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imaging Sci 2(1):183–202
26. Allen-Zhu Z (2017) Katyusha: the first direct acceleration of stochastic gradient methods. In: ACM SIGACT Symposium on Theory of Computing
27. Zhou K (2018) Direct acceleration of SAGA using sampled negative momentum. arXiv:1806.11048v4
28. Nitanda A (2014) Stochastic proximal gradient descent with acceleration techniques. In: Advances in Neural Information Processing Systems
29. Allen-Zhu Z (2018) Katyusha X: practical momentum method for stochastic sum-of-nonconvex optimization. In: Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80
31. Lan G, Zhou Y (2018) Random gradient extrapolation for distributed and stochastic optimization. SIAM J Optim 28(4):2753–2782
32. Nesterov Y (2004) Introductory lectures on convex optimization: a basic course. Applied Optimization, vol 87. Kluwer Academic Publishers, London
33. Attouch H, Bolte J, Redont P et al (2010) Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Math Oper Res 35:438–457
34. Bolte J, Sabach S, Teboulle M (2014) Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math Program 146:459–494
35. Parikh N, Boyd S (2013) Proximal algorithms. Found Trends Optim 1(3):123–231
36. Shang F, Jiao L, Zhou K et al (2018) ASVRG: accelerated proximal SVRG. In: Proceedings of Machine Learning Research, vol 95, pp 815–830
37. Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360
38. Gong P, Zhang C, Lu Z et al (2013) A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. In: Proceedings of the 30th International Conference on Machine Learning, pp 37–45
39. Pankratov EL, Spagnolo B (2005) Optimization of impurity profile for p-n-junction in heterostructures. Eur Phys J B 46:15–19
40. Giuffrida A, Valenti D, Ziino G et al (2009) A stochastic interspecific competition model to predict the behaviour of Listeria monocytogenes in the fermentation process of a traditional Sicilian salami. Eur Food Res Technol 228:767–775
41. Denaro G, Valenti D, La Cognata A et al (2013) Spatio-temporal behaviour of the deep chlorophyll maximum in Mediterranean Sea: development of a stochastic model for picophytoplankton dynamics. Ecol Complex 13:21–34
42. Denaro G, Valenti D, Spagnolo B et al (2013) Dynamics of two picophytoplankton groups in Mediterranean Sea: analysis of the deep chlorophyll maximum by a stochastic advection-reaction-diffusion model. PLoS One 8:e66765
43. Pizzolato N, Fiasconaro A, Adorno DP et al (2010) Resonant activation in polymer translocation: new insights into the escape dynamics of molecules driven by an oscillating field. Phys Biol 7(3):034001
44. Falci G, La Cognata A, Berritta M et al (2013) Design of a lambda system for population transfer in superconducting nanocircuits. Phys Rev B 87(13):214515
45. Mikhaylov AN, Gryaznov EG, Belov AI et al (2016) Field- and irradiation-induced phenomena in memristive nanomaterials. Phys Status Solidi 13:870–881
46. Carollo A, Spagnolo B, Valenti D (2018) Uhlmann curvature in dissipative phase transitions. Sci Rep 8:9852
47. Spagnolo B, Valenti D (2008) Volatility effects on the escape time in financial market models. Int J Bifurc Chaos 18(9):2775–2786
Metadata
Title
Accelerated proximal stochastic variance reduction for DC optimization
Authors
Lulu He
Jimin Ye
Jianwei E
Publication date
06-08-2021
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 20/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-021-06348-1
