
Inertial proximal gradient methods with Bregman regularization for a class of nonconvex optimization problems


Abstract

This paper proposes an inertial Bregman proximal gradient method for minimizing the sum of two possibly nonconvex functions. The method incorporates two different inertial steps and adopts a Bregman regularization in solving the subproblem. Under some general parameter constraints, we prove subsequential convergence, namely that every cluster point of the generated sequence is a stationary point of the considered problem. To relax these parameter constraints, we further propose a nonmonotone line search strategy that makes the parameter selection more flexible, and we establish subsequential convergence of the proposed method with line search. When the line search is monotone, we prove stronger results: global convergence and a linear convergence rate under the Kurdyka–Łojasiewicz framework. Moreover, numerical results on SCAD and MCP nonconvex penalty problems are reported to demonstrate the effectiveness and superiority of the proposed methods and of the line search strategy.
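For readers who want to experiment, the following is a minimal Python sketch (not the authors' implementation) of the two-inertial-step iteration analyzed in the appendix (see (9a)–(9c) there), specialized to the Euclidean kernel \(h(x)=\tfrac{1}{2}\Vert x\Vert ^2\), for which the Bregman subproblem reduces to an ordinary proximal step. The quadratic f, the \(\ell _1\) penalty used as a stand-in for the SCAD/MCP penalties, and all parameter values below are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch of the two-inertial-step Bregman proximal gradient iteration
# (9a)-(9c), with the Euclidean kernel h(x) = ||x||^2/2, so that
# D_h(x, y) = ||x - y||^2/2 and (9c) reduces to
# x^{k+1} = prox_{lam*g}(y^k - lam*grad_f(z^k)).
import numpy as np

def soft_threshold(v, t):
    # prox of t*||.||_1, used here only as a placeholder regularizer
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def inertial_bregman_pg(grad_f, prox_g, x0, alpha, beta, lam, iters=300):
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        d = x - x_prev               # Delta_k = x^k - x^{k-1}
        y = x + alpha * d            # first inertial point (9a)
        z = x + beta * d             # second inertial point (9b)
        x_prev, x = x, prox_g(y - lam * grad_f(z), lam)  # Bregman proximal step (9c)
    return x

# Toy usage: f(x) = 0.5*||A x - b||^2 (so grad f(x) = A^T(Ax - b)), g = mu*||.||_1.
rng = np.random.default_rng(0)
A, b, mu = rng.standard_normal((30, 80)), rng.standard_normal(30), 0.1
L_f = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of grad f
x_hat = inertial_bregman_pg(
    grad_f=lambda x: A.T @ (A @ x - b),
    prox_g=lambda v, t: soft_threshold(v, mu * t),
    x0=np.zeros(80), alpha=0.3, beta=0.3, lam=0.5 / L_f)
```

The inertial parameters and step size in this sketch are not guaranteed to satisfy the paper's Assumption 2 or condition (10); in practice they should be chosen to meet those constraints or set adaptively via the proposed line search.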


References

  1. Ahookhosh, M., Themelis, A., Patrinos, P.: A Bregman forward-backward linesearch algorithm for nonconvex composite optimization: superlinear convergence to nonisolated local minima (2019). arXiv:1905.11904

  2. Alecsa, C.D., László, S.C., Pinţa, T.: An extension of the second order dynamical system that models Nesterov’s convex gradient method. Appl. Math. Optim. (2020). https://doi.org/10.1007/s00245-020-09692-1


  3. Alvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set Valued Anal. 9, 3–11 (2001)


  4. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137, 91–129 (2013)


  5. Attouch, H., Peypouquet, J., Redont, P.: A dynamical approach to an inertial forward–backward algorithm for convex minimization. SIAM J. Optim. 24, 232–256 (2014)


  6. Auslender, A., Teboulle, M.: Interior gradient and proximal methods for convex and conic optimization. SIAM J. Optim. 16, 697–725 (2006)


  7. Auslender, A., Teboulle, M.: Projected subgradient methods with non-Euclidean distances for non-differentiable convex minimization and variational inequalities. Math. Program. 120, 27–48 (2009)


  8. Bauschke, H.H., Bolte, J., Chen, J., Teboulle, M., Wang, X.: On linear convergence of non-Euclidean gradient methods without strong convexity and Lipschitz gradient continuity. J. Optim. Theory Appl. 182, 1068–1087 (2019)


  9. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, Berlin (2011)


  10. Bauschke, H.H., Dao, M.N., Lindstrom, S.B.: Regularizing with Bregman–Moreau envelopes. SIAM J. Optim. 28, 3208–3228 (2018)


  11. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)


  12. Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1999)


  13. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014)


  14. Bolte, J., Sabach, S., Teboulle, M., Vaisbourd, Y.: First-order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM J. Optim. 28, 2131–2151 (2018)


  15. Boţ, R.I., Csetnek, E.R., László, S.C.: An inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions. EURO J. Comput. Optim. 4, 3–25 (2016)


  16. Cai, J.-F., Candès, E.J., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20, 1956–1982 (2010)


  17. Carmon, Y., Duchi, J.C., Hinder, O., Sidford, A.: Accelerated methods for nonconvex optimization. SIAM J. Optim. 28, 1751–1772 (2018)


  18. Chen, C., Chan, R.H., Ma, S., Yang, J.: Inertial proximal ADMM for linearly constrained separable convex optimization. SIAM J. Imaging Sci. 8, 2239–2267 (2015)


  19. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)


  20. Gao, X., Cai, X., Han, D.: A Gauss-Seidel type inertial proximal alternating linearized minimization for a class of nonconvex optimization problems. J. Global Optim. 76, 863–887 (2020)


  21. Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Program. 156, 59–99 (2016)


  22. Ghayem, F., Sadeghi, M., Babaie-Zadeh, M., Chatterjee, S., Skoglund, M., Jutten, C.: Sparse signal recovery using iterative proximal projection. IEEE Trans. Signal Process. 66, 879–894 (2018)


  23. Guo, K., Han, D.: A note on the Douglas–Rachford splitting method for optimization problems involving hypoconvex functions. J. Global Optim. 72, 431–441 (2018)


  24. Han, D.: A generalized proximal-point-based prediction-correction method for variational inequality problems. J. Comput. Appl. Math. 221, 183–193 (2008)


  25. Hien, L.T.K., Gillis, N., Patrinos, P.: Inertial block mirror descent method for non-convex non-smooth optimization (2019). arXiv:1903.01818

  26. Hsieh, Y.-P., Kao, Y.-C., Mahabadi, R.K., Yurtsever, A., Kyrillidis, A., Cevher, V.: A non-Euclidean gradient descent framework for non-convex matrix factorization. IEEE Trans. Signal Process. 66, 5917–5926 (2018)


  27. Jain, P., Kar, P.: Non-convex optimization for machine learning. Found. Trends Mach. Learn. 10, 142–336 (2017)


  28. Johnstone, P.R., Moulin, P.: Local and global convergence of a general inertial proximal splitting scheme for minimizing composite functions. Comput. Optim. Appl. 67, 259–292 (2017)


  29. Li, H., Lin, Z.: Accelerated proximal gradient methods for nonconvex programming. In: Proceedings of NeurIPS, pp. 379–387 (2015)

  30. Liang, J., Monteiro, R.D., Sim, C.-K.: A FISTA-type accelerated gradient algorithm for solving smooth nonconvex composite optimization problems (2019). arXiv:1905.07010

  31. Lorenz, D.A., Pock, T.: An inertial forward-backward algorithm for monotone inclusions. J. Math. Imaging Vis. 51, 311–325 (2015)


  32. Lu, C., Tang, J., Yan, S., Lin, Z.: Nonconvex nonsmooth low rank minimization via iteratively reweighted nuclear norm. IEEE Trans. Image Process. 25, 829–839 (2016)


  33. Lu, H., Freund, R.M., Nesterov, Y.: Relatively smooth convex optimization by first-order methods, and applications. SIAM J. Optim. 28, 333–354 (2018)


  34. Moudafi, A., Oliny, M.: Convergence of a splitting inertial proximal method for monotone operators. J. Comput. Appl. Math. 155, 447–454 (2003)


  35. Mukkamala, M.C., Ochs, P., Pock, T., Sabach, S.: Convex-concave backtracking for inertial Bregman proximal gradient algorithms in non-convex optimization (2019). arXiv:1904.03537

  36. Nesterov, Y.: A method for solving the convex programming problem with convergence rate O(\(1/k^2\)). Soviet Math. Dokl. 27, 372–376 (1983)


  37. Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140, 125–161 (2013)


  38. Ochs, P., Chen, Y., Brox, T., Pock, T.: iPiano: inertial proximal algorithm for nonconvex optimization. SIAM J. Imaging Sci. 7, 1388–1419 (2014)


  39. Ochs, P., Fadili, J., Brox, T.: Non-smooth non-convex Bregman minimization: unification and new algorithms. J. Optim. Theory Appl. 181, 244–278 (2019)


  40. Osher, S., Burger, M., Goldfarb, D., Xu, J., Yin, W.: An iterative regularization method for total variation-based image restoration. Multiscale Model. Simul. 4, 460–489 (2005)


  41. Pock, T., Sabach, S.: Inertial proximal alternating linearized minimization (iPALM) for nonconvex and nonsmooth problems. SIAM J. Imaging Sci. 9, 1756–1787 (2016)


  42. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4, 1–17 (1964)


  43. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Springer, Berlin (2009)


  44. Teboulle, M.: A simplified view of first order methods for optimization. Math. Program. 170, 67–96 (2018)


  45. Themelis, A., Stella, L., Patrinos, P.: Forward-backward envelope for the sum of two nonconvex functions: further properties and nonmonotone linesearch algorithms. SIAM J. Optim. 28, 2274–2303 (2018)


  46. Wen, B., Chen, X., Pong, T.K.: Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems. SIAM J. Optim. 27, 124–145 (2017)


  47. Wu, Z., Li, M.: General inertial proximal gradient method for a class of nonconvex nonsmooth optimization problems. Comput. Optim. Appl. 73, 129–158 (2019)


  48. Wu, Z., Li, M., Wang, D.Z., Han, D.: A symmetric alternating direction method of multipliers for separable nonconvex minimization problems. Asia Pac. J. Oper. Res. 34(6), 1750030 (2017)


  49. Yang, L.: Proximal gradient method with extrapolation and line search for a class of nonconvex and nonsmooth problems (2018). arXiv:1711.06831

  50. Yin, W., Osher, S., Goldfarb, D., Darbon, J.: Bregman iterative algorithms for \(\ell _1\)-minimization with applications to compressed sensing. SIAM J. Imaging Sci. 1, 143–168 (2008)


  51. Zhang, C.-H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)


  52. Zhang, X., Barrio, R., Martínez, M.A., Jiang, H., Cheng, L.: Bregman proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems. IEEE Access 7, 126515–126529 (2019)



Acknowledgements

The authors are grateful to the anonymous referees for their valuable comments, which have greatly improved the quality of this paper. This work was supported by the Startup Foundation for Introducing Talent of NUIST grant 2020r003, National Natural Science Foundation of China grant 11771078, Natural Science Foundation of Jiangsu Province grant BK20181258, and National Research Foundation of Singapore grant NRF-RSS2016-004.


Corresponding author

Correspondence to Chongshou Li.


Appendix

A.1 Proof of Lemma 3

Proof

Since \(\nabla f\) is Lipschitz continuous with the modulus \(L_f\), it follows from Lemma 1 that

$$\begin{aligned} f(x^{k+1}) - f(z^{k})\le \langle \nabla f(z^{k}),x^{k+1}-z^{k}\rangle +\frac{L_f}{2}\Vert x^{k+1}-z^{k}\Vert ^2. \end{aligned}$$
(30)

On the other hand, recall that \(f=f_1-f_2\), where \(f_1\) and \(f_2\) are convex and their gradients \(\nabla f_1\) and \(\nabla f_2\) are Lipschitz continuous with moduli \(L_f\) and \(l_f\), respectively. Hence we have

$$\begin{aligned} {\left\{ \begin{array}{ll} f_1(z^{k})+ \langle \nabla f_1(z^{k}),x^k-z^{k}\rangle \le f_1(x^k), \\ f_2(x^k)-f_2(z^{k})-\langle \nabla f_2(z^{k}),x^k-z^{k}\rangle \le \frac{l_f}{2}\Vert x^k-z^{k}\Vert ^2. \end{array}\right. } \end{aligned}$$

Adding the above two inequalities and using \(f=f_1-f_2\) together with \(\nabla f=\nabla f_1-\nabla f_2\), namely \(f(z^{k})-f(x^k)=[f_1(z^{k})-f_1(x^k)]+[f_2(x^k)-f_2(z^{k})]\), we get

$$\begin{aligned} f(z^{k})-f(x^k) \le \langle \nabla f(z^{k}),z^{k}-x^k\rangle +\frac{l_f}{2}\Vert x^k-z^k\Vert ^2. \end{aligned}$$
(31)

Next, since \(x^{k+1}\) minimizes the subproblem (9c) and \(D_h(x^{k},x^k)=0\) by the definition in (8), comparing the objective values of (9c) at \(x^{k+1}\) and at \(x^k\) gives

$$\begin{aligned}&g(x^{k+1})+\left\langle \nabla f(z^{k})-\frac{1}{\lambda _k}(y^k-x^k),x^{k+1}\right\rangle +\frac{1}{\lambda _k}D_h(x^{k+1},x^k)\\&\quad \le g(x^k)+\left\langle \nabla f(z^{k})-\frac{1}{\lambda _k}(y^k-x^k),x^k\right\rangle . \end{aligned}$$

Rearranging the terms, we obtain

$$\begin{aligned} g(x^{k+1})-g(x^k)\le \left\langle \nabla f(z^{k})-\frac{1}{\lambda _k}(y^k-x^k),x^k-x^{k+1}\right\rangle -\frac{1}{\lambda _k}D_h(x^{k+1},x^k). \end{aligned}$$
(32)

Combining the inequalities (30), (31) and (32) and recalling \(F:=f+g\), we obtain

$$\begin{aligned}&F(x^{k+1})-F(x^k)\nonumber \\&\quad =f(x^{k+1})-f(z^{k})+f(z^{k})-f(x^k)+g(x^{k+1})-g(x^k)\nonumber \\&\quad \le \frac{1}{\lambda _k}\langle y^{k}-x^{k},x^{k+1}-x^k\rangle +\frac{L_f}{2}\Vert x^{k+1}-z^{k}\Vert ^2+\frac{l_f}{2}\Vert x^k-z^{k}\Vert ^2-\frac{1}{\lambda _k}D_h(x^{k+1},x^k). \end{aligned}$$
(33)

Next, recalling that \(\varDelta _k:=x^k-x^{k-1}\) as defined in (13), it follows from (9a) and (9b) that

$$\begin{aligned} y^{k}-x^{k}=\alpha _k \varDelta _k,\quad z^{k}-x^k=\beta _k\varDelta _k \end{aligned}$$

and

$$\begin{aligned} z^{k}-x^{k+1}=\beta _k \varDelta _k-\varDelta _{k+1}. \end{aligned}$$

Substituting the above equalities into (33), we have

$$\begin{aligned}&F(x^{k+1})-F(x^k) \nonumber \\&\quad \le \frac{1}{\lambda _k}\langle \alpha _k\varDelta _k, \varDelta _{k+1}\rangle +\frac{L_f}{2}\Vert \beta _k \varDelta _k-\varDelta _{k+1}\Vert ^2 +\frac{l_f}{2}\Vert \beta _k\varDelta _k\Vert ^2-\frac{1}{\lambda _k}D_h(x^{k+1},x^k)\nonumber \\&\quad \le \frac{1}{\lambda _k}\langle \alpha _k\varDelta _k, \varDelta _{k+1}\rangle +\frac{L_f}{2}\Vert \beta _k \varDelta _k-\varDelta _{k+1}\Vert ^2 +\frac{l_f}{2}\Vert \beta _k\varDelta _k\Vert ^2-\frac{\sigma _h}{2\lambda _k}\Vert \varDelta _{k+1}\Vert ^2\nonumber \\&\quad = \left( \frac{L_f}{2}-\frac{\sigma _h}{2\lambda _k}\right) \Vert \varDelta _{k+1}\Vert ^2+\frac{L_f+l_f}{2}\beta _k^2\Vert \varDelta _{k}\Vert ^2+\left( \frac{\alpha _k}{\lambda _k}-\beta _k L_f\right) \langle \varDelta _k,\varDelta _{k+1}\rangle \nonumber \\&\quad \le \left( \frac{L_f}{2}-\frac{\sigma _h}{2\lambda _k}+\frac{\alpha _k}{2\lambda _k}-\frac{\beta _k L_f}{2}\right) \Vert \varDelta _{k+1}\Vert ^2+\left( \frac{L_f+l_f}{2}\beta _k^2+\frac{\alpha _k}{2\lambda _k} -\frac{\beta _k L_f}{2}\right) \Vert \varDelta _{k}\Vert ^2, \end{aligned}$$
(34)

where the second inequality follows from the \(\sigma _h\)-strong convexity of h, i.e.,

$$\begin{aligned} D_h(x^{k+1},x^k)\ge \frac{\sigma _h}{2}\Vert x^{k+1}-x^k\Vert ^2 \end{aligned}$$

and the last inequality follows from \(\beta _k L_f \le \frac{\alpha _k}{\lambda _k}\) (see Assumption 2) and

$$\begin{aligned} \langle \varDelta _k,\varDelta _{k+1}\rangle \le \frac{\Vert \varDelta _k\Vert ^2}{2} +\frac{\Vert \varDelta _{k+1}\Vert ^2}{2}. \end{aligned}$$

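As a quick illustration of this bound (an assumed special case, not required by the paper), if (8) is the standard Bregman distance \(D_h(x,y)=h(x)-h(y)-\langle \nabla h(y),x-y\rangle \) and one takes the Euclidean kernel \(h(x)=\tfrac{1}{2}\Vert x\Vert ^2\), then \(\sigma _h=1\) and the inequality holds with equality:

$$\begin{aligned} D_h(x^{k+1},x^k)=\tfrac{1}{2}\Vert x^{k+1}\Vert ^2-\tfrac{1}{2}\Vert x^{k}\Vert ^2-\langle x^{k},x^{k+1}-x^{k}\rangle =\tfrac{1}{2}\Vert x^{k+1}-x^{k}\Vert ^2. \end{aligned}$$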
Then, by the definition of \(\delta _k\) in (14) and the nondecreasing behavior of \(\{\lambda _k\}\) in Assumption 2, we get

$$\begin{aligned}&(F(x^{k+1})+\delta _{k+1}\Vert \varDelta _{k+1}\Vert ^2) -(F(x^k)+\delta _k\Vert \varDelta _{k}\Vert ^2) \\&\quad \le \left( \frac{L_f}{2}-\frac{\sigma _h}{2\lambda _k}+\frac{\alpha _k}{2\lambda _k} +\frac{\alpha _{k+1}}{2\lambda _{k+1}}-\frac{\beta _k L_f}{2}\right) \Vert \varDelta _{k+1}\Vert ^2+\frac{\beta _k^2(L_f+l_f)-\beta _k L_f}{2} \Vert \varDelta _k\Vert ^2\\&\quad \le \left( \frac{L_f}{2}-\frac{\sigma _h}{2\lambda _k}+\frac{\alpha _k +\alpha _{k+1}}{2\lambda _k}-\frac{\beta _k L_f}{2}\right) \Vert \varDelta _{k+1}\Vert ^2+\frac{\beta _k^2(L_f+l_f)-\beta _k L_f}{2} \Vert \varDelta _k\Vert ^2. \end{aligned}$$

From the definition of \(H_{\delta _k}(x^{k},x^{k-1})\) in (14) and the condition \(\beta _k \le {L_f}/{(L_f+l_f)}\) in Assumption 2, we obtain

$$\begin{aligned}&(F(x^{k+1})+\delta _{k+1}\Vert \varDelta _{k+1}\Vert ^2) -(F(x^k)+\delta _k\Vert \varDelta _{k}\Vert ^2) \nonumber \\&\quad \le \left( \frac{L_f}{2}-\frac{\sigma _h}{2\lambda _k}+\frac{\alpha _k +\alpha _{k+1}}{2\lambda _k}-\frac{\beta _k L_f}{2}\right) \Vert \varDelta _{k+1}\Vert ^2. \end{aligned}$$
(35)

Moreover, by the condition (10), it holds that

$$\begin{aligned} (1-\beta _k)\lambda _k L_f\le \sigma _h-\alpha _k-\alpha _{k+1}-\varepsilon . \end{aligned}$$

Then, we have

$$\begin{aligned} \frac{L_f}{2}-\frac{\sigma _h}{2\lambda _k}+\frac{\alpha _k+\alpha _{k+1}}{2\lambda _k} -\frac{\beta _kL_f}{2} =\frac{(1-\beta _k)\lambda _kL_f-(\sigma _h-\alpha _k-\alpha _{k+1})}{2\lambda _k} \le -\frac{\varepsilon }{2\lambda _k}. \end{aligned}$$

Together with (35), we get the assertion (12) immediately. Furthermore, it follows from \(-\frac{\varepsilon }{2\lambda _k}<0\) by Assumption 2 that \(\{H_{\delta _k}(x^{k},x^{k-1})\} \) is monotonically nonincreasing. This completes the proof. \(\square \)
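As an illustrative numerical check of the final bound (with made-up parameter values that are assumptions, not taken from the paper's experiments), one can verify that condition (10) indeed forces the coefficient of \(\Vert \varDelta _{k+1}\Vert ^2\) in (35) below \(-\varepsilon /(2\lambda _k)\):

```python
# Sanity check of the coefficient bound used at the end of the proof of Lemma 3.
# All parameter values are illustrative assumptions.
L_f, l_f, sigma_h, eps = 1.0, 1.0, 1.0, 0.05
alpha_k, alpha_k1, beta_k, lam_k = 0.2, 0.2, 0.3, 0.5

assert beta_k <= L_f / (L_f + l_f)            # Assumption 2
assert beta_k * L_f <= alpha_k / lam_k        # Assumption 2
assert (1 - beta_k) * lam_k * L_f <= sigma_h - alpha_k - alpha_k1 - eps  # condition (10)

coeff = L_f/2 - sigma_h/(2*lam_k) + (alpha_k + alpha_k1)/(2*lam_k) - beta_k*L_f/2
print(coeff, -eps / (2 * lam_k))              # prints -0.25 and -0.05
assert coeff <= -eps / (2 * lam_k)
```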

A.2 Proof of Theorem 1

Proof

(i) For any initial point \(x^0\in {\mathbb {R}}^n\), we know that \(x^1\in \mathrm{dom}\,F\) and thus \(F(x^1)<\infty \). It follows from Lemma 3 that the sequence \(\{H_{\delta _k}(x^{k},x^{k-1})\}\) is monotonically nonincreasing; thus we have

$$\begin{aligned} F(x^k)&\le F(x^k)+\delta _k\Vert \varDelta _k\Vert ^2=H_{\delta _k}(x^k,x^{k-1})\\&\le \cdots \le H_{\delta _1}(x^1,x^{0}){=} F(x^1)+\delta _1\Vert \varDelta _1\Vert ^2<\infty . \end{aligned}$$

Thus the sequence \(\{x^k\}_{k\ge 1}\) is contained in the level set \(\{x\in {\mathbb {R}}^n\,|\,F(x)\le F(x^1)+\delta _1\Vert \varDelta _1\Vert ^2\}\), which is bounded by the coercivity of F in Assumption 1. Hence the whole sequence \(\{x^k\}_{k\ge 0}\) is bounded.

(ii) We first claim that condition (10) and Assumption 2 yield the upper bound \(\lambda _k\le {\bar{\lambda }} := {\sigma _h}/{L_f}\) for any \(k \ge 0\). Otherwise, there is a \(k_0 \ge 0\) such that \(\lambda _k>\sigma _h/L_f\) for any \(k \ge k_0\), and consequently

$$\begin{aligned} \left\{ \begin{array}{l} \frac{\alpha _k}{\beta _k L_f}> \frac{\sigma _h}{L_f}, \\ \frac{\sigma _h-\alpha _k-\alpha _{k+1}-\varepsilon }{(1-\beta _k)L_f} > \frac{\sigma _h}{L_f}, \end{array}\right. \Rightarrow \quad \frac{\alpha _k+\alpha _{k+1}+\varepsilon }{\sigma _h}< \beta _k < \frac{\alpha _k}{\sigma _h}. \end{aligned}$$

Since \(\alpha _{k+1}\ge 0\) and \(\varepsilon >0\), no \(\beta _k\) can satisfy the above inequalities, a contradiction. Therefore, we have \(\lambda _k\le {\bar{\lambda }}=\sigma _h/L_f\), and hence \(\frac{\varepsilon }{2\lambda _k}\ge \frac{\varepsilon }{2{\bar{\lambda }}}\). Summing (12) over \(k=1,2,\ldots ,N\) and using this inequality yields

$$\begin{aligned} \sum _{k=1}^N \frac{\varepsilon }{2{\bar{\lambda }}}\Vert \varDelta _{k+1}\Vert ^2&\le \sum _{k=1}^N \left( H_{\delta _k}(x^{k},x^{k-1})-H_{\delta _{k+1}}(x^{k+1},x^k)\right) \\&=H_{\delta _1}(x^{1},x^{0})-H_{\delta _{N+1}}(x^{N+1},x^N)\\&=F(x^1)+\delta _1\Vert \varDelta _{1}\Vert ^2-H_{\delta _{N+1}}(x^{N+1},x^N)\\&\le F(x^1)+\delta _1\Vert \varDelta _{1}\Vert ^2-{\underline{F}}<\infty . \end{aligned}$$

Letting N tend to \(\infty \) and recalling that \(\varepsilon >0\) and \({\bar{\lambda }} >0\), the assertion (ii) follows directly.

(iii) From the definition of \(H_{\delta _k}(x^k,x^{k-1})\) in (14), we know that \(\{H_{\delta _k}(x^k,x^{k-1})\}\) is bounded from below since \(F\ge {\underline{F}}>-\infty \). On the other hand, it follows from Lemma 3 that \(\{H_{\delta _k}(x^k,x^{k-1})\}\) is monotonically nonincreasing; hence it is convergent. Combining this with the boundedness of \(\{\delta _k\}\), the definition

$$\begin{aligned} H_{\delta _k}(x^k, x^{k-1}):= F(x^k)+\delta _k\Vert x^k-x^{k-1}\Vert ^2 \end{aligned}$$

and \(\lim _{k\rightarrow \infty }\Vert \varDelta _k\Vert =0\) from assertion (ii), we obtain that \(\{F(x^k)\}\) is convergent.

(iv) Since \(\{x^k\}\) is bounded, there exists at least one cluster point. Let \({\tilde{x}}\) be a cluster point and \(\{x^{k_n}\}\) be the subsequence of \(\{x^k\}\) such that \(\lim _{n\rightarrow \infty }x^{k_n}={\tilde{x}}\). From the iterative step (9c), we know that for any integer k,

$$\begin{aligned} x^{k+1}\in \arg \min \left\{ g(x)+\left\langle x-x^k,\nabla f(z^k)-\frac{1}{\lambda _k}(y^k-x^k)\right\rangle +\frac{1}{\lambda _k}D_h(x,x^k)\right\} . \end{aligned}$$

Then, we have

$$\begin{aligned}&g(x^{k+1})+\left\langle x^{k+1}-x^k,\nabla f(z^{k})-\frac{1}{\lambda _k}(y^k-x^k)\right\rangle +\frac{1}{\lambda _k}D_h(x^{k+1},x^k)\\&\quad \le g({\tilde{x}})+\left\langle {\tilde{x}}-x^k,\nabla f(z^k)-\frac{1}{\lambda _k}(y^k-x^k)\right\rangle +\frac{1}{\lambda _k}D_h({\tilde{x}},x^k). \end{aligned}$$

Using the fact \(D_h(x^{k+1},x^k)\ge \frac{\sigma _h}{2}\Vert x^{k+1}-x^k\Vert ^2\) and choosing \(k:=k_n-1\,(n\ge 1)\), we obtain

$$\begin{aligned}&g(x^{k_n})+\left\langle x^{k_n}-x^{k_n-1},\nabla f(z^{k_n-1})-\frac{1}{\lambda _{k_n-1}}(y^{k_n-1}-x^{k_n-1})\right\rangle +\frac{\sigma _h}{2\lambda _{k_n-1}}\Vert x^{k_n}-x^{k_n-1}\Vert ^2\\&\quad \le g({\tilde{x}})+\left\langle {\tilde{x}}-x^{k_n-1},\nabla f(z^{k_n-1})-\frac{1}{\lambda _{k_n-1}}(y^{k_n-1}-x^{k_n-1})\right\rangle +\frac{1}{\lambda _{k_n-1}}D_h({\tilde{x}},x^{k_n-1}). \end{aligned}$$

Taking the limit superior as \(n\rightarrow \infty \) and using assertion (ii), the continuity of \(\nabla f\) and h, the convergence \(x^{k_n}\rightarrow {\tilde{x}}\) as \(n\rightarrow \infty \), and the boundedness of \(\{z^k\}\) (and hence of \(\{\nabla f(z^k)\}\)), we have

$$\begin{aligned} \limsup _{n\rightarrow \infty } g(x^{k_n}) \le g({\tilde{x}}). \end{aligned}$$

On the other hand, by the lower semicontinuity of g, we get

$$\begin{aligned} g({\tilde{x}})\le \liminf _{n\rightarrow \infty }g(x^{k_n}). \end{aligned}$$

Hence, we have \( \lim _{n\rightarrow \infty } g(x^{k_n}) = g({\tilde{x}}).\) Together with the definition of F in (1) and the continuity of f, we have

$$\begin{aligned} \lim _{n\rightarrow \infty }F(x^{k_n}) =F({\tilde{x}}). \end{aligned}$$
(36)

Define \(\xi ^{k_n}:= -\frac{1}{\lambda _{k_n-1}}\left[ (x^{k_n-1}-y^{k_n-1})-(\nabla h(x^{k_n})-\nabla h(x^{k_n-1}))\right] +\nabla f(x^{k_n})-\nabla f(z^{k_n-1}),\,n\ge 1\). Then, by the optimality condition (11), we have

$$\begin{aligned} \xi ^{k_n} \in \nabla f(x^{k_n})+\partial g(x^{k_n}). \end{aligned}$$
(37)

Recalling (9a) and (9b), \(\lim _{k\rightarrow \infty }\Vert \varDelta _{k}\Vert =0\) from assertion (ii), the continuity of \(\nabla h\) and \(\nabla f\), and the boundedness of \(\{\alpha _{k}\}\), \(\{\beta _{k}\}\) and \(\{\lambda _{k}\}\) by Assumption 2, we get \(\lim _{n\rightarrow \infty }\xi ^{k_n}=0\). Together with \(\lim _{n\rightarrow \infty }x^{k_n}= {\tilde{x}}\), (36) and (37), we obtain

$$\begin{aligned} 0\in \nabla f({\tilde{x}})+\partial g({\tilde{x}}), \end{aligned}$$

which implies that \({\tilde{x}}\in \mathrm{crit}F\). Furthermore, it follows from Assumption 1 and the above conclusions, as in [13, Lemma 5], that F is finite and constant on the set of all cluster points of \(\{x^k\}\). This completes the proof. \(\square \)


Cite this article

Wu, Z., Li, C., Li, M. et al. Inertial proximal gradient methods with Bregman regularization for a class of nonconvex optimization problems. J Glob Optim 79, 617–644 (2021). https://doi.org/10.1007/s10898-020-00943-7

