Abstract
This paper proposes an inertial Bregman proximal gradient method for minimizing the sum of two possibly nonconvex functions. The method incorporates two different inertial steps and adopts Bregman regularization in solving the subproblem. Under some general parameter constraints, we prove subsequential convergence: every cluster point of the generated sequence is a stationary point of the considered problem. To relax these parameter constraints, we further propose a nonmonotone line search strategy that makes the parameter selection more flexible, and we establish the subsequential convergence of the method with line search. When the line search is monotone, we prove stronger global convergence and a linear convergence rate under the Kurdyka–Łojasiewicz framework. Moreover, numerical results on SCAD and MCP nonconvex penalty problems are reported to demonstrate the effectiveness and superiority of the proposed methods and the line search strategy.
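To make the flavor of the scheme concrete, the following is a schematic sketch of an inertial proximal gradient iteration with two extrapolation points, not the authors' exact scheme (9): it specializes to the Euclidean kernel \(h=\frac{1}{2}\Vert \cdot \Vert ^2\) (so the Bregman proximal step reduces to the usual one) and uses an \(\ell _1\) penalty in place of SCAD/MCP for simplicity. The names `alpha`, `beta`, `lam` loosely follow the paper's notation; all function names are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (used here in place of SCAD/MCP for simplicity).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def inertial_bregman_pg(A, b, reg=0.1, lam=None, alpha=0.3, beta=0.3, iters=200):
    """Schematic inertial proximal gradient for min 0.5||Ax-b||^2 + reg*||x||_1.

    Two extrapolation points mimic the two inertial steps of the paper:
    y feeds the proximal step, z feeds the gradient evaluation. With the
    Euclidean kernel h = 0.5||x||^2, the Bregman subproblem is a plain prox.
    """
    Lf = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient of f
    if lam is None:
        lam = 1.0 / Lf                      # a safe step size
    x_prev = x = np.zeros(A.shape[1])
    for _ in range(iters):
        y = x + alpha * (x - x_prev)        # inertial step for the proximal center
        z = x + beta * (x - x_prev)         # inertial step for the gradient point
        grad = A.T @ (A @ z - b)
        x_prev, x = x, soft_threshold(y - lam * grad, lam * reg)
    return x
```

As a sanity check, on a random sparse recovery instance the returned iterate should drive the composite objective well below its value at the origin.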
References
Ahookhosh, M., Themelis, A., Patrinos, P.: A Bregman forward-backward linesearch algorithm for nonconvex composite optimization: superlinear convergence to nonisolated local minima (2019). arXiv:1905.11904
Alecsa, C.D., László, S.C., Pinţa, T.: An extension of the second order dynamical system that models Nesterov’s convex gradient method. Appl. Math. Optim. (2020). https://doi.org/10.1007/s00245-020-09692-1
Alvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set Valued Anal. 9, 3–11 (2001)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137, 91–129 (2013)
Attouch, H., Peypouquet, J., Redont, P.: A dynamical approach to an inertial forward–backward algorithm for convex minimization. SIAM J. Optim. 24, 232–256 (2014)
Auslender, A., Teboulle, M.: Interior gradient and proximal methods for convex and conic optimization. SIAM J. Optim. 16, 697–725 (2006)
Auslender, A., Teboulle, M.: Projected subgradient methods with non-Euclidean distances for non-differentiable convex minimization and variational inequalities. Math. Program. 120, 27–48 (2009)
Bauschke, H.H., Bolte, J., Chen, J., Teboulle, M., Wang, X.: On linear convergence of non-Euclidean gradient methods without strong convexity and Lipschitz gradient continuity. J. Optim. Theory Appl. 182, 1068–1087 (2019)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, Berlin (2011)
Bauschke, H.H., Dao, M.N., Lindstrom, S.B.: Regularizing with Bregman–Moreau envelopes. SIAM J. Optim. 28, 3208–3228 (2018)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1999)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014)
Bolte, J., Sabach, S., Teboulle, M., Vaisbourd, Y.: First-order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM J. Optim. 28, 2131–2151 (2018)
Boţ, R.I., Csetnek, E.R., László, S.C.: An inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions. EURO J. Comput. Optim. 4, 3–25 (2016)
Cai, J.-F., Candès, E.J., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20, 1956–1982 (2010)
Carmon, Y., Duchi, J.C., Hinder, O., Sidford, A.: Accelerated methods for nonconvex optimization. SIAM J. Optim. 28, 1751–1772 (2018)
Chen, C., Chan, R.H., Ma, S., Yang, J.: Inertial proximal ADMM for linearly constrained separable convex optimization. SIAM J. Imaging Sci. 8, 2239–2267 (2015)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
Gao, X., Cai, X., Han, D.: A Gauss-Seidel type inertial proximal alternating linearized minimization for a class of nonconvex optimization problems. J. Global Optim. 76, 863–887 (2020)
Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Program. 156, 59–99 (2016)
Ghayem, F., Sadeghi, M., Babaie-Zadeh, M., Chatterjee, S., Skoglund, M., Jutten, C.: Sparse signal recovery using iterative proximal projection. IEEE Trans. Signal Process. 66, 879–894 (2018)
Guo, K., Han, D.: A note on the Douglas–Rachford splitting method for optimization problems involving hypoconvex functions. J. Global Optim. 72, 431–441 (2018)
Han, D.: A generalized proximal-point-based prediction-correction method for variational inequality problems. J. Comput. Appl. Math. 221, 183–193 (2008)
Hien, L.T.K., Gillis, N., Patrinos, P.: Inertial block mirror descent method for non-convex non-smooth optimization (2019). arXiv:1903.01818
Hsieh, Y.-P., Kao, Y.-C., Mahabadi, R.K., Yurtsever, A., Kyrillidis, A., Cevher, V.: A non-Euclidean gradient descent framework for non-convex matrix factorization. IEEE Trans. Signal Process. 66, 5917–5926 (2018)
Jain, P., Kar, P.: Non-convex optimization for machine learning. Found. Trends Mach. Learn. 10, 142–336 (2017)
Johnstone, P.R., Moulin, P.: Local and global convergence of a general inertial proximal splitting scheme for minimizing composite functions. Comput. Optim. Appl. 67, 259–292 (2017)
Li, H., Lin, Z.: Accelerated proximal gradient methods for nonconvex programming. In: Proceedings of NeurIPS, pp. 379–387 (2015)
Liang, J., Monteiro, R.D., Sim, C.-K.: A FISTA-type accelerated gradient algorithm for solving smooth nonconvex composite optimization problems (2019). arXiv:1905.07010
Lorenz, D.A., Pock, T.: An inertial forward-backward algorithm for monotone inclusions. J. Math. Imaging Vis. 51, 311–325 (2015)
Lu, C., Tang, J., Yan, S., Lin, Z.: Nonconvex nonsmooth low rank minimization via iteratively reweighted nuclear norm. IEEE Trans. Image Process. 25, 829–839 (2016)
Lu, H., Freund, R.M., Nesterov, Y.: Relatively smooth convex optimization by first-order methods, and applications. SIAM J. Optim. 28, 333–354 (2018)
Moudafi, A., Oliny, M.: Convergence of a splitting inertial proximal method for monotone operators. J. Comput. Appl. Math. 155, 447–454 (2003)
Mukkamala, M.C., Ochs, P., Pock, T., Sabach, S.: Convex-concave backtracking for inertial Bregman proximal gradient algorithms in non-convex optimization (2019). arXiv:1904.03537
Nesterov, Y.: A method for solving the convex programming problem with convergence rate O(\(1/k^2\)). Soviet Math. Dokl. 27, 372–376 (1983)
Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140, 125–161 (2013)
Ochs, P., Chen, Y., Brox, T., Pock, T.: iPiano: inertial proximal algorithm for nonconvex optimization. SIAM J. Imaging Sci. 7, 1388–1419 (2014)
Ochs, P., Fadili, J., Brox, T.: Non-smooth non-convex Bregman minimization: unification and new algorithms. J. Optim. Theory Appl. 181, 244–278 (2019)
Osher, S., Burger, M., Goldfarb, D., Xu, J., Yin, W.: An iterative regularization method for total variation-based image restoration. Multiscale Model. Simul. 4, 460–489 (2005)
Pock, T., Sabach, S.: Inertial proximal alternating linearized minimization (iPALM) for nonconvex and nonsmooth problems. SIAM J. Imaging Sci. 9, 1756–1787 (2016)
Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4, 1–17 (1964)
Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Springer, Berlin (2009)
Teboulle, M.: A simplified view of first order methods for optimization. Math. Program. 170, 67–96 (2018)
Themelis, A., Stella, L., Patrinos, P.: Forward-backward envelope for the sum of two nonconvex functions: further properties and nonmonotone linesearch algorithms. SIAM J. Optim. 28, 2274–2303 (2018)
Wen, B., Chen, X., Pong, T.K.: Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems. SIAM J. Optim. 27, 124–145 (2017)
Wu, Z., Li, M.: General inertial proximal gradient method for a class of nonconvex nonsmooth optimization problems. Comput. Optim. Appl. 73, 129–158 (2019)
Wu, Z., Li, M., Wang, D.Z., Han, D.: A symmetric alternating direction method of multipliers for separable nonconvex minimization problems. Asia Pac. J. Oper. Res. 34(6), 1750030 (2017)
Yang, L.: Proximal gradient method with extrapolation and line search for a class of nonconvex and nonsmooth problems (2018). arXiv:1711.06831
Yin, W., Osher, S., Goldfarb, D., Darbon, J.: Bregman iterative algorithms for \(\ell _1\)-minimization with applications to compressed sensing. SIAM J. Imaging Sci. 1, 143–168 (2008)
Zhang, C.-H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)
Zhang, X., Barrio, R., Martínez, M.A., Jiang, H., Cheng, L.: Bregman proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems. IEEE Access 7, 126515–126529 (2019)
Acknowledgements
The authors are grateful to the anonymous referees for their valuable comments, which greatly improved the quality of this paper. This work was supported by the Startup Foundation for Introducing Talent of NUIST (grant 2020r003), the National Natural Science Foundation of China (grant 11771078), the Natural Science Foundation of Jiangsu Province (grant BK20181258), and the National Research Foundation of Singapore (grant NRF-RSS2016-004).
Appendix
A.1 Proof of Lemma 3
Proof
Since \(\nabla f\) is Lipschitz continuous with the modulus \(L_f\), it follows from Lemma 1 that
On the other hand, recall that \(f=f_1-f_2\) with \(f_1\) and \(f_2\) being convex, and their gradients \(\nabla f_1\) and \(\nabla f_2\) being Lipschitz continuous with moduli \(L_f\) and \(l_f\), respectively, thus we have
Summing the above two inequalities and using \(f=f_1-f_2\) and \(\nabla f=\nabla f_1-\nabla f_2\), we get
Next, from the minimization subproblem (9c), and the fact that \(D_h(x^{k},x^k)=0\) by the definition in (8), we have
Rearranging the terms, we obtain
Combining the inequalities (30), (31) and (32) and recalling \(F:=f+g\), we obtain
Next, recalling that \(\varDelta _k:=x^k-x^{k-1}\) in (13), it follows from (9a) and (9b) that
and
Substituting the above equalities into (33), we have
where the second inequality follows from the \(\sigma _h\)-strong convexity of h, i.e.,
and the last inequality follows from \(\beta _k L_f \le \frac{\alpha _k}{\lambda _k}\) (see Assumption 2) and
Then, by the definition of \(\delta _k\) in (14) and the nondecreasing behavior of \(\{\lambda _k\}\) in Assumption 2, we get
From the definition of \(H_{\delta _k}(x^{k},x^{k-1})\) in (14) and the condition \(\beta _k \le {L_f}/{(L_f+l_f)}\) in Assumption 2, we obtain
Moreover, by the condition (10), it holds that
Then, we have
Together with (35), we get the assertion (12) immediately. Furthermore, it follows from \(-\frac{\varepsilon }{2\lambda _k}<0\) by Assumption 2 that \(\{H_{\delta _k}(x^{k},x^{k-1})\} \) is monotonically nonincreasing. This completes the proof. \(\square \)
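The \(\sigma _h\)-strong-convexity bound \(D_h(x,y)\ge \frac{\sigma _h}{2}\Vert x-y\Vert ^2\) invoked above (and again in the proof of Theorem 1) is easy to check numerically. The kernel below is an illustrative choice with \(\sigma _h=1\) (the sum of \(\frac{1}{2}\Vert x\Vert ^2\) and a convex quartic), not one used in the paper.

```python
import numpy as np

def h(x):
    # A 1-strongly convex kernel: 0.5||x||^2 plus a convex quartic term.
    return 0.5 * x @ x + 0.25 * (x @ x) ** 2

def grad_h(x):
    # Gradient of h: x + ||x||^2 * x.
    return x + (x @ x) * x

def bregman(x, y):
    # Bregman distance D_h(x, y) = h(x) - h(y) - <grad h(y), x - y>, as in (8).
    return h(x) - h(y) - grad_h(y) @ (x - y)

# Check D_h(x, y) >= (sigma_h / 2) * ||x - y||^2 with sigma_h = 1 on random pairs.
rng = np.random.default_rng(1)
for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    assert bregman(x, y) >= 0.5 * np.linalg.norm(x - y) ** 2 - 1e-12
```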
A.2 Proof of Theorem 1
Proof
(i) For any initial point \(x^0\in {\mathbb {R}}^n\), we know that \(x^1\in \mathrm{dom}F\) and thus \(F(x^1)<\infty \). It follows from Lemma 3 that the sequence \(\{H_{\delta _k}(x^{k},x^{k-1})\} \) is monotonically nonincreasing, thus we have
Thus the sequence \(\{x^k\}_{k\ge 1}\) is contained in the level set \(\{x\in {\mathbb {R}}^n\,|\,F(x)\le F(x^1)\}\), which is bounded by the coercivity of F in Assumption 1, regardless of the choice of \(x^0\). It follows that the whole sequence \(\{x^k\}_{k\ge 0}\) is bounded.
(ii) We first show from the condition (10) that \(\lambda _k\) admits the upper bound \({\bar{\lambda }} = {\sigma _h}/{L_f}\), i.e., \(\lambda _k\le {\bar{\lambda }}\) for any \(k \ge 0\). Suppose to the contrary that there is a \(k_0 \ge 0\) such that for any \(k \ge k_0\)
Obviously, no \(\beta _k\) satisfies the above inequalities. Therefore, we have \(\lambda _k\le {\bar{\lambda }}=\sigma _h/L_f\), and hence \(\frac{\varepsilon }{2\lambda _k}\ge \frac{\varepsilon }{2{\bar{\lambda }}}\). Summing (12) over \(k=1,2,\ldots ,N\) and using this inequality yields that
Letting N tend to \(\infty \) and recalling that \(\varepsilon >0\) and \({\bar{\lambda }} >0\), the assertion (ii) follows directly.
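For readability, the telescoping argument behind assertion (ii) can be written out, assuming (12) takes the generic descent form \(H_{\delta _{k+1}}(x^{k+1},x^{k})\le H_{\delta _k}(x^{k},x^{k-1})-\frac{\varepsilon }{2\lambda _k}\Vert \varDelta _k\Vert ^2\) and that \(H_{\delta _k}(x^k,x^{k-1})\ge F(x^k)\ge {\underline{F}}\) by (14):

```latex
\sum_{k=1}^{N}\frac{\varepsilon }{2{\bar{\lambda }}}\Vert \varDelta _k\Vert ^2
\;\le\; \sum_{k=1}^{N}\frac{\varepsilon }{2\lambda _k}\Vert \varDelta _k\Vert ^2
\;\le\; H_{\delta _1}(x^{1},x^{0})-H_{\delta _{N+1}}(x^{N+1},x^{N})
\;\le\; H_{\delta _1}(x^{1},x^{0})-{\underline{F}}
\;<\;\infty ,
```

so letting \(N\rightarrow \infty \) gives \(\sum _{k}\Vert \varDelta _k\Vert ^2<\infty \) and hence \(\Vert \varDelta _k\Vert \rightarrow 0\). This is a sketch under the stated assumed form of (12), not a reproduction of the paper's exact inequality.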
(iii) From the definition of \(H_{\delta _k}(x^k,x^{k-1})\) in (14), we know that \(\{H_{\delta _k}(x^k,x^{k-1})\}\) is bounded from below since \(F\ge {\underline{F}}>-\infty \). On the other hand, it follows from Lemma 3 that \(\{H_{\delta _k}(x^k,x^{k-1})\}\) is monotonically nonincreasing; hence \(\{H_{\delta _k}(x^k,x^{k-1})\}\) is convergent. Combining this with the boundedness of \(\delta _k\),
and \(\lim _{k\rightarrow \infty }\Vert \varDelta _k\Vert =0\) from assertion (ii), we obtain that \(\{F(x^k)\}\) is convergent.
(iv) Since \(\{x^k\}\) is bounded, there exists at least one cluster point. Let \({\tilde{x}}\) be a cluster point and \(\{x^{k_n}\}\) be the subsequence of \(\{x^k\}\) such that \(\lim _{n\rightarrow \infty }x^{k_n}={\tilde{x}}\). From the iterative step (9c), we know that for any integer k,
Then, we have
Using the fact \(D_h(x^{k+1},x^k)\ge \frac{\sigma _h}{2}\Vert x^{k+1}-x^k\Vert ^2\) and choosing \(k:=k_n-1\,(n\ge 1)\), we obtain
Taking the limit superior above as \(n\rightarrow \infty \), and using assertion (ii), the continuity of \(\nabla f\) and h, the convergence \(x^{k_n}\rightarrow {\tilde{x}}\) as \(n\rightarrow \infty \), and the boundedness of \(\{z^k\}\) (and hence of \(\{\nabla f(z^k)\}\)), we have
On the other hand, by the lower semicontinuity of g, we get
Hence, we have \( \lim _{n\rightarrow \infty } g(x^{k_n}) = g({\tilde{x}}).\) Together with the definition of F in (1) and the continuity of f, we have
Define \(\xi ^{k_n}:= -\frac{1}{\lambda _{k_n-1}}\left[ (x^{k_n-1}-y^{k_n-1})-(\nabla h(x^{k_n})-\nabla h(x^{k_n-1}))\right] +\nabla f(x^{k_n})-\nabla f(z^{k_n-1}),\,n\ge 1\). Then, by the optimality condition (11), we have
Recalling (9a) and (9b), \(\lim _{n\rightarrow \infty }\Vert \varDelta _{k_n}\Vert =0\) by assertion (ii), the continuity of \(\nabla h\) and \(\nabla f\), and the boundedness of \(\{\alpha _{k}\}\), \(\{\beta _{k}\}\) and \(\{\lambda _{k}\}\) by Assumption 2, we get \(\lim _{n\rightarrow \infty }\xi ^{k_n}=0\). Together with \(\lim _{n\rightarrow \infty }x^{k_n}= {\tilde{x}}\), (36) and (37), we obtain
which implies that \({\tilde{x}}\in \mathrm{crit}F\). Furthermore, from Assumption 1 and the above conclusions, it follows as in [13, Lemma 5] that F is finite and constant on the set of all cluster points of \(\{x^k\}\). This completes the proof. \(\square \)
Wu, Z., Li, C., Li, M. et al. Inertial proximal gradient methods with Bregman regularization for a class of nonconvex optimization problems. J Glob Optim 79, 617–644 (2021). https://doi.org/10.1007/s10898-020-00943-7