Abstract
We study the convergence of general descent methods applied to a lower semi-continuous and nonconvex function, which satisfies the Kurdyka–Łojasiewicz inequality in a Hilbert space. We prove that any precompact sequence converges to a critical point of the function, and obtain new convergence rates both for the values and the iterates. The analysis covers alternating versions of the forward–backward method with variable metric and relative errors. As an example, a nonsmooth and nonconvex version of the Levenberg–Marquardt algorithm is detailed.
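The forward–backward scheme covered by this analysis can be illustrated on a toy problem. The sketch below is our own minimal instance (the data `A`, `b`, the helper `prox_l1`, and all parameter values are illustrative, not taken from the paper): a proximal-gradient iteration whose monotone decrease of function values is exactly the sufficient-decrease property that the abstract convergence analysis builds on.

```python
import numpy as np

# Toy forward-backward (proximal gradient) sketch:
# minimize f(x) = h(x) + g(x), with h(x) = 0.5*||Ax - b||^2 smooth
# and g(x) = lam*||x||_1 nonsmooth. Illustrative only.

def prox_l1(x, t):
    """Proximal operator of t*||.||_1 (soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
lam = 0.1
L = np.linalg.norm(A, 2) ** 2   # Lipschitz constant of grad h (spectral norm squared)
step = 1.0 / L                  # constant step; the paper allows variable metrics

f = lambda x: 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

x = np.zeros(10)
vals = [f(x)]
for _ in range(200):
    grad = A.T @ (A @ x - b)                   # forward (gradient) step on h
    x = prox_l1(x - step * grad, step * lam)   # backward (proximal) step on g
    vals.append(f(x))

# Sufficient decrease: the value sequence is monotone nonincreasing.
assert all(vals[k + 1] <= vals[k] + 1e-12 for k in range(len(vals) - 1))
```

With step size at most 1/L, each iteration provably decreases the objective, which is the discrete analogue of the descent condition exploited throughout the paper.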
Notes
A simple sufficient (yet not necessary) condition for \(\inf _{k\in \mathbb {N}} a_k b_{k+1}^2 >0\) and \(\inf _{k\in \mathbb {N}} a_k b_{k+1}>0\) is that \(\inf _{k\in \mathbb {N}}b_k>0\).
References
Łojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels. In: Les Équations aux Dérivées Partielles, pp. 87–89. Éditions du Centre National de la Recherche Scientifique, Paris (1963)
Kurdyka, K.: On gradients of functions definable in o-minimal structures. Ann. Inst. Fourier 48, 769–783 (1998)
Łojasiewicz, S.: Sur la géométrie semi- et sous-analytique. Ann. Inst. Fourier 43, 1575–1595 (1993)
Kurdyka, K., Parusiński, A.: \(\mathbf{w}_f\)-stratification of subanalytic functions and the Łojasiewicz inequality. C. R. Acad. Sci. Paris 318, 129–133 (1994)
Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17, 1205–1223 (2006)
Simon, L.: Asymptotics for a class of nonlinear evolution equations, with applications to geometric problems. Ann. Math. 118, 525–571 (1983)
Huang, S.-Z., Takáč, P.: Convergence in gradient-like systems which are asymptotically autonomous and analytic. Nonlinear Anal. Ser. A Theory Methods 46, 675–698 (2001)
Chill, R., Jendoubi, M.A.: Convergence to steady states in asymptotically autonomous semilinear evolution equations. Nonlinear Anal. 53, 1017–1039 (2003)
Haraux, A., Jendoubi, M.A.: Convergence of solutions of second-order gradient-like systems with analytic nonlinearities. J. Differ. Equ. 144(2), 313–320 (1998)
Haraux, A.: A hyperbolic variant of Simon’s convergence theorem. Evolution equations and their applications in physical and life sciences (Bad Herrenalb, 1998). Lecture Notes in Pure and Applied Mathematics. Dekker, New York (2001)
Baudoin, L., Salomon, J.: Constructive solution of a bilinear optimal control problem for a Schrödinger equation. Syst. Control Lett. 57(6), 453–464 (2008)
Absil, P.-A., Mahony, R., Andrews, B.: Convergence of the iterates of descent methods for analytic cost functions. SIAM J. Optim. 16, 531–547 (2005)
Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. Ser. B 116, 5–16 (2009)
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362, 3319–3363 (2010)
Merlet, B., Pierre, M.: Convergence to equilibrium for the backward Euler scheme and applications. Commun. Pure Appl. Anal. 9, 685–702 (2010)
Noll, D.: Convergence of non-smooth descent methods using the Kurdyka–Łojasiewicz inequality. J. Optim. Theory Appl. 160, 553–572 (2014)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1–2), 91–129 (2013)
Chouzenoux, E., Pesquet, J.C., Repetti, A.: Variable metric forward-backward algorithm for minimizing the sum of a differentiable function and a convex function. J. Optim. Theory Appl. (to appear) (2013)
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems. An approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)
Xu, Y., Yin, W.: A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J. Imaging Sci. 6(3), 1758–1789 (2013)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 15, 1–36 (2013)
Chouzenoux, E., Pesquet, J.C., Repetti, A.: A block coordinate variable metric forward-backward algorithm (2014). http://www.optimization-online.org/DB_HTML/2013/12/4178.html
Li, D., Pang, L.P., Chen, S.: A proximal alternating linearization method for nonconvex optimization problems. Optim. Methods Softw. 29(4), 646–657 (2014)
Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2), 556–572 (2007)
van den Dries, L.: Tame topology and o-minimal structures. Bull. AMS 37(3), 351–357 (2000)
van den Dries, L., Miller, C.: Geometric categories and o-minimal structures. Duke Math. J. 84, 497–540 (1996)
Haraux, A., Jendoubi, M.A.: The Łojasiewicz gradient inequality in the infinite dimensional Hilbert space framework. J. Funct. Anal. 260(9), 2826–2842 (2010)
Chill, R.: The Łojasiewicz–Simon gradient inequality in Hilbert spaces. In: Jendoubi, M.A. (ed.) Proceedings of the 5th European-Maghrebian Workshop on Semigroup Theory, Evolution Equations, and Applications, pp. 25–36 (2006)
Attouch, H., Buttazzo, G., Michaille, G.: Variational Analysis in Sobolev and \(BV\) Spaces. MPS-SIAM Series on Optimization. Springer, New York (2008)
Bertsekas, D.: Nonlinear Programming. Athena Scientific, Belmont (1999)
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)
Ferris, M.: Finite termination of the proximal point algorithm. Math. Program. 50, 359–366 (1991)
Peypouquet, J.: Asymptotic convergence to the optimal value of diagonal proximal iterations in convex minimization. J. Convex Anal. 16(1), 277–286 (2009)
Chill, R., Fiorenza, A.: Convergence and decay rate to equilibrium of bounded solutions of quasilinear parabolic equations. J. Differ. Equ. 228, 611–632 (2006)
Cauchy, A.-L.: Méthode générale pour la résolution des systèmes d’équations simultanées. C. R. Acad. Sci. Paris 25, 536–538 (1847)
Martinet, B.: Régularisation d’inéquations variationnelles par approximations successives. Rev. Française Informat. Recherche Opérationnelle 4, Sér. R-3, 154–158 (1970)
Brézis, H., Lions, P.-L.: Produits infinis de résolvantes. Israel J. Math. 29(4), 329–345 (1978)
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 964–979 (1979)
Passty, G.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72(2), 383–390 (1979)
Alvarez, F., Bolte, J., Brahic, O.: Hessian Riemannian gradient flows in convex programming. SIAM J. Control Optim. 43(2), 477–501 (2004)
Alvarez, F., López, J., Ramírez, H.: Interior proximal algorithm with variable metric for second-order cone programming: applications to structural optimization and support vector machines. Optim. Methods Softw. 25(4–6), 859–881 (2010)
Attouch, H., Svaiter, B.F.: A continuous dynamical Newton-like approach to solving monotone inclusions. SIAM J. Control Optim. 49(2), 574–598 (2011)
Sra, S., Nowozin, S., Wright, S.J.: Optimization for Machine Learning. MIT Press, Cambridge (2011)
Gafni, E.M., Bertsekas, D.P.: Two-metric projection methods for constrained optimization. SIAM J. Control Optim. 22(6), 936–964 (1984)
Bertsekas, D.P.: Projected Newton methods for optimization problems with simple constraints. SIAM J. Control Optim. 20(2), 221–246 (1982)
Lewis, A.S., Luke, D.R., Malick, J.: Local linear convergence for alternating and averaged nonconvex projections. Found. Comput. Math. 9(4), 485–513 (2009)
Donoho, D.L.: For most large underdetermined systems of linear equations the minimal \(\ell ^1\)-norm solution is also the sparsest solution. Commun. Pure Appl. Math. 59(6), 797–829 (2006)
Donoho, D.L., Tanner, J.: Counting the faces of randomly-projected hypercubes and orthants, with applications. Discret. Comput. Geom. 43, 522–541 (2010)
Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 57(11), 1413–1457 (2004)
Chandrasekaran, V., Sanghavi, S., Parrilo, P.A., Willsky, A.S.: Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim. 21(2), 572–596 (2011)
Ganesh, A., Lin, Z., Wright, J., Wu, L., Chen, M., Ma, Y.: Fast algorithms for recovering a corrupted low-rank matrix. In: Proceedings of the 3rd IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pp. 213–216 (2009)
Yuan, X., Yang, J.: Sparse and low-rank matrix decomposition via alternating direction method. Pac. J. Optim. 9(1), 167–180 (2013)
Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52, 471–501 (2010)
Nesterov, Y.E.: A method for solving the convex programming problem with convergence rate O(\(1/k^2\)). Dokl. Akad. Nauk SSSR 269, 543–547 (1983)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
Beck, A., Teboulle, M.: Gradient-based algorithms with applications in signal recovery problems. In: Palomar, D., Eldar, Y. (eds.) Convex Optimization in Signal Processing and Communications, pp. 33–88. Cambridge University Press, Cambridge (2010)
Ochs, P., Chen, Y., Brox, T., Pock, T.: iPiano: inertial proximal algorithm for non-convex optimization (2014). http://lmb.informatik.uni-freiburg.de/Publications/2014/OB14/
Casas, E., Herzog, R., Wachsmuth, G.: Approximation of sparse controls in semilinear equations by piecewise linear functions. Numer. Math. 122(4), 645–669 (2012)
Acknowledgments
The authors thank H. Attouch for useful remarks. They would also like to thank the anonymous reviewer for his careful reading and constructive comments. The second and third authors are partly supported by Conicyt Anillo Project ACT-1106, ECOS-Conicyt Project C13E03 and Millenium Nucleus ICM/FIC P10-024F. The third author is also partly supported by FONDECYT Grant 1140829 and Basal Project CMM Universidad de Chile.
Communicated by Hédy Attouch.
Appendix
1.1 Proofs of Theorems 3.1 and 3.2
The argument is a straightforward adaptation of the ideas in the proof of [17, Lemma 2.6]. One first proves the following:
Lemma 7.1
Let \(\mathbf {H}_1\) and \(\mathbf {H}_2\) hold and fix \(k\in \mathbb {N}\). If \(x^k\) and \(x^{k+1}\) belong to \(\underline{\Gamma }_\eta (x^*,\delta )\), then
For the next results, we introduce the following auxiliary property (automatically fulfilled under the hypotheses of Theorems 3.1 and 3.2), which expresses a stability of the sequence \((x^k)_{k\in \mathbb {N}}\) with respect to the point \(x^*\), along with a sufficiently close initialization.
\(\underline{\mathbf {S}(x^*,\delta ,\rho )}\): There exist \(\delta >\rho >0\) such that
-
(i)
For each \(k\in \mathbb {N}\), if \(x^0,\dots ,x^k\in \underline{\Gamma }_\eta (x^*,\rho )\), then \(x^{k+1}\in \underline{\Gamma }_\eta (x^*,\delta )\);
-
(ii)
The initial point \(x^0\) belongs to \(\Gamma _\eta (x^*,\rho )\) and
$$\begin{aligned} \Vert x^*-x^{0}\Vert +2\sqrt{\frac{f(x^{0})-f(x^*)}{a_0}}+ M\varphi (f(x^{0})-f(x^*))+\sum _{i=1}^{+\infty }\epsilon _i < \rho . \end{aligned}$$(25)
Then, we have the following estimate:
Lemma 7.2
Let \(\mathbf {H}_1\), \(\mathbf {H}_2\), \(\mathbf {H}_3\) and \(\mathbf {S}(x^*,\delta ,\rho )\) hold, and \(M:=\sup _{k\in \mathbb {N}^*}\frac{1}{a_kb_k}<+\infty \). Then, for all \(K\in \mathbb {N}^*\), we have \(x^K\in \underline{\Gamma }_\eta (x^*,\rho )\) and
The basic asymptotic properties are given by the following result:
Proposition 7.1
Let \(\mathbf {H}_1\), \(\mathbf {H}_2\), \(\mathbf {H}_3\) and \(\mathbf {S}(x^*,\delta ,\rho )\) hold. Then, \(x^k\in \underline{\Gamma }_\eta (x^*,\rho )\) for all \(k\), the sequence has finite length and converges to some \(\overline{x}\). Moreover, \(\liminf \limits _{k\rightarrow \infty }\Vert \partial f(x^k) \Vert _- =0\), and \(f(\overline{x})\le \lim \limits _{k\rightarrow \infty }f(x^k)=f(x^*)\).
Proof
Capture, convergence and finite length follow from Lemma 7.2 and \(\mathbf {H}_3\). Next, since \((b_k)\notin \ell ^1\) and \(\sum _{k=1}^\infty b_{k+1}\Vert \partial f(x^k) \Vert _- \le \sum _{k=1}^\infty \Vert x^{k+1}-x^k\Vert +\sum _{k=1}^\infty \varepsilon _{k+1}<\infty \), we obtain \(\liminf _{k\rightarrow \infty }\Vert \partial f(x^k) \Vert _- =0\). Finally, observe that \(\lim _{k\rightarrow \infty }f(x^k)\) exists because \(f(x^k)\) is decreasing and bounded from below by \(f(x^*)\), and the lower semi-continuity of \(f\) implies \(f(\overline{x})\le \lim _{k\rightarrow \infty }f(x^k)\). If \(\lim _{k\rightarrow \infty }f(x^k)=\beta >f(x^*)\), the KŁ inequality and the fact that \(\varphi '\) is decreasing imply \(\varphi '(\beta -f(x^*))\Vert \partial f(x^k) \Vert _- \ge \varphi '(f(x^k)-f(x^*))\Vert \partial f(x^k) \Vert _- \ge 1\) for all \(k\in \mathbb {N}\), which is impossible because \(\liminf _{k\rightarrow \infty }\Vert \partial f(x^k) \Vert _- =0\), whence \(\beta = f(x^*)\). \(\square \)
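For completeness, the deduction of \(\liminf _{k\rightarrow \infty }\Vert \partial f(x^k) \Vert _- =0\) from the two summability facts can be written out explicitly (this spells out a step already used implicitly above):

```latex
\text{If } \liminf_{k\to\infty}\Vert \partial f(x^k)\Vert_- = c > 0,
\text{ then } \Vert \partial f(x^k)\Vert_- \ge \tfrac{c}{2} \text{ for all large } k,
\text{ whence}\quad
\sum_{k} b_{k+1}\Vert \partial f(x^k)\Vert_-
\;\ge\; \tfrac{c}{2}\sum_{k} b_{k+1} \;=\; +\infty,
```

contradicting the finite bound \(\sum _{k} b_{k+1}\Vert \partial f(x^k) \Vert _- <\infty \); hence \(c=0\).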
We are now in position to complete the proofs of Theorems 3.1 and 3.2.
Proof of Theorem 3.1
Let \(x^{n_k}\rightarrow x^*\) with \(f(x^{n_k})\rightarrow f(x^*)\) as \(k\rightarrow \infty \). Since \(f(x^k)\) is nonincreasing and admits a limit point, we deduce that \(f(x^k)\downarrow f(x^*)\). In particular, we have \(f(x^*)\le f(x^k)\) for all \(k\in \mathbb {N}\). The function \(f\) satisfies the KŁ inequality on \(\Gamma _\eta (x^*,\delta )\) with desingularising function \(\varphi \). Let \(K_0\in \mathbb {N}\) be sufficiently large so that \(f(x^{K_0})-f(x^*)<\min \{\eta ,\underline{a}\delta ^2\}\), and pick \(\rho >0\) such that \(f(x^{K_0})-f(x^*)<\underline{a}(\delta -\rho )^2\).
Hence, \(f(x^*)\le f(x^{k+1})<f(x^*)+\eta \) for all \(k\ge K_0\) and
which implies part (i) of \(\mathbf {S}(x^*,\delta ,\rho )\). Now take \(K\ge K_0\) such that
The sequence \((y^k)_{k\in \mathbb {N}}\), defined by \(y^k:=x^{K+k}\) for all \(k\in \mathbb {N}\), satisfies the hypotheses of Proposition 7.1. Finally, since the whole sequence \((y^k)_{k\in \mathbb {N}}\) is \(f\)-convergent toward \(x^*\) and \(\liminf \limits _{k\rightarrow \infty }\Vert \partial f(y^k) \Vert _-=0\), we conclude that \(x^*\) must be critical using Lemma 2.1. \(\square \)
Proof of Theorem 3.2
Since \(f\) has the KŁ property at \(x^*\), there is a strict local upper level set \(\Gamma _{\eta }(x^*,\delta )\), where the KŁ inequality holds with \(\varphi \) as a desingularising function. Take \(\rho <\frac{3}{4}\delta \) and then \( \gamma <\frac{1}{3}\rho \). If necessary, shrink \(\eta \) so that \(2\sqrt{\dfrac{\eta }{\underline{a}}} + M \varphi (\eta ) < \dfrac{2\rho }{3}\). This is possible since \(\varphi \) is continuous at \(0\) with \(\varphi (0)=0\). Let \(x^0\in \underline{\Gamma }_\eta (x^*,\gamma )\subset \underline{\Gamma }_\eta (x^*,\rho )\). It suffices to verify that \(\mathbf {S}(x^*,\delta ,\rho )\) is fulfilled and apply Proposition 7.1. For (i), suppose that \(x^0,\dots ,x^k\) lie in \(\underline{\Gamma }_\eta (x^*,\rho )\); we prove that \(x^{k+1} \in \underline{\Gamma }_\eta (x^*,\delta )\). Since \(x^*\) is a global minimum, from \(\mathbf {H}_1\) and the fact that \(\left( f(x^k)\right) _{k\in \mathbb {N}}\) is decreasing, we have
It follows that \(\Vert x^{k+1}-x^*\Vert \le \Vert x^{k+1}-x^k\Vert +\Vert x^k-x^*\Vert <\sqrt{\frac{\eta }{\underline{a}}}+\rho <\frac{4}{3}\rho <\delta \), and so \(x^{k+1}\in \underline{\Gamma }_\eta (x^*,\delta )\). Finally,
which is precisely (ii). \(\square \)
1.2 Proof of Proposition 4.1
Proof
Since \(X_i^k=Y_i^k +S_i^k\), we can rewrite the algorithm as
We start by showing that \(\mathbf {H}_1\) is satisfied. Let \(i\in \{1,\dots ,p\}\) be fixed; using the definition of the proximal operator \(\mathrm{prox}_{g_i}^{A_{i,k}}\) in (26) and developing the squared norms gives
Using \(\mathbf {HE}_3\) in (27), the latter results in
We introduce \(\tilde{h}_{i,k} : y_i \in H_i \mapsto h(y_1^{k+1},\dots ,y_{i-1}^{k+1},y_i,y_{i+1}^k,\dots ,y_p^k) \in \mathbb {R}\) for fixed \(k\in \mathbb {N}\) and \(i=1,\dots ,p\). It satisfies \(\tilde{h}_{i,k} (y_i^k) = h(Y_i^k)\), \(\tilde{h}_{i,k} (y_i^{k+1}) = h(Y_{i+1}^k)\) and \(\nabla \tilde{h}_{i,k} (y_i^k) = \nabla _i h (Y_i^{k})\). Applying the descent lemma to \(\tilde{h}_{i,k}\), we obtain
Then, combining (28) and (29), we get
where \(\rho A_{i,k} - L\,\mathrm{id}_{H_i}\) remains coercive, since \(\rho \alpha _k > L\). Using successively the Cauchy–Schwarz inequality, the Lipschitz property of \(\nabla _i h\) (see Remark 4.2) and \(\mathbf {HE}_1\), one gets
By inserting this estimate in (30), we deduce that
We deduce that \(\mathbf {H}_1\) is fulfilled with \(a_k=\frac{\rho \alpha _k - L(\sigma \frac{\sqrt{p}}{p} +1)}{2}\), by summing (31) over \(i\):
To prove \(\mathbf {H}_2\), fix \(i=1,\dots ,p\) and use Fermat’s first-order condition in (26) to get
Define \(w_i^{k+1} := \nabla _i h(Y^k) - \nabla _i h(Y_i^k + S_i^k) - A_{i,k}(y_i^{k+1} - y_i^k) + A_{i,k}(r_i^k + s_i^k)\), which lies in \(\partial g_i(y_i^{k+1}) + \nabla _i h(Y^{k+1})\), by (32). The triangle inequality gives
Using the error estimates from (HE) and the \(\sqrt{p} L\)-Lipschitz continuity of \(\nabla _i h\) in (33), we get
Define now \(W^{k+1} := (w_1^{k+1},\ldots ,w_p^{k+1}) \in \partial f (Y^{k+1})\) (recall the definition of \(w_i^{k+1}\)). Then, summing (34) over \(i=1,\dots ,p\) and using \(\sqrt{p} \le p \le p^2\), we have
Hence, \(\mathbf {H}_2\) is verified with \(b_{k+1}=\frac{1}{p^2 (1+\sigma )(\beta _k +L)}\) and \(\epsilon _{k+1}= \frac{\beta _k \mu _k}{p(1+\sigma )(\beta _k +L)}\).
It remains to check that the hypotheses \(\mathbf {H}_3\) are satisfied under our assumptions on \(\alpha _k\), \(\beta _k\) and \(\mu _k\). Clearly, \(\mathbf {H}_3(i)\) holds since we have supposed that \(\alpha _k \ge \underline{\alpha } > (\sigma \frac{\sqrt{p}}{p} +1) \frac{L}{\rho }\). Next, \(\mathbf {H}_3(ii)\) requires that \((b_k) \notin \ell ^1\), which is equivalent to \(\left(\frac{1}{\beta _k +L}\right) \notin \ell ^1\) in our context; this holds since we have supposed that \(\left(\frac{1}{\beta _k}\right) \notin \ell ^1\). Hypothesis \(\mathbf {H}_3(iii)\) is satisfied because \(\frac{\beta _k}{\alpha _{k+1}}\) is supposed to be bounded. Finally, \(\mathbf {H}_3(iv)\) requires the summability of \(\frac{\beta _k \mu _k}{\beta _k +L}\), which holds because \(\frac{\beta _k \mu _k}{\beta _k +L}\le \mu _k\) and \(\mu _k \in \ell ^1\). \(\square \)
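The parameter conditions verified above are easy to satisfy in practice. The following sketch checks them numerically for one concrete (and entirely illustrative) choice of sequences; the names `alpha`, `beta`, `mu` mirror \(\alpha _k\), \(\beta _k\), \(\mu _k\) from the text, while the values of \(p\), \(\sigma \), \(L\), \(\rho \) are our own assumptions.

```python
import math

# Hypothetical parameter sequences satisfying H3(i)-(iv) as checked in the
# proof above. Concrete numbers are illustrative, not from the paper.
p, sigma, L_, rho = 4, 0.5, 2.0, 1.0
alpha_lower = (sigma * math.sqrt(p) / p + 1) * L_ / rho  # threshold in H3(i)

alpha = lambda k: 2.0 * alpha_lower   # constant, strictly above the threshold: H3(i)
beta = lambda k: 5.0                  # constant, so sum of 1/(beta_k + L) diverges: H3(ii)
mu = lambda k: 1.0 / (k + 1) ** 2     # summable error sequence: H3(iv)

# H3(i): alpha_k stays above the lower bound
assert all(alpha(k) > alpha_lower for k in range(100))
# H3(iii): beta_k / alpha_{k+1} is bounded
assert max(beta(k) / alpha(k + 1) for k in range(100)) <= 1.0
# H3(iv): the series beta_k*mu_k/(beta_k+L) has bounded partial sums
s = sum(beta(k) * mu(k) / (beta(k) + L_) for k in range(10000))
assert s < 5.0
```

Constant metric bounds with quadratically decaying errors are thus one admissible regime; the hypotheses also allow slowly growing \(\beta _k\), as long as \(\left(\frac{1}{\beta _k}\right)\) stays outside \(\ell ^1\).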
Cite this article
Frankel, P., Garrigos, G. & Peypouquet, J. Splitting Methods with Variable Metric for Kurdyka–Łojasiewicz Functions and General Convergence Rates. J Optim Theory Appl 165, 874–900 (2015). https://doi.org/10.1007/s10957-014-0642-3
Keywords
- Nonconvex and nonsmooth optimization
- Kurdyka–Łojasiewicz inequality
- Descent methods
- Convergence rates
- Variable metric
- Gauss–Seidel method
- Newton-like method