Abstract
We study the convergence of general descent methods applied to a lower semi-continuous and nonconvex function, which satisfies the Kurdyka–Łojasiewicz inequality in a Hilbert space. We prove that any precompact sequence converges to a critical point of the function, and obtain new convergence rates both for the values and the iterates. The analysis covers alternating versions of the forward–backward method with variable metric and relative errors. As an example, a nonsmooth and nonconvex version of the Levenberg–Marquardt algorithm is detailed.
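The forward–backward scheme covered by this analysis can be illustrated on a toy problem. The sketch below is our own minimal instance (the data `A`, `b`, the helper `prox_l1`, and all parameter values are illustrative, not taken from the paper): a proximal-gradient iteration whose monotone decrease of function values is exactly the sufficient-decrease property that the abstract convergence analysis builds on.

```python
import numpy as np

# Toy forward-backward (proximal gradient) sketch:
# minimize f(x) = h(x) + g(x), with h(x) = 0.5*||Ax - b||^2 smooth
# and g(x) = lam*||x||_1 nonsmooth. Illustrative only.

def prox_l1(x, t):
    """Proximal operator of t*||.||_1 (soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
lam = 0.1
L = np.linalg.norm(A, 2) ** 2   # Lipschitz constant of grad h (spectral norm squared)
step = 1.0 / L                  # constant step; the paper allows variable metrics

f = lambda x: 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

x = np.zeros(10)
vals = [f(x)]
for _ in range(200):
    grad = A.T @ (A @ x - b)                   # forward (gradient) step on h
    x = prox_l1(x - step * grad, step * lam)   # backward (proximal) step on g
    vals.append(f(x))

# Sufficient decrease: the value sequence is monotone nonincreasing.
assert all(vals[k + 1] <= vals[k] + 1e-12 for k in range(len(vals) - 1))
```

With step size at most 1/L, each iteration provably decreases the objective, which is the discrete analogue of the descent condition exploited throughout the paper.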
Notes
A simple sufficient (yet not necessary) condition for \(\inf _{k\in \mathbb {N}} a_k b_{k+1}^2 >0\) and \(\inf _{k\in \mathbb {N}} a_k b_{k+1}>0\) is that \(\inf _{k\in \mathbb {N}}b_k>0\).
References
Łojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels. In: Les Équations aux Dérivées Partielles, pp. 87–89. Éditions du Centre National de la Recherche Scientifique, Paris (1963)
Kurdyka, K.: On gradients of functions definable in o-minimal structures. Ann. Inst. Fourier 48, 769–783 (1998)
Łojasiewicz, S.: Sur la géométrie semi- et sous-analytique. Ann. Inst. Fourier 43, 1575–1595 (1993)
Kurdyka, K., Parusiński, A.: \(\mathbf{w}_f\)-stratification of subanalytic functions and the Łojasiewicz inequality. C. R. Acad. Sci. Paris 318, 129–133 (1994)
Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17, 1205–1223 (2006)
Simon, L.: Asymptotics for a class of nonlinear evolution equations, with applications to geometric problems. Ann. Math. 118, 525–571 (1983)
Huang, S.-Z., Takáč, P.: Convergence in gradient-like systems which are asymptotically autonomous and analytic. Nonlinear Anal. Ser. A Theory Methods 46, 675–698 (2001)
Chill, R., Jendoubi, M.A.: Convergence to steady states in asymptotically autonomous semilinear evolution equations. Nonlinear Anal. 53, 1017–1039 (2003)
Haraux, A., Jendoubi, M.A.: Convergence of solutions of second-order gradient-like systems with analytic nonlinearities. J. Differ. Equ. 144(2), 313–320 (1998)
Haraux, A.: A hyperbolic variant of Simon’s convergence theorem. Evolution equations and their applications in physical and life sciences (Bad Herrenalb, 1998). Lecture Notes in Pure and Applied Mathematics. Dekker, New York (2001)
Baudoin, L., Salomon, J.: Constructive solution of a bilinear optimal control problem for a Schrödinger equation. Syst. Control Lett. 57(6), 453–464 (2008)
Absil, P.-A., Mahony, R., Andrews, B.: Convergence of the iterates of descent methods for analytic cost functions. SIAM J. Optim. 16, 531–547 (2005)
Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. Ser. B 116, 5–16 (2009)
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362, 3319–3363 (2010)
Merlet, B., Pierre, M.: Convergence to equilibrium for the backward Euler scheme and applications. Commun. Pure Appl. Anal. 9, 685–702 (2010)
Noll, D.: Convergence of non-smooth descent methods using the Kurdyka–Łojasiewicz inequality. J. Optim. Theory Appl. 160, 553–572 (2014)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1–2), 91–129 (2013)
Chouzenoux, E., Pesquet, J.C., Repetti, A.: Variable metric forward-backward algorithm for minimizing the sum of a differentiable function and a convex function. J. Optim. Theory Appl. (to appear) (2013)
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems. An approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)
Xu, Y., Yin, W.: A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J. Imaging Sci. 6(3), 1758–1789 (2013)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 15, 1–36 (2013)
Chouzenoux, E., Pesquet, J.C., Repetti, A.: A block coordinate variable metric forward-backward algorithm (2014). http://www.optimization-online.org/DB_HTML/2013/12/4178.html
Li, D., Pang, L.P., Chen, S.: A proximal alternating linearization method for nonconvex optimization problems. Optim. Methods Softw. 29(4), 646–657 (2014)
Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2), 556–572 (2007)
van den Dries, L.: Tame topology and o-minimal structures. Bull. AMS 37(3), 351–357 (2000)
van den Dries, L., Miller, C.: Geometric categories and o-minimal structures. Duke Math. J. 84, 497–540 (1996)
Haraux, A., Jendoubi, M.A.: The Łojasiewicz gradient inequality in the infinite dimensional Hilbert space framework. J. Funct. Anal. 260(9), 2826–2842 (2010)
Chill, R.: The Łojasiewicz–Simon gradient inequality in Hilbert spaces. In: Jendoubi, M.A. (ed.) Proceedings of the 5th European-Maghrebian Workshop on Semigroup Theory, Evolution Equations, and Applications, pp. 25–36 (2006)
Attouch, H., Buttazzo, G., Michaille, G.: Variational Analysis in Sobolev and \(BV\) Spaces. MPS-SIAM Series on Optimization. Springer, New York (2008)
Bertsekas, D.: Nonlinear Programming. Athena Scientific, Belmont (1999)
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)
Ferris, M.: Finite termination of the proximal point algorithm. Math. Program. 50, 359–366 (1991)
Peypouquet, J.: Asymptotic convergence to the optimal value of diagonal proximal iterations in convex minimization. J. Convex Anal. 16(1), 277–286 (2009)
Chill, R., Fiorenza, A.: Convergence and decay rate to equilibrium of bounded solutions of quasilinear parabolic equations. J. Differ. Equ. 228, 611–632 (2006)
Cauchy, A.-L.: Méthode générale pour la résolution des systèmes d’équations simultanées. C. R. Acad. Sci. Paris 25, 536–538 (1847)
Martinet, B.: Régularisation d’inéquations variationnelles par approximations successives. Rev. Française Informat. Recherche Opérationnelle 4, Sér. R-3, 154–158 (1970)
Brézis, H., Lions, P.-L.: Produits infinis de résolvantes. Israel J. Math. 29(4), 329–345 (1978)
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 964–979 (1979)
Passty, G.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72(2), 383–390 (1979)
Alvarez, F., Bolte, J., Brahic, O.: Hessian Riemannian gradient flows in convex programming. SIAM J. Control Optim. 43(2), 477–501 (2004)
Alvarez, F., López, J., Ramírez, H.: Interior proximal algorithm with variable metric for second-order cone programming: applications to structural optimization and support vector machines. Optim. Methods Softw. 25(4–6), 859–881 (2010)
Attouch, H., Svaiter, B.F.: A continuous dynamical Newton-like approach to solving monotone inclusions. SIAM J. Control Optim. 49(2), 574–598 (2011)
Sra, S., Nowozin, S., Wright, S.J.: Optimization for Machine Learning. MIT Press, Cambridge (2011)
Gafni, E.M., Bertsekas, D.P.: Two-metric projection methods for constrained optimization. SIAM J. Control Optim. 22(6), 936–964 (1984)
Bertsekas, D.P.: Projected Newton methods for optimization problems with simple constraints. SIAM J. Control Optim. 20(2), 221–246 (1982)
Lewis, A.S., Luke, D.R., Malick, J.: Local linear convergence for alternating and averaged nonconvex projections. Found. Comput. Math. 9(4), 485–513 (2009)
Donoho, D.L.: For most large underdetermined systems of linear equations the minimal \(\ell ^1\)-norm solution is also the sparsest solution. Commun. Pure Appl. Math. 59(6), 797–829 (2006)
Donoho, D.L., Tanner, J.: Counting the faces of randomly-projected hypercubes and orthants, with applications. Discret. Comput. Geom. 43, 522–541 (2010)
Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 57(11), 1413–1457 (2004)
Chandrasekaran, V., Sanghavi, S., Parrilo, P.A., Willsky, A.S.: Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim. 21(2), 572–596 (2011)
Ganesh, A., Lin, Z., Wright, J., Wu, L., Chen, M., Ma, Y.: Fast algorithms for recovering a corrupted low-rank matrix. In: Proceedings of the 3rd IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pp. 213–216 (2009)
Yuan, X., Yang, J.: Sparse and low-rank matrix decomposition via alternating direction method. Pac. J. Optim. 9(1), 167–180 (2013)
Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52, 471–501 (2010)
Nesterov, Y.E.: A method for solving the convex programming problem with convergence rate O(\(1/k^2\)). Dokl. Akad. Nauk SSSR 269, 543–547 (1983)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
Beck, A., Teboulle, M.: Gradient-based algorithms with applications in signal recovery problems. In: Palomar, D., Eldar, Y. (eds.) Convex Optimization in Signal Processing and Communications, pp. 33–88. Cambridge University Press, Cambridge (2010)
Ochs, P., Chen, Y., Brox, T., Pock, T.: iPiano: inertial proximal algorithm for non-convex optimization (2014). http://lmb.informatik.uni-freiburg.de/Publications/2014/OB14/
Casas, E., Herzog, R., Wachsmuth, G.: Approximation of sparse controls in semilinear equations by piecewise linear functions. Numer. Math. 122(4), 645–669 (2012)
Acknowledgments
The authors thank H. Attouch for useful remarks. They would also like to thank the anonymous reviewer for his careful reading and constructive comments. The second and third authors are partly supported by Conicyt Anillo Project ACT-1106, ECOS-Conicyt Project C13E03 and Millenium Nucleus ICM/FIC P10-024F. The third author is also partly supported by FONDECYT Grant 1140829 and Basal Project CMM Universidad de Chile.
Communicated by Hédy Attouch.
Appendix
1.1 Proofs of Theorems 3.1 and 3.2
The argument is a straightforward adaptation of the ideas in the proof of [17, Lemma 2.6]. One first proves the following:
Lemma 7.1
Let \(\mathbf {H}_1\) and \(\mathbf {H}_2\) hold and fix \(k\in \mathbb {N}\). If \(x^k\) and \(x^{k+1}\) belong to \(\underline{\Gamma }_\eta (x^*,\delta )\), then
For the next results, we introduce the following auxiliary property (automatically fulfilled under the hypotheses of Theorems 3.1 and 3.2), which expresses a stability of the sequence \((x^k)_{k\in \mathbb {N}}\) with respect to the point \(x^*\), along with a sufficiently close initialization.
\(\underline{\mathbf {S}(x^*,\delta ,\rho )}\): There exist \(\delta >\rho >0\) such that
-
(i)
For each \(k\in \mathbb {N}\), if \(x^0,\dots ,x^k\in \underline{\Gamma }_\eta (x^*,\rho )\), then \(x^{k+1}\in \underline{\Gamma }_\eta (x^*,\delta )\);
-
(ii)
The initial point \(x^0\) belongs to \(\Gamma _\eta (x^*,\rho )\) and
$$\begin{aligned} \Vert x^*-x^{0}\Vert +2\sqrt{\frac{f(x^{0})-f(x^*)}{a_0}}+ M\varphi (f(x^{0})-f(x^*))+\sum _{i=1}^{+\infty }\epsilon _i < \rho . \end{aligned}$$(25)
Then, we have the following estimate:
Lemma 7.2
Let \(\mathbf {H}_1\), \(\mathbf {H}_2\), \(\mathbf {H}_3\) and \(\mathbf {S}(x^*,\delta ,\rho )\) hold, and \(M:=\sup _{k\in \mathbb {N}^*}\frac{1}{a_kb_k}<+\infty \). Then, for all \(K\in \mathbb {N}^*\), we have \(x^K\in \underline{\Gamma }_\eta (x^*,\rho )\) and
The basic asymptotic properties are given by the following result:
Proposition 7.1
Let \(\mathbf {H}_1\), \(\mathbf {H}_2\), \(\mathbf {H}_3\) and \(\mathbf {S}(x^*,\delta ,\rho )\) hold. Then, \(x^k\in \underline{\Gamma }_\eta (x^*,\rho )\) for all \(k\), the sequence has finite length and converges to some \(\overline{x}\). Moreover, \(\liminf \limits _{k\rightarrow \infty }\Vert \partial f(x^k) \Vert _- =0\), and \(f(\overline{x})\le \lim \limits _{k\rightarrow \infty }f(x^k)=f(x^*)\).
Proof
Capture, convergence and finite length follow from Lemma 7.2 and \(\mathbf {H}_3\). Next, since \((b_k)\notin \ell ^1\) and \(\sum _{k=1}^\infty b_{k+1}\Vert \partial f(x^k) \Vert _- \le \sum _{k=1}^\infty \Vert x^{k+1}-x^k\Vert +\sum _{k=1}^\infty \varepsilon _{k+1}<\infty \), we obtain \(\liminf _{k\rightarrow \infty }\Vert \partial f(x^k) \Vert _- =0\). Finally, observe that \(\lim _{k\rightarrow \infty }f(x^k)\) exists because \(f(x^k)\) is decreasing and bounded from below by \(f(x^*)\), and the lower semi-continuity of \(f\) implies \(f(\overline{x})\le \lim _{k\rightarrow \infty }f(x^k)\). If \(\lim _{k\rightarrow \infty }f(x^k)=\beta >f(x^*)\), the KŁ inequality and the fact that \(\varphi '\) is decreasing imply \(\varphi '(\beta -f(x^*))\Vert \partial f(x^k) \Vert _- \ge \varphi '(f(x^k)-f(x^*))\Vert \partial f(x^k) \Vert _- \ge 1\) for all \(k\in \mathbb {N}\), which is impossible because \(\liminf _{k\rightarrow \infty }\Vert \partial f(x^k) \Vert _- =0\), whence \(\beta = f(x^*)\). \(\square \)
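For completeness, the deduction of \(\liminf _{k\rightarrow \infty }\Vert \partial f(x^k) \Vert _- =0\) from the two summability facts can be written out explicitly (this spells out a step already used implicitly above):

```latex
\text{If } \liminf_{k\to\infty}\Vert \partial f(x^k)\Vert_- = c > 0,
\text{ then } \Vert \partial f(x^k)\Vert_- \ge \tfrac{c}{2} \text{ for all large } k,
\text{ whence}\quad
\sum_{k} b_{k+1}\Vert \partial f(x^k)\Vert_-
\;\ge\; \tfrac{c}{2}\sum_{k} b_{k+1} \;=\; +\infty,
```

contradicting the finite bound \(\sum _{k} b_{k+1}\Vert \partial f(x^k) \Vert _- <\infty \); hence \(c=0\).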
We are now in position to complete the proofs of Theorems 3.1 and 3.2.
Proof of Theorem 3.1
Let \(x^{n_k}\rightarrow x^*\) with \(f(x^{n_k})\rightarrow f(x^*)\) as \(k\rightarrow \infty \). Since \(f(x^k)\) is nonincreasing and admits a limit point, we deduce that \(f(x^k)\downarrow f(x^*)\). In particular, we have \(f(x^*)\le f(x^k)\) for all \(k\in \mathbb {N}\). The function \(f\) satisfies the KŁ inequality on \(\Gamma _\eta (x^*,\delta )\) with desingularising function \(\varphi \). Let \(K_0\in \mathbb {N}\) be sufficiently large so that \(f(x^{K_0})-f(x^*)<\min \{\eta ,\underline{a}\delta ^2\}\), and pick \(\rho >0\) such that \(f(x^{K_0})-f(x^*)<\underline{a}(\delta -\rho )^2\).
Hence, \(f(x^*)\le f(x^{k+1})<f(x^*)+\eta \) for all \(k\ge K_0\) and
which implies part (i) of \(\mathbf {S}(x^*,\delta ,\rho )\). Now take \(K\ge K_0\) such that
The sequence \((y^k)_{k\in \mathbb {N}}\), defined by \(y^k:=x^{K+k}\) for all \(k\in \mathbb {N}\), satisfies the hypotheses of Proposition 7.1. Finally, since the whole sequence \((y^k)_{k\in \mathbb {N}}\) is \(f\)-convergent toward \(x^*\) and \(\liminf \limits _{k\rightarrow \infty }\Vert \partial f(y^k) \Vert _-=0\), we conclude that \(x^*\) must be critical using Lemma 2.1. \(\square \)
Proof of Theorem 3.2
Since \(f\) has the KŁ property at \(x^*\), there is a strict local upper level set \(\Gamma _{\eta }(x^*,\delta )\), where the KŁ inequality holds with \(\varphi \) as a desingularising function. Take \(\rho <\frac{3}{4}\delta \) and then \( \gamma <\frac{1}{3}\rho \). If necessary, shrink \(\eta \) so that \(2\sqrt{\dfrac{\eta }{\underline{a}}} + M \varphi (\eta ) < \dfrac{2\rho }{3}\). This is possible since \(\varphi \) is continuous at \(0\) with \(\varphi (0)=0\). Let \(x^0\in \underline{\Gamma }_\eta (x^*,\gamma )\subset \underline{\Gamma }_\eta (x^*,\rho )\). It suffices to verify that \(\mathbf {S}(x^*,\delta ,\rho )\) is fulfilled and apply Proposition 7.1. For (i), suppose that \(x^0,\dots ,x^k\) lie in \(\underline{\Gamma }_\eta (x^*,\rho )\); we prove that \(x^{k+1} \in \underline{\Gamma }_\eta (x^*,\delta )\). Since \(x^*\) is a global minimum, from \(\mathbf {H}_1\) and the fact that \(\left( f(x^k)\right) _{k\in \mathbb {N}}\) is decreasing, we have
It follows that \(\Vert x^{k+1}-x^*\Vert \le \Vert x^{k+1}-x^k\Vert +\Vert x^k-x^*\Vert <\sqrt{\frac{\eta }{\underline{a}}}+\rho <\frac{4}{3}\rho <\delta \), and so \(x^{k+1}\in \underline{\Gamma }_\eta (x^*,\delta )\). Finally,
which is precisely (ii). \(\square \)
1.2 Proof of Proposition 4.1
Proof
Since \(X_i^k=Y_i^k +S_i^k\), we can rewrite the algorithm as
We start by showing that \(\mathbf {H}_1\) is satisfied. Let \(i\in \{1,\dots ,p\}\) be fixed; using the definition of the proximal operator \(\mathrm{prox}_{g_i}^{A_{i,k}}\) in (26) and developing the squared norms gives
Using \(\mathbf {HE}_3\) in (27), the latter results in
We introduce \(\tilde{h}_{i,k} : y_i \in H_i \mapsto h(y_1^{k+1},\dots ,y_{i-1}^{k+1},y_i,y_{i+1}^k,\dots ,y_p^k) \in \mathbb {R}\) for fixed \(k\in \mathbb {N}\) and \(i=1,\dots ,p\). It satisfies \(\tilde{h}_{i,k} (y_i^k) = h(Y_i^k)\), \(\tilde{h}_{i,k} (y_i^{k+1}) = h(Y_{i+1}^k)\) and \(\nabla \tilde{h}_{i,k} (y_i^k) = \nabla _i h (Y_i^{k})\). Applying the descent lemma to \(\tilde{h}_{i,k}\), we obtain
Then, combining (28) and (29), we get
where \(\rho A_{i,k} - L\,\mathrm{id}_{H_i}\) remains coercive, since \(\rho \alpha _k > L\). Using successively the Cauchy–Schwarz inequality, the Lipschitz property of \(\nabla _i h\) (see Remark 4.2) and \(\mathbf {HE}_1\), one gets
By inserting this estimate in (30), we deduce that
We deduce that \(\mathbf {H}_1\) is fulfilled with \(a_k=\frac{\rho \alpha _k - L(\sigma \frac{\sqrt{p}}{p} +1)}{2}\), by summing (31) over \(i\):
To prove \(\mathbf {H}_2\), fix \(i=1,\dots ,p\) and use Fermat’s first-order condition in (26) to get
Define \(w_i^{k+1} := \nabla _i h(Y^k) - \nabla _i h(Y_i^k + S_i^k) - A_{i,k}(y_i^{k+1} - y_i^k) + A_{i,k}(r_i^k + s_i^k)\), which lies in \(\partial g_i(y_i^{k+1}) + \nabla _i h(Y^{k+1})\), by (32). The triangle inequality gives
Using the error estimates from (HE) and the \(\sqrt{p} L\)-Lipschitz continuity of \(\nabla _i h\) in (33), we get
Define now \(W^{k+1} := (w_1^{k+1},\ldots ,w_p^{k+1}) \in \partial f (Y^{k+1})\) (recall the definition of \(w_i^{k+1}\)). Then, summing (34) over \(i=1,\dots ,p\) and using \(\sqrt{p} \le p \le p^2\), we have
Hence, \(\mathbf {H}_2\) is verified with \(b_{k+1}=\frac{1}{p^2 (1+\sigma )(\beta _k +L)}\) and \(\epsilon _{k+1}= \frac{\beta _k \mu _k}{p(1+\sigma )(\beta _k +L)}\).
It remains to check that the hypotheses \(\mathbf {H}_3\) are satisfied under our assumptions on \(\alpha _k\), \(\beta _k\) and \(\mu _k\). Clearly, \(\mathbf {H}_3(i)\) holds since we have supposed that \(\alpha _k \ge \underline{\alpha } > (\sigma \frac{\sqrt{p}}{p} +1) \frac{L}{\rho }\). Next, \(\mathbf {H}_3(ii)\) requires that \((b_k) \notin \ell ^1\), which is equivalent to \(\left(\frac{1}{\beta _k +L}\right) \notin \ell ^1\) in our context; this holds since we have supposed that \(\left(\frac{1}{\beta _k}\right) \notin \ell ^1\). Hypothesis \(\mathbf {H}_3(iii)\) is satisfied because \(\frac{\beta _k}{\alpha _{k+1}}\) is supposed to be bounded. Finally, \(\mathbf {H}_3(iv)\) requires the summability of \(\frac{\beta _k \mu _k}{\beta _k +L}\), which holds because \(\frac{\beta _k \mu _k}{\beta _k +L}\le \mu _k\) and \(\mu _k \in \ell ^1\). \(\square \)
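The parameter conditions verified above are easy to satisfy in practice. The following sketch checks them numerically for one concrete (and entirely illustrative) choice of sequences; the names `alpha`, `beta`, `mu` mirror \(\alpha _k\), \(\beta _k\), \(\mu _k\) from the text, while the values of \(p\), \(\sigma \), \(L\), \(\rho \) are our own assumptions.

```python
import math

# Hypothetical parameter sequences satisfying H3(i)-(iv) as checked in the
# proof above. Concrete numbers are illustrative, not from the paper.
p, sigma, L_, rho = 4, 0.5, 2.0, 1.0
alpha_lower = (sigma * math.sqrt(p) / p + 1) * L_ / rho  # threshold in H3(i)

alpha = lambda k: 2.0 * alpha_lower   # constant, strictly above the threshold: H3(i)
beta = lambda k: 5.0                  # constant, so sum of 1/(beta_k + L) diverges: H3(ii)
mu = lambda k: 1.0 / (k + 1) ** 2     # summable error sequence: H3(iv)

# H3(i): alpha_k stays above the lower bound
assert all(alpha(k) > alpha_lower for k in range(100))
# H3(iii): beta_k / alpha_{k+1} is bounded
assert max(beta(k) / alpha(k + 1) for k in range(100)) <= 1.0
# H3(iv): the series beta_k*mu_k/(beta_k+L) has bounded partial sums
s = sum(beta(k) * mu(k) / (beta(k) + L_) for k in range(10000))
assert s < 5.0
```

Constant metric bounds with quadratically decaying errors are thus one admissible regime; the hypotheses also allow slowly growing \(\beta _k\), as long as \(\left(\frac{1}{\beta _k}\right)\) stays outside \(\ell ^1\).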
Cite this article
Frankel, P., Garrigos, G. & Peypouquet, J. Splitting Methods with Variable Metric for Kurdyka–Łojasiewicz Functions and General Convergence Rates. J Optim Theory Appl 165, 874–900 (2015). https://doi.org/10.1007/s10957-014-0642-3
Keywords
- Nonconvex and nonsmooth optimization
- Kurdyka–Łojasiewicz inequality
- Descent methods
- Convergence rates
- Variable metric
- Gauss–Seidel method
- Newton-like method