
On the linear convergence of the alternating direction method of multipliers

  • Full Length Paper
Mathematical Programming

Abstract

We analyze the convergence rate of the alternating direction method of multipliers (ADMM) for minimizing the sum of two or more nonsmooth convex separable functions subject to linear constraints. Previous analysis of the ADMM typically assumes that the objective function is the sum of only two convex functions defined on two separable blocks of variables even though the algorithm works well in numerical experiments for three or more blocks. Moreover, there has been no rate of convergence analysis for the ADMM without strong convexity in the objective function. In this paper we establish the global R-linear convergence of the ADMM for minimizing the sum of any number of convex separable functions, assuming that a certain error bound condition holds true and the dual stepsize is sufficiently small. Such an error bound condition is satisfied for example when the feasible set is a compact polyhedron and the objective function consists of a smooth strictly convex function composed with a linear mapping, and a nonsmooth \(\ell _1\) regularizer. This result implies the linear convergence of the ADMM for contemporary applications such as LASSO without assuming strong convexity of the objective function.
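For orientation, the iteration in question can be sketched as follows; the notation here (blocks \(x_k\), constraint data \(E_k\) and \(q\), penalty parameter \(\rho \), dual stepsize \(\alpha \)) is chosen only for illustration and need not match the symbols used in the body of the paper. For the problem \(\min \sum _{k=1}^{K} f_k(x_k)\) subject to \(\sum _{k=1}^{K} E_k x_k=q\), with augmented Lagrangian \(L_{\rho }(x_1,\ldots ,x_K;y)=\sum _{k} f_k(x_k)+\langle y, q-\sum _{k} E_k x_k\rangle +\frac{\rho }{2}\Vert q-\sum _{k} E_k x_k\Vert ^2\), one ADMM pass updates the blocks sequentially and then takes a dual step:

$$\begin{aligned} x_k^{r+1}&\in \arg \min _{x_k}\ L_{\rho }\bigl (x_1^{r+1},\ldots ,x_{k-1}^{r+1},x_k,x_{k+1}^{r},\ldots ,x_K^{r}; y^r\bigr ),\quad k=1,\ldots ,K,\\ y^{r+1}&=y^r+\alpha \Bigl (q-\sum _{k=1}^{K} E_k x_k^{r+1}\Bigr ). \end{aligned}$$

With \(K=2\) and \(\alpha =\rho \) this is the classical two-block ADMM; the result summarized above allows any number of blocks but requires the dual stepsize \(\alpha \) to be sufficiently small.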





Notes

  1. A sequence \(\{x^r\}\) is said to converge Q-linearly to some \({\bar{x}}\) if \(\Vert x^{r+1}-{\bar{x}}\Vert /\Vert x^{r}-{\bar{x}}\Vert \le \mu \) for all r, where \(\mu \in (0,1)\) is some constant. A sequence \(\{x^r\}\) is said to converge to \({\bar{x}}\) R-linearly if \(\Vert x^r-{\bar{x}}\Vert \le c\mu ^r\) for all r and for some \(c>0\). (A simple example separating these two notions is given after these notes.)

  2. To see that such R-linear convergence is in fact global, note that \({\bar{r}}>0\) is finite, and \(\Delta _p^{r}+\Delta _d^{r}\) is Q-linearly convergent for \(r\ge {\bar{r}}\). Then one can always find an appropriate constant c such that \(\Delta _p^{r}+\Delta _d^{r}\le c(1+\lambda )^{-r}\) for all \(r=1, 2,\ldots \).
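As a simple illustration of footnote 1 (an example added here for the reader, not taken from the paper), the sequence \(x^r=(2+(-1)^r)\,2^{-r}\) converges to \({\bar{x}}=0\) R-linearly, since \(\Vert x^r-{\bar{x}}\Vert \le 3\cdot 2^{-r}\), but not Q-linearly, because

$$\begin{aligned} \frac{\Vert x^{r+1}-{\bar{x}}\Vert }{\Vert x^{r}-{\bar{x}}\Vert }=\frac{2+(-1)^{r+1}}{2\bigl (2+(-1)^{r}\bigr )}= \begin{cases} 1/6, &r \text{ even},\\ 3/2, &r \text{ odd}, \end{cases} \end{aligned}$$

so no single \(\mu \in (0,1)\) bounds the ratio for all r.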

References

  1. Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1999)


  2. Bertsekas, D.P., Gafni, E.: Projection methods for variational inequalities with application to the traffic assignment problem. Math. Prog. Study 17, 139–159 (1982)


  3. Bertsekas, D.P., Hosein, P.A., Tseng, P.: Relaxation methods for network flow problems with convex arc costs. SIAM J. Control Optim. 25, 1219–1243 (1987)


  4. Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, Englewood Cliffs (1989)


  5. Boley, D.: Local linear convergence of the alternating direction method of multipliers on quadratic or linear programs. SIAM J. Optim. 23(4), 2183–2207 (2013)

  6. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3, 1–122 (2011). (Michael Jordan, Editor in Chief)


  7. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7, 200–217 (1967)


  8. Chen, C., He, B., Yuan, X., Ye, Y.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Prog. 155(1), 57–79 (2013)

  9. Cottle, R.W., Duvall, S.G., Zikan, K.: A Lagrangian relaxation algorithm for the constrained matrix problem. Nav. Res. Logist. Q. 33, 55–76 (1986)


  10. De Pierro, A.R., Iusem, A.N.: On the convergence properties of Hildreth’s quadratic programming algorithm. Math. Prog. 47, 37–51 (1990)


  11. Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2012)

  12. Douglas, J., Rachford, H.H.: On the numerical solution of the heat conduction problem in 2 and 3 space variables. Trans. Am. Math. Soc. 82, 421–439 (1956)


  13. Eckstein, J.: Splitting Methods for Monotone Operators with Applications to Parallel Optimization. Ph.D. Thesis, Operations Research Center, MIT (1989)

  14. Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)


  15. Eckstein, J., Svaiter, B.F.: General projective splitting methods for sums of maximal monotone operators. SIAM J. Control Optim. 48, 787–811 (2010)


  16. Gabay, D.: Application of the method of multipliers to variational inequalities. In: Fortin, M., Glowinski, R. (eds.) Augmented Lagrangian Methods: Application to the Numerical Solution of Boundary-Value Problem, pp. 299–331. North-Holland, Amsterdam (1983)

  17. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite-element approximations. Comput. Math. Appl. 2, 17–40 (1976)


  18. Glowinski, R.: Numerical Methods for Nonlinear Variational Problems. Springer, New York (1984)


  19. Glowinski, R., Le Tallec, P.: Augmented Lagrangian and Operator Splitting Methods in Nonlinear Mechanics. SIAM Studies in Applied Mathematics, Philadelphia (1989)


  20. Goldfarb, D., Ma, S.: Fast multiple splitting algorithms for convex optimization. SIAM J. Optim. 22(2), 533–556 (2012)

  21. Goldfarb, D., Ma, S., Scheinberg, K.: Fast alternating linearization methods for minimizing the sum of two convex functions. Math. Prog. A. 141(1,2), 349–382 (2013)

  22. Goldstein, A.A.: Convex programming in Hilbert space. Bull. Am. Math. Soc. 70, 709–710 (1964)

  23. Goldstein, T., O’Donoghue, B., Setzer, S.: Fast alternating direction optimization methods. SIAM J. Imaging Sci. 7(3), 1588–1623 (2014)

  24. He, B.S., Tao, M., Yuan, X.M.: Alternating direction method with Gaussian back substitution for separable convex programming. SIAM J. Optim. 22, 313–340 (2012)

  25. He, B.S., Yuan, X.M.: On the \(O(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50, 700–709 (2012)


  26. Hoffman, A.J.: On approximate solutions of systems of linear inequalities. J. Res. Nat. Bur. Stand. 49, 263–265 (1952)


  27. Iusem, A.N.: On dual convergence and the rate of primal convergence of Bregman’s convex programming method. SIAM J. Control Optim. 1, 401–423 (1991)

  28. Kontogiorgis, S., Meyer, R.R.: A variable-penalty alternating directions method for convex optimization. Math. Program. 83, 29–53 (1998)


  29. Levitin, E.S., Poljak, B.T.: Constrained minimization methods. Z. Vycisl. Mat. i Mat. Fiz. 6, 787–823 (1965). English translation in USSR Comput. Math. Phys. 6, 1–50 (1965)

  30. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 964–979 (1979)


  31. Lin, Y.Y., Pang, J.-S.: Iterative methods for large convex quadratic programs: a survey. SIAM J. Control Optim. 18, 383–411 (1987)


  32. Luo, Z.-Q., Tseng, P.: On the convergence of the coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl. 72, 7–35 (1992)


  33. Luo, Z.-Q., Tseng, P.: On the linear convergence of descent methods for convex essentially smooth minimization. SIAM J. Control Optim. 30, 408–425 (1992)

  34. Luo, Z.-Q., Tseng, P.: On the convergence rate of dual ascent methods for strictly convex minimization. Math. Oper. Res. 18, 846–867 (1993)


  35. Ma, S.: Alternating proximal gradient method for convex minimization. J. Sci. Comput. (2015). doi:10.1007/s10915-015-0150-0

  36. Mangasarian, O.L., Shiau, T.-H.: Lipschitz continuity of solutions of linear inequalities, programs and complementarity problems. SIAM J. Control Optim. 25, 583–595 (1987)


  37. Monteiro, R., Svaiter, B.: Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM J. Optim. 23, 475–507 (2013)


  38. Ohuchi, A., Kaji, I.: Lagrangian dual coordinatewise maximization algorithm for network transportation problems with quadratic costs. Networks 14, 515–530 (1984)


  39. Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York (1970)


  40. Pang, J.-S.: On the Convergence of Dual Ascent Methods for Large Scale Linearly Constrained Optimization Problems. The University of Texas, School of Management, Dallas (1984)


  41. Pang, J.-S.: A posteriori error bounds for the linearly-constrained variational inequality problem. Math. Oper. Res. 12, 474–484 (1987)


  42. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)


  43. Tao, M., Yuan, X.M.: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 21, 57–81 (2011)


  44. Tseng, P.: Dual ascent methods for problems with strictly convex costs and linear constraints: a unified approach. SIAM J. Control Optim. 28, 214–242 (1990)


  45. Tseng, P.: Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Prog. 125(2), 263–295 (2010)

  46. Tseng, P., Bertsekas, D.P.: Relaxation methods for problems with strictly convex separable costs and linear constraints. Math. Prog. 38, 303–321 (1987)


  47. Tseng, P., Bertsekas, D.P.: Relaxation methods for problems with strictly convex costs and linear constraints. Math. Oper. Res. 16, 462–481 (1991)


  48. Ventura, J.A., Hearn, D.W.: Computational Development of a Lagrangian Dual Approach for Quadratic Networks. 21(4), 469–485 (1991)

  49. Wang, X.F., Yuan, X.M.: The linearized alternating direction method of multipliers for Dantzig selector. SIAM J. Sci. Comput. 34, 2792–2811 (2012)

  50. Yang, J.F., Zhang, Y.: Alternating direction algorithms for \(l_1\)-problems in compressive sensing. SIAM J. Sci. Comput. 33, 250–278 (2011)


  51. Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B (Statistical Methodology) 68, 49–67 (2006)


  52. Zhang, H., Jiang, J.J., Luo, Z.-Q.: On the linear convergence of a proximal gradient method for a class of nonsmooth convex minimization problems. J. Oper. Res. Soc. Chin. 1(2), 163–186 (2013)

  53. Zenios, S.A., Mulvey, J.M.: Relaxation techniques for strictly convex network problems. Ann. Oper. Res. 5, 517–538 (1986)


  54. Zhou, Z., Li, X., Wright, J., Candes, E.J., Ma, Y.: Stable principal component pursuit. In: Proceedings of 2010 IEEE International Symposium on Information Theory (2010)


Acknowledgments

The authors are grateful to Xiangfeng Wang and Dr. Min Tao of Nanjing University for their constructive comments.


Corresponding author

Correspondence to Zhi-Quan Luo.

Additional information

Dedicated to the fond memories of a close friend and collaborator, Paul Y. Tseng.

This research is supported in part by NSFC, Grant No. 61571384, and by the Leading Talents of Guangdong Province Program, Grant No. 00201510.

Appendix

1.1 Proof of dual error bound (2.8)

The augmented Lagrangian dual function can be expressed as

$$\begin{aligned} d(y)=\min _{x\in X} \langle y,q-Ex\rangle +\frac{\rho }{2}\Vert q-Ex\Vert ^2+g(Ax)+h(x). \end{aligned}$$
(6.1)

For convenience, define \(p(Ex):=\frac{\rho }{2}\Vert q-Ex\Vert ^2\), and let \(\ell (x):=p(Ex)+g(Ax)+h(x)\). For simplicity, in this proof we further restrict ourselves to the case where the nonsmooth part has polyhedral level sets, i.e., \(\{x: h(x)\le \xi \}\) is polyhedral for each \(\xi \); more general cases can be treated along similar lines, but the arguments become more involved.
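As a concrete instance of this restriction (noted here only to connect it with the \(\ell _1\) regularizer mentioned in the abstract), the assumption holds for \(h(x)=\Vert x\Vert _1\): since \(\Vert x\Vert _1=\max _{\sigma \in \{-1,+1\}^n}\langle \sigma , x\rangle \), we have

$$\begin{aligned} \{x\in \mathbb {R}^n: \Vert x\Vert _1\le \xi \}=\bigl \{x: \langle \sigma , x\rangle \le \xi \ \hbox {for all}\ \sigma \in \{-1,+1\}^n\bigr \}, \end{aligned}$$

an intersection of finitely many half-spaces and hence a polyhedron for every \(\xi \).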

Let us define

$$\begin{aligned} x(y)\in \arg \min _{x\in X} \ell (x)+\langle y, q-Ex\rangle . \end{aligned}$$

Let \((x^*, y^*)\) denote a primal and dual optimal solution pair. Let \(X^*\) and \(Y^*\) denote the primal and dual optimal solution sets, respectively. The following properties will be useful in our subsequent analysis.

  1. (a)

    There exist positive scalars \(\sigma _g\), \(L_g\) such that \(\forall ~x(y),x(y')\in X\)

    1. a-1)

      \(\langle A^T \nabla g(A x(y'))-A^T \nabla g(A x(y)), x(y')-x(y)\rangle \ge \sigma _g\Vert A x(y')-A x(y)\Vert ^2\).

    2. a-2)

      \(g(A x(y'))-g(Ax(y))-\langle A^T \nabla g(A x(y)), x(y')-x(y)\rangle \ge \frac{\sigma _g}{2}\Vert A x(y')-Ax(y)\Vert ^2.\)

    3. a-3)

      \(\Vert A^T\nabla g(Ax(y'))-A^T\nabla g(A x(y))\Vert \le L_g\Vert Ax(y')-Ax(y)\Vert .\)

  2. (b)

    All a-1)–a-3) are true for \(p(\cdot )\) as well, with some constants \(\sigma _p\) and \(L_p\).

  3. (c)

    \(\nabla d(y)=q-Ex(y)\), and \(\Vert \nabla d(y')-\nabla d(y)\Vert \le \frac{1}{\rho }\Vert y'-y\Vert .\)

Part (a) is true due to the assumed Lipschitz continuity and strong convexity of the function \(g(\cdot )\). Part (b) follows from the Lipschitz continuity and strong convexity of the quadratic penalization \(p(\cdot )\). Part (c) has been shown in Lemmas 2.1 and 2.2.

To proceed, let us rewrite the minimization problem defining \(d(y)\) equivalently as

$$\begin{aligned} d(y)=\min _{(x,s): x\in X, h(x)\le s} \langle y, q-Ex\rangle + p(Ex)+g(Ax)+s. \end{aligned}$$
(6.2)

Let us write the polyhedral set \(\{(x,s): x\in X, h(x)\le s\}\) compactly as \(C_x x+ C_s s \ge c\) for some matrices \(C_x\in \mathbb {R}^{j\times n}\), \(C_s\in \mathbb {R}^{j\times 1}\) and \(c\in \mathbb {R}^{j\times 1}\), where \(j\ge 0\) is some integer. For any fixed y, let (x(y), s(y)) denote one optimal solution of (6.2); note that we must have \(h(x(y))=s(y)\). By the equivalence of (6.1) and (6.2), if \(y^*\in Y^*\), we must also have \(x(y^*)\in X^*\).
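As a one-dimensional illustration of this representation (an example added here, with \(X=\mathbb {R}\) and \(h(x)=|x|\)), the constraint \(h(x)\le s\) reads \(x\le s\) and \(-x\le s\), so one admissible choice is

$$\begin{aligned} C_x=\begin{pmatrix} -1\\ 1 \end{pmatrix},\qquad C_s=\begin{pmatrix} 1\\ 1 \end{pmatrix},\qquad c=\begin{pmatrix} 0\\ 0 \end{pmatrix},\qquad j=2, \end{aligned}$$

for which \(C_x x+C_s s\ge c\) describes exactly the epigraph \(\{(x,s): |x|\le s\}\).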

Define a set-valued function \(\mathcal {M}\) that maps each vector \((d, e)\in \mathbb {R}^{n}\times \mathbb {R}^{m}\) to the set of vectors \((x,s,y,\lambda )\in \mathbb {R}^n\times \mathbb {R}\times \mathbb {R}^m\times \mathbb {R}^j\) that satisfy the following system of equations

$$\begin{aligned}&E^Ty + C_x^T\lambda =d,\\&C^T_s\lambda =1,\\&q-Ex=e,\\&\lambda \ge 0, \ (C_x x+C_s s)\ge c, \ \langle C_x x+C_s s-c,\lambda \rangle =0. \end{aligned}$$

It is easy to verify by using the optimality condition for problem (6.2) that

$$\begin{aligned} (x,s,y,\lambda )\in \mathcal {M}(E^T\nabla p(Ex)+A^T\nabla g(Ax), e)~\hbox {for some} \ \lambda \nonumber \\ \quad \quad \quad \hbox {if and only if}\ x=x(y), \ e=\nabla d(y). \end{aligned}$$
(6.3)

Taking \(e=0\) and using the fact that \(x(y^*)\in X^*\), we see that \((x,s,y,\lambda )\in \mathcal {M}(E^T\nabla p(Ex)+A^T\nabla g(Ax), 0)\) if and only if \(x\in X^*\) and \(y\in Y^*\).

The following result states a well-known local upper Lipschitzian continuity property for the polyhedral multifunction \(\mathcal {M}\); see [26, 33, 34].

Proposition 6.1

There exists a positive scalar \(\theta \) that depends on \(A, E, C_x, C_s\) only, such that for each \(({\bar{d}}, {\bar{e}})\) there is a positive scalar \(\delta '\) satisfying

$$\begin{aligned} \mathcal {M}(d, e)\subseteq \mathcal {M}({\bar{d}}, {\bar{e}})+\theta \Vert (d, e)-({\bar{d}}, {\bar{e}})\Vert {\mathcal {B}}, \end{aligned}$$
(6.4)
$$\begin{aligned} \quad \quad \quad \hbox {whenever} \ \Vert (d, e)-({\bar{d}}, {\bar{e}})\Vert \le \delta ', \end{aligned}$$
(6.5)

where \({\mathcal {B}}\) denotes the unit Euclidean ball in \(\mathbb {R}^n\times \mathbb {R}^{m}\times \mathbb {R}\times \mathbb {R}^j\).
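As an illustration (a special case recalled here, not spelled out in the paper), the classical Hoffman bound [26] concerns the multifunction mapping a right-hand side \(b\) to the polyhedron \(\mathcal {P}(b)=\{x: Ax\le b\}\): there is a \(\theta \) depending only on \(A\) such that \(\mathrm{dist}(x, \mathcal {P}({\bar{b}}))\le \theta \Vert (Ax-{\bar{b}})_+\Vert \) for all \(x\), whenever \(\mathcal {P}({\bar{b}})\ne \emptyset \). Consequently,

$$\begin{aligned} \mathcal {P}(b)\subseteq \mathcal {P}({\bar{b}})+\theta \Vert b-{\bar{b}}\Vert {\mathcal {B}}, \end{aligned}$$

since \((Ax-{\bar{b}})_+\le (b-{\bar{b}})_+\) componentwise for every \(x\in \mathcal {P}(b)\). Proposition 6.1 states a local bound of the same type for the multifunction \(\mathcal {M}\) defined above.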

The following is the main result for this appendix. Note that the scalar \(\tau \) in the claim is independent of the choice of y, x, s, and is independent of the coefficients of the linear term s.

Claim 6.1

Suppose all the assumptions in Assumption A are satisfied. Then there exist positive scalars \(\delta \), \(\tau \) such that \(\mathrm{dist}(y, Y^*)\le \tau \Vert \nabla d(y)\Vert \) for all \(y\in \mathcal U\) with \(\Vert \nabla d(y)\Vert \le \delta \).
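Before giving the proof, one remark for intuition (an illustrative special case, not used in the argument): if the dual function \(d\) were \(\sigma \)-strongly concave, then \(Y^*\) would be a singleton \(\{y^*\}\) with \(\nabla d(y^*)=0\), and the bound would hold globally with \(\tau =1/\sigma \), since

$$\begin{aligned} \Vert \nabla d(y)\Vert =\Vert \nabla d(y)-\nabla d(y^*)\Vert \ge \sigma \Vert y-y^*\Vert =\sigma \,\mathrm{dist}(y, Y^*). \end{aligned}$$

The content of Claim 6.1 is that a bound of this form still holds, locally, without any strong concavity of \(d\).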

Proof

By Proposition 6.1, \(\mathcal {M}\) is locally upper Lipschitzian with modulus \(\theta \) at \((\nabla \ell (x^*), 0)=(E^T\nabla p(Ex^*)+A^T\nabla g(Ax^*), 0)\).

Let \(\delta \le \delta '/2\). We first show that if \(\Vert \nabla d(y)\Vert \le \delta \), then we must have \(\Vert \nabla \ell (x(y))-\nabla \ell (x^*)\Vert \le \delta '/2\). To this end, take a sequence \(y^1,y^2,\ldots \) such that \(e^r:=\nabla d(y^r)\rightarrow 0\). By Assumption (g), \(\{x(y^r)\}\) lies in a compact set. Since \(s(y^r)=h(x(y^r))\), the sequence \(\{s(y^r)\}\) also lies in a compact set (cf. Assumption (e)). By passing to a subsequence if necessary, let \((x^{\infty },s^{\infty })\) be a cluster point of \(\{(x(y^r), s(y^r))\}\). In light of the continuity of \(\nabla \ell (\cdot )\), we have \((\nabla \ell (x(y^r)), e^r)\rightarrow (\nabla \ell (x^{\infty }),0)\). Now for all r, \(\{(x(y^r), s(y^r), \nabla \ell (x(y^r)), e^r)\}\) lies in the set

$$\begin{aligned} \left\{ (x, s, d, e)\mid (x,s,y,\lambda )\in \mathcal {M}(d, e)~\hbox {for some}~ (y,\lambda )\right\} \end{aligned}$$

which is a finite union of polyhedral sets and thus closed. Then we can pass to the limit and conclude (cf. Proposition 6.1)

$$\begin{aligned} (x^\infty , s^{\infty }, y^{\infty }, \lambda ^{\infty })\in \mathcal {M}(\nabla \ell (x^{\infty }), 0) \end{aligned}$$

for some \((y^{\infty }, \lambda ^{\infty })\in \mathbb {R}^{m}\times \mathbb {R}^j\). Thus by (6.3) and the discussions that follow, we have \(x^\infty \in X^*\) and \(y^{\infty }\in Y^*\). By Lemma 2.1, we have \(\nabla \ell (x^*)=\nabla \ell (x^\infty )\), which further implies that \(\nabla \ell (x(y^r))\rightarrow \nabla \ell (x^*)\). This shows that the desired \(\delta \) exists.

Then we let \(e= \nabla d(y)\), and suppose \(\Vert e\Vert \le \delta \). From the previous argument we have

$$\begin{aligned} \Vert \nabla \ell (x(y))-\nabla \ell (x^*)\Vert +\Vert e\Vert \le \delta '/2+\delta '/2=\delta '. \end{aligned}$$

Using the results in Proposition 6.1, we have that there exists \((x^*,s^*, y^*, \lambda ^*)\in \mathcal {M}(\nabla \ell (x^*), 0)\) satisfying

$$\begin{aligned} \Vert (x(y), s, y, \lambda )-(x^*,s^*, y^*, \lambda ^*)\Vert&\le \theta \left( \Vert \nabla \ell (x^*)-\nabla \ell (x(y))\Vert +\Vert e\Vert \right) . \end{aligned}$$

Since \((x(y),s,y,\lambda )\in \mathcal {M}(\nabla \ell (x(y)), e)\), it follows from the definition of \(\mathcal {M}\) that

$$\begin{aligned}&E^Ty + C_x^T\lambda =\nabla \ell (x(y)), \end{aligned}$$
(6.6)
$$\begin{aligned}&C^T_s\lambda =1, \end{aligned}$$
(6.7)
$$\begin{aligned}&q-Ex(y)=e, \end{aligned}$$
(6.8)
$$\begin{aligned}&\lambda \ge 0, \ (C_x x(y)+C_s s(y))\ge c, \ \langle C_x x(y)+C_s s(y)-c,\lambda \rangle =0. \end{aligned}$$
(6.9)

Since \((x^*,s^*,y^*,\lambda ^*)\in \mathcal {M}(\nabla \ell (x^*), 0)\), we have from the definition of \(\mathcal {M}\)

$$\begin{aligned}&E^Ty^* + C_x^T\lambda ^* =\nabla \ell (x^*),\end{aligned}$$
(6.10)
$$\begin{aligned}&C^T_s\lambda ^*=1, \end{aligned}$$
(6.11)
$$\begin{aligned}&q-Ex^{*} =0, \end{aligned}$$
(6.12)
$$\begin{aligned}&\lambda ^*\ge 0, \ (C_x x^*+C_s s^*)\ge c, \ \langle C_x x^*+C_s s^*-c,\lambda ^*\rangle =0. \end{aligned}$$
(6.13)

Moreover, we have

$$\begin{aligned}&\sigma _g\Vert A(x(y)-x^*)\Vert ^2+\sigma _p\Vert E(x(y)-x^*)\Vert ^2\\&\qquad \le \langle A^T\nabla g(Ax(y))-A^T\nabla g(Ax(y^*)), x(y)-x(y^*)\rangle \\&\qquad \quad +\langle E^T\nabla p(Ex(y))-E^T\nabla p(Ex(y^*)), x(y)-x(y^*)\rangle \\&\qquad =\langle \nabla \ell (x(y))-\nabla \ell (x(y^*)), x(y)-x(y^*)\rangle \\&\qquad =\langle \lambda -\lambda ^*, C_x x(y)-C_x x^*\rangle +\langle y-y^*, E x(y)-E x^*\rangle \end{aligned}$$

where the first inequality comes from the strong convexity of \(g(\cdot )\) and \(p(\cdot )\); the last equality is from (6.6) and (6.10). Moreover, we have

$$\begin{aligned}&\langle \lambda -\lambda ^*, C_x x(y)-C_x x^*\rangle \nonumber \\&\qquad =\langle \lambda -\lambda ^*, C_x x(y)-C_x x^*\rangle +\langle \lambda -\lambda ^*, C_s s-C_s s^*\rangle \nonumber \\&\qquad =\langle \lambda -\lambda ^*, (C_x x(y)+C_s s)-(C_x x^*+C_s s^*)\rangle \nonumber \\&\qquad =-\langle \lambda ^*, C_x x(y)+C_s s-c\rangle -\langle \lambda , C_x x^*+C_s s^*-c\rangle \le 0 \end{aligned}$$
(6.14)

where in the first equality we have used the fact that \(C^T_s\lambda -C^T_s\lambda ^*=0\) (see (6.7) and (6.11)), and in the third equality and in the last inequality we have used the complementarity conditions (6.13) and (6.9). As a result, we have

$$\begin{aligned}&\sigma _g\Vert A(x(y)-x^*)\Vert ^2+\sigma _p\Vert E(x(y)-x^*)\Vert ^2\nonumber \\&\le \langle y-y^*, (Ex(y)-q)-(Ex^*-q)\rangle \le \Vert y-y^*\Vert \Vert e\Vert , \end{aligned}$$
(6.15)

where the last step is due to the Cauchy–Schwarz inequality together with \(\nabla d(y)=q-Ex(y)=e\) and \(\nabla d(y^*)=q-Ex^{*}=0\); cf. (6.8) and (6.12). Finally, we have from Proposition 6.1

$$\begin{aligned}&\Vert (x(y), s, y, \lambda )-(x^*,s^*, y^*, \lambda ^*)\Vert ^2\\&\qquad \le \theta ^2\left( \Vert \nabla \ell (x^*)-\nabla \ell (x(y))\Vert +\Vert e\Vert \right) ^2\\&\qquad \le \theta ^2\left( 2\Vert \nabla \ell (x^*)-\nabla \ell (x(y))\Vert ^2+2\Vert e\Vert ^2\right) \\&\qquad \le 2\theta ^2\left( 2\Vert A^T\nabla g(Ax^*)-A^T\nabla g(Ax(y))\Vert ^2+2\Vert E^T\nabla p(Ex^*)-E^T\nabla p(Ex(y))\Vert ^2+\Vert e\Vert ^2\right) \\&\qquad \le 2\theta ^2\left( 2L^2_g\Vert A(x(y)-x^*)\Vert ^2+2L^2_p\Vert E(x(y)-x^*)\Vert ^2+\Vert e\Vert ^2\right) \\&\qquad \le 2\theta ^2\max \left( \frac{2L^2_g}{\sigma _g}, \frac{2L^2_p}{\sigma _p},1\right) \left( \sigma _g\Vert A(x(y)-x^*)\Vert ^2 +\sigma _p\Vert E(x(y)-x^*)\Vert ^2+\Vert e\Vert ^2\right) \\&\qquad \le 2\theta ^2\max \left( \frac{2L^2_g}{\sigma _g}, \frac{2L^2_p}{\sigma _p},1\right) \left( \Vert e\Vert \Vert y-y^*\Vert +\Vert e\Vert ^2\right) \\&\qquad \le 2\theta ^2\max \left( \frac{2L^2_g}{\sigma _g}, \frac{2L^2_p}{\sigma _p},1\right) \left( \Vert e\Vert \Vert (x(y), s,y, \lambda )-(x^*, s^*, y^*, \lambda ^*)\Vert +\Vert e\Vert ^2\right) , \end{aligned}$$

where the first inequality restates the bound obtained from Proposition 6.1, the second and third inequalities use the elementary bound \((u+v)^2\le 2u^2+2v^2\) together with \(\nabla \ell (x)=A^T\nabla g(Ax)+E^T\nabla p(Ex)\), the fourth step follows from properties a-3) and (b), the sixth step uses (6.15), and the last step bounds \(\Vert y-y^*\Vert \) by the full distance.

We see that the above inequality is quadratic in the ratio \(\Vert (x(y), s, y, \lambda )-(x^*, s^*, y^*, \lambda ^*)\Vert /\Vert e\Vert \), so we have

$$\begin{aligned} \Vert (x(y),s, y, \lambda )-(x^*, s^*, y^*, \lambda ^*)\Vert /\Vert e\Vert \le \tau \end{aligned}$$

for some scalar \(\tau \) depending on \(\theta \), \(L_g\), \(L_p\), \(\sigma _g\), \(\sigma _p\). It is worth noting that \(\tau \) does not depend on the choice of the coefficients of the linear term s. We conclude \(\mathrm{dist}(y, Y^*)\le \tau \Vert \nabla d(y)\Vert \). \(\square \)
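For completeness, one admissible value of \(\tau \) can be read off from the last chain of inequalities (this explicit formula is an addition for the reader's convenience, not part of the original statement). Writing \(t:=\Vert (x(y),s, y, \lambda )-(x^*, s^*, y^*, \lambda ^*)\Vert /\Vert e\Vert \) and \(a:=2\theta ^2\max \bigl (\frac{2L^2_g}{\sigma _g}, \frac{2L^2_p}{\sigma _p},1\bigr )\), the chain reads \(t^2\le a(t+1)\), and solving this quadratic inequality gives

$$\begin{aligned} t\le \frac{a+\sqrt{a^2+4a}}{2}=:\tau . \end{aligned}$$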


Cite this article

Hong, M., Luo, ZQ. On the linear convergence of the alternating direction method of multipliers. Math. Program. 162, 165–199 (2017). https://doi.org/10.1007/s10107-016-1034-2
