
On the linear convergence of the alternating direction method of multipliers

  • Full Length Paper
Mathematical Programming

Abstract

We analyze the convergence rate of the alternating direction method of multipliers (ADMM) for minimizing the sum of two or more nonsmooth convex separable functions subject to linear constraints. Previous analysis of the ADMM typically assumes that the objective function is the sum of only two convex functions defined on two separable blocks of variables even though the algorithm works well in numerical experiments for three or more blocks. Moreover, there has been no rate of convergence analysis for the ADMM without strong convexity in the objective function. In this paper we establish the global R-linear convergence of the ADMM for minimizing the sum of any number of convex separable functions, assuming that a certain error bound condition holds true and the dual stepsize is sufficiently small. Such an error bound condition is satisfied for example when the feasible set is a compact polyhedron and the objective function consists of a smooth strictly convex function composed with a linear mapping, and a nonsmooth \(\ell _1\) regularizer. This result implies the linear convergence of the ADMM for contemporary applications such as LASSO without assuming strong convexity of the objective function.
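For orientation, the iteration in question can be sketched as follows; the notation here (blocks \(x_k\), constraint data \(E_k\) and \(q\), penalty parameter \(\rho \), dual stepsize \(\alpha \)) is chosen only for illustration and need not match the symbols used in the body of the paper. For the problem \(\min \sum _{k=1}^{K} f_k(x_k)\) subject to \(\sum _{k=1}^{K} E_k x_k=q\), with augmented Lagrangian \(L_{\rho }(x_1,\ldots ,x_K;y)=\sum _{k} f_k(x_k)+\langle y, q-\sum _{k} E_k x_k\rangle +\frac{\rho }{2}\Vert q-\sum _{k} E_k x_k\Vert ^2\), one ADMM pass updates the blocks sequentially and then takes a dual step:

$$\begin{aligned} x_k^{r+1}&\in \arg \min _{x_k}\ L_{\rho }\bigl (x_1^{r+1},\ldots ,x_{k-1}^{r+1},x_k,x_{k+1}^{r},\ldots ,x_K^{r}; y^r\bigr ),\quad k=1,\ldots ,K,\\ y^{r+1}&=y^r+\alpha \Bigl (q-\sum _{k=1}^{K} E_k x_k^{r+1}\Bigr ). \end{aligned}$$

With \(K=2\) and \(\alpha =\rho \) this is the classical two-block ADMM; the result summarized above allows any number of blocks but requires the dual stepsize \(\alpha \) to be sufficiently small.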





Notes

  1. A sequence \(\{x^r\}\) is said to converge Q-linearly to some \({\bar{x}}\) if \(\Vert x^{r+1}-{\bar{x}}\Vert /\Vert x^{r}-{\bar{x}}\Vert \le \mu \) for all r, where \(\mu \in (0,1)\) is some constant. A sequence \(\{x^r\}\) is said to converge to \({\bar{x}}\) R-linearly if \(\Vert x^r-{\bar{x}}\Vert \le c\mu ^r\) for all r and for some \(c>0\). (A simple example separating these two notions is given after these notes.)

  2. To see that such R-linear convergence is in fact global, note that \({\bar{r}}>0\) is finite, and \(\Delta _p^{r}+\Delta _d^{r}\) is Q-linearly convergent for \(r\ge {\bar{r}}\). Then one can always find an appropriate constant c such that \(\Delta _p^{r}+\Delta _d^{r}\le c(1+\lambda )^{-r}\) for all \(r=1, 2,\ldots \).
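As a simple illustration of footnote 1 (an example added here for the reader, not taken from the paper), the sequence \(x^r=(2+(-1)^r)\,2^{-r}\) converges to \({\bar{x}}=0\) R-linearly, since \(\Vert x^r-{\bar{x}}\Vert \le 3\cdot 2^{-r}\), but not Q-linearly, because

$$\begin{aligned} \frac{\Vert x^{r+1}-{\bar{x}}\Vert }{\Vert x^{r}-{\bar{x}}\Vert }=\frac{2+(-1)^{r+1}}{2\bigl (2+(-1)^{r}\bigr )}= \begin{cases} 1/6, &r \text{ even},\\ 3/2, &r \text{ odd}, \end{cases} \end{aligned}$$

so no single \(\mu \in (0,1)\) bounds the ratio for all r.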

References

  1. Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1999)


  2. Bertsekas, D.P., Gafni, E.: Projection methods for variational inequalities with application to the traffic assignment problem. Math. Prog. Study 17, 139–159 (1982)


  3. Bertsekas, D.P., Hosein, P.A., Tseng, P.: Relaxation methods for network flow problems with convex arc costs. SIAM J. Control Optim. 25, 1219–1243 (1987)


  4. Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, Englewood Cliffs (1989)


  5. Boley, D.: Local linear convergence of the alternating direction method of multipliers on quadratic or linear programs. SIAM J. Optim. 23(4), 2183–2207 (2013)

  6. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3, 1–122 (2011). (Michael Jordan, Editor in Chief)


  7. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7, 200–217 (1967)


  8. Chen, C., He, B., Yuan, X., Ye, Y.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Prog. 155(1), 57–79 (2013)

  9. Cottle, R.W., Duvall, S.G., Zikan, K.: A Lagrangian relaxation algorithm for the constrained matrix problem. Nav. Res. Logist. Q. 33, 55–76 (1986)


  10. De Pierro, A.R., Iusem, A.N.: On the convergence properties of Hildreth’s quadratic programming algorithm. Math. Prog. 47, 37–51 (1990)


  11. Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2012)

  12. Douglas, J., Rachford, H.H.: On the numerical solution of the heat conduction problem in 2 and 3 space variables. Trans. Am. Math. Soc. 82, 421–439 (1956)


  13. Eckstein, J.: Splitting Methods for Monotone Operators with Applications to Parallel Optimization. Ph.D. Thesis, Operations Research Center, MIT (1989)

  14. Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)


  15. Eckstein, J., Svaiter, B.F.: General projective splitting methods for sums of maximal monotone operators. SIAM J. Control Optim. 48, 787–811 (2010)


  16. Gabay, D.: Application of the method of multipliers to variational inequalities. In: Fortin, M., Glowinski, R. (eds.) Augmented Lagrangian Methods: Application to the Numerical Solution of Boundary-Value Problem, pp. 299–331. North-Holland, Amsterdam (1983)

  17. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite-element approximations. Comput. Math. Appl. 2, 17–40 (1976)


  18. Glowinski, R.: Numerical Methods for Nonlinear Variational Problems. Springer, New York (1984)


  19. Glowinski, R., Le Tallec, P.: Augmented Lagrangian and Operator Splitting Methods in Nonlinear Mechanics. SIAM Studies in Applied Mathematics, Philadelphia (1989)


  20. Goldfarb, D., Ma, S.: Fast multiple splitting algorithms for convex optimization. SIAM J. Optim. 22(2), 533–556 (2012)

  21. Goldfarb, D., Ma, S., Scheinberg, K.: Fast alternating linearization methods for minimizing the sum of two convex functions. Math. Prog. A. 141(1,2), 349–382 (2013)

  22. Goldstein, A.A.: Convex programming in Hilbert space. Bull. Am. Math. Soc. 70, 709–710 (1964)

  23. Goldstein, T., O’Donoghue, B., Setzer, S.: Fast alternating direction optimization methods. SIAM J. Imaging Sci. 7(3), 1588–1623 (2014)

  24. He, B.S., Tao, M., Yuan, X.M.: Alternating direction method with Gaussian back substitution for separable convex programming. SIAM J. Optim. 22, 313–340 (2012)

  25. He, B.S., Yuan, X.M.: On the \(O(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50, 700–709 (2012)


  26. Hoffman, A.J.: On approximate solutions of systems of linear inequalities. J. Res. Nat. Bur. Stand. 49, 263–265 (1952)


  27. Iusem, A.N.: On dual convergence and the rate of primal convergence of Bregman’s convex programming method. SIAM J. Control Optim. 1, 401–423 (1991)

  28. Kontogiorgis, S., Meyer, R.R.: A variable-penalty alternating directions method for convex optimization. Math. Program. 83, 29–53 (1998)


  29. Levitin, E.S., Poljak, B.T.: Constrained minimization methods. Z. Vycisl. Mat. i Mat. Fiz. 6, 787–823 (1965). English translation in USSR Comput. Math. Phys. 6, 1–50 (1965)

  30. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 964–979 (1979)


  31. Lin, Y.Y., Pang, J.-S.: Iterative methods for large convex quadratic programs: a survey. SIAM J. Control Optim. 18, 383–411 (1987)


  32. Luo, Z.-Q., Tseng, P.: On the convergence of the coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl. 72, 7–35 (1992)


  33. Luo, Z.-Q., Tseng, P.: On the linear convergence of descent methods for convex essentially smooth minimization. SIAM J. Control Optim. 30, 408–425 (1992)

  34. Luo, Z.-Q., Tseng, P.: On the convergence rate of dual ascent methods for strictly convex minimization. Math. Oper. Res. 18, 846–867 (1993)


  35. Ma, S.: Alternating proximal gradient method for convex minimization. J. Sci. Comput. (2015). doi:10.1007/s10915-015-0150-0

  36. Mangasarian, O.L., Shiau, T.-H.: Lipschitz continuity of solutions of linear inequalities, programs and complementarity problems. SIAM J. Control Optim. 25, 583–595 (1987)


  37. Monteiro, R., Svaiter, B.: Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM J. Optim. 23, 475–507 (2013)


  38. Ohuchi, A., Kaji, I.: Lagrangian dual coordinatewise maximization algorithm for network transportation problems with quadratic costs. Networks 14, 515–530 (1984)


  39. Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York (1970)


  40. Pang, J.-S.: On the Convergence of Dual Ascent Methods for Large Scale Linearly Constrained Optimization Problems. The University of Texas, School of Management, Dallas (1984)


  41. Pang, J.-S.: A posteriori error bounds for the linearly-constrained variational inequality problem. Math. Oper. Res. 12, 474–484 (1987)


  42. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)


  43. Tao, M., Yuan, X.M.: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 21, 57–81 (2011)


  44. Tseng, P.: Dual ascent methods for problems with strictly convex costs and linear constraints: a unified approach. SIAM J. Control Optim. 28, 214–242 (1990)


  45. Tseng, P.: Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Prog. 125(2), 263–295 (2010)

  46. Tseng, P., Bertsekas, D.P.: Relaxation methods for problems with strictly convex separable costs and linear constraints. Math. Prog. 38, 303–321 (1987)


  47. Tseng, P., Bertsekas, D.P.: Relaxation methods for problems with strictly convex costs and linear constraints. Math. Oper. Res. 16, 462–481 (1991)


  48. Ventura, J.A., Hearn, D.W.: Computational Development of a Lagrangian Dual Approach for Quadratic Networks. 21(4), 469–485 (1991)

  49. Wang, X.F., Yuan, X.M.: The linearized alternating direction method of multipliers for Dantzig selector. SIAM J. Sci. Comput. 34, 2792–2811 (2012)

  50. Yang, J.F., Zhang, Y.: Alternating direction algorithms for \(l_1\)-problems in compressive sensing. SIAM J. Sci. Comput. 33, 250–278 (2011)


  51. Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B (Statistical Methodology) 68, 49–67 (2006)


  52. Zhang, H., Jiang, J.J., Luo, Z.-Q.: On the linear convergence of a proximal gradient method for a class of nonsmooth convex minimization problems. J. Oper. Res. Soc. Chin. 1(2), 163–186 (2013)

  53. Zenios, S.A., Mulvey, J.M.: Relaxation techniques for strictly convex network problems. Ann. Oper. Res. 5, 517–538 (1986)


  54. Zhou, Z., Li, X., Wright, J., Candes, E.J., Ma, Y.: Stable principal component pursuit. In: Proceedings of 2010 IEEE International Symposium on Information Theory (2010)


Acknowledgments

The authors are grateful to Xiangfeng Wang and Dr. Min Tao of Nanjing University for their constructive comments.


Corresponding author

Correspondence to Zhi-Quan Luo.

Additional information

Dedicated to the fond memories of a close friend and collaborator, Paul Y. Tseng.

This research is supported in part by NSFC, Grant No. 61571384, and by the Leading Talents of Guangdong Province Program, Grant No. 00201510.

Appendix

1.1 Proof of dual error bound (2.8)

The augmented Lagrangian dual function can be expressed as

$$\begin{aligned} d(y)=\min _{x\in X} \langle y,q-Ex\rangle +\frac{\rho }{2}\Vert q-Ex\Vert ^2+g(Ax)+h(x). \end{aligned}$$
(6.1)

For convenience, define \(p(Ex):=\frac{\rho }{2}\Vert q-Ex\Vert ^2\), and let \(\ell (x):=p(Ex)+g(Ax)+h(x)\). For simplicity, in this proof we further restrict ourselves to the case where the nonsmooth part has polyhedral level sets, i.e., \(\{x: h(x)\le \xi \}\) is polyhedral for each \(\xi \); more general cases can be treated along similar lines, but the arguments become more involved.
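As a concrete instance of this restriction (noted here only to connect it with the \(\ell _1\) regularizer mentioned in the abstract), the assumption holds for \(h(x)=\Vert x\Vert _1\): since \(\Vert x\Vert _1=\max _{\sigma \in \{-1,+1\}^n}\langle \sigma , x\rangle \), we have

$$\begin{aligned} \{x\in \mathbb {R}^n: \Vert x\Vert _1\le \xi \}=\bigl \{x: \langle \sigma , x\rangle \le \xi \ \hbox {for all}\ \sigma \in \{-1,+1\}^n\bigr \}, \end{aligned}$$

an intersection of finitely many half-spaces and hence a polyhedron for every \(\xi \).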

Let us define

$$\begin{aligned} x(y)\in \arg \min _{x\in X} \ell (x)+\langle y, q-Ex\rangle . \end{aligned}$$

Let \((x^*, y^*)\) denote a primal and dual optimal solution pair. Let \(X^*\) and \(Y^*\) denote the primal and dual optimal solution sets, respectively. The following properties will be useful in our subsequent analysis.

  1. (a)

    There exist positive scalars \(\sigma _g\), \(L_g\) such that \(\forall ~x(y),x(y')\in X\)

    1. a-1)

      \(\langle A^T \nabla g(A x(y'))-A^T \nabla g(A x(y)), x(y')-x(y)\rangle \ge \sigma _g\Vert A x(y')-A x(y)\Vert ^2\).

    2. a-2)

      \(g(A x(y'))-g(Ax(y))-\langle A^T \nabla g(A x(y)), x(y')-x(y)\rangle \ge \frac{\sigma _g}{2}\Vert A x(y')-Ax(y)\Vert ^2.\)

    3. a-3)

      \(\Vert A^T\nabla g(Ax(y'))-A^T\nabla g(A x(y))\Vert \le L_g\Vert Ax(y')-Ax(y)\Vert .\)

  2. (b)

    All a-1)–a-3) are true for \(p(\cdot )\) as well, with some constants \(\sigma _p\) and \(L_p\).

  3. (c)

    \(\nabla d(y)=q-Ex(y)\), and \(\Vert \nabla d(y')-\nabla d(y)\Vert \le \frac{1}{\rho }\Vert y'-y\Vert .\)

Part (a) is true due to the assumed Lipschitz continuity and strong convexity of the function \(g(\cdot )\). Part (b) follows from the Lipschitz continuity and strong convexity of the quadratic penalization \(p(\cdot )\). Part (c) has been shown in Lemmas 2.1 and 2.2.

To proceed, let us rewrite the minimization problem defining \(d(y)\) equivalently as

$$\begin{aligned} d(y)=\min _{(x,s): x\in X, h(x)\le s} \langle y, q-Ex\rangle + p(Ex)+g(Ax)+s. \end{aligned}$$
(6.2)

Let us write the polyhedral set \(\{(x,s): x\in X, h(x)\le s\}\) compactly as \(C_x x+ C_s s \ge c\) for some matrices \(C_x\in \mathbb {R}^{j\times n}\), \(C_s\in \mathbb {R}^{j\times 1}\) and \(c\in \mathbb {R}^{j\times 1}\), where \(j\ge 0\) is some integer. For any fixed y, let (x(y), s(y)) denote one optimal solution of (6.2); note that we must have \(h(x(y))=s(y)\). By the equivalence of (6.1) and (6.2), if \(y^*\in Y^*\), we must also have \(x(y^*)\in X^*\).
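As a one-dimensional illustration of this representation (an example added here, with \(X=\mathbb {R}\) and \(h(x)=|x|\)), the constraint \(h(x)\le s\) reads \(x\le s\) and \(-x\le s\), so one admissible choice is

$$\begin{aligned} C_x=\begin{pmatrix} -1\\ 1 \end{pmatrix},\qquad C_s=\begin{pmatrix} 1\\ 1 \end{pmatrix},\qquad c=\begin{pmatrix} 0\\ 0 \end{pmatrix},\qquad j=2, \end{aligned}$$

for which \(C_x x+C_s s\ge c\) describes exactly the epigraph \(\{(x,s): |x|\le s\}\).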

Define a set-valued function \(\mathcal {M}\) that maps each vector \((d, e)\in \mathbb {R}^{n}\times \mathbb {R}^{m}\) to the set of vectors \((x,s,y,\lambda )\in \mathbb {R}^n\times \mathbb {R}\times \mathbb {R}^m\times \mathbb {R}^j\) that satisfy the following system of equations

$$\begin{aligned}&E^Ty + C_x^T\lambda =d,\\&C^T_s\lambda =1,\\&q-Ex=e,\\&\lambda \ge 0, \ (C_x x+C_s s)\ge c, \ \langle C_x x+C_s s-c,\lambda \rangle =0. \end{aligned}$$

It is easy to verify by using the optimality condition for problem (6.2) that

$$\begin{aligned} (x,s,y,\lambda )\in \mathcal {M}(E^T\nabla p(Ex)+A^T\nabla g(Ax), e)~\hbox {for some} \ \lambda \nonumber \\ \quad \quad \quad \hbox {if and only if}\ x=x(y), \ e=\nabla d(y). \end{aligned}$$
(6.3)

Taking \(e=0\) and using the fact that \(x(y^*)\in X^*\), we see that \((x,s,y,\lambda )\in \mathcal {M}(E^T\nabla p(Ex)+A^T\nabla g(Ax), 0)\) if and only if \(x\in X^*\) and \(y\in Y^*\).

The following result states a well-known local upper Lipschitzian continuity property for the polyhedral multifunction \(\mathcal {M}\); see [26, 33, 34].

Proposition 6.1

There exists a positive scalar \(\theta \) that depends on \(A, E, C_x, C_s\) only, such that for each \(({\bar{d}}, {\bar{e}})\) there is a positive scalar \(\delta '\) satisfying

$$\begin{aligned} \mathcal {M}(d, e)\subseteq \mathcal {M}({\bar{d}}, {\bar{e}})+\theta \Vert (d, e)-({\bar{d}}, {\bar{e}})\Vert {\mathcal {B}}, \end{aligned}$$
(6.4)
$$\begin{aligned} \quad \quad \quad \hbox {whenever} \ \Vert (d, e)-({\bar{d}}, {\bar{e}})\Vert \le \delta ', \end{aligned}$$
(6.5)

where \({\mathcal {B}}\) denotes the unit Euclidean ball in \(\mathbb {R}^n\times \mathbb {R}^{m}\times \mathbb {R}\times \mathbb {R}^j\).
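As an illustration (a special case recalled here, not spelled out in the paper), the classical Hoffman bound [26] concerns the multifunction mapping a right-hand side \(b\) to the polyhedron \(\mathcal {P}(b)=\{x: Ax\le b\}\): there is a \(\theta \) depending only on \(A\) such that \(\mathrm{dist}(x, \mathcal {P}({\bar{b}}))\le \theta \Vert (Ax-{\bar{b}})_+\Vert \) for all \(x\), whenever \(\mathcal {P}({\bar{b}})\ne \emptyset \). Consequently,

$$\begin{aligned} \mathcal {P}(b)\subseteq \mathcal {P}({\bar{b}})+\theta \Vert b-{\bar{b}}\Vert {\mathcal {B}}, \end{aligned}$$

since \((Ax-{\bar{b}})_+\le (b-{\bar{b}})_+\) componentwise for every \(x\in \mathcal {P}(b)\). Proposition 6.1 states a local bound of the same type for the multifunction \(\mathcal {M}\) defined above.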

The following is the main result for this appendix. Note that the scalar \(\tau \) in the claim is independent of the choice of y, x, s, and is independent of the coefficients of the linear term s.

Claim 6.1

Suppose all the assumptions in Assumption A are satisfied. Then there exist positive scalars \(\delta \), \(\tau \) such that \(\mathrm{dist}(y, Y^*)\le \tau \Vert \nabla d(y)\Vert \) for all \(y\in \mathcal U\) with \(\Vert \nabla d(y)\Vert \le \delta \).
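Before giving the proof, one remark for intuition (an illustrative special case, not used in the argument): if the dual function \(d\) were \(\sigma \)-strongly concave, then \(Y^*\) would be a singleton \(\{y^*\}\) with \(\nabla d(y^*)=0\), and the bound would hold globally with \(\tau =1/\sigma \), since

$$\begin{aligned} \Vert \nabla d(y)\Vert =\Vert \nabla d(y)-\nabla d(y^*)\Vert \ge \sigma \Vert y-y^*\Vert =\sigma \,\mathrm{dist}(y, Y^*). \end{aligned}$$

The content of Claim 6.1 is that a bound of this form still holds, locally, without any strong concavity of \(d\).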

Proof

By Proposition 6.1, \(\mathcal {M}\) is locally upper Lipschitzian with modulus \(\theta \) at \((\nabla \ell (x^*), 0)=(E^T\nabla p(Ex^*)+A^T\nabla g(Ax^*), 0)\).

Let \(\delta \le \delta '/2\). We first show that if \(\Vert \nabla d(y)\Vert \le \delta \), then we must have \(\Vert \nabla \ell (x(y))-\nabla \ell (x^*)\Vert \le \delta '/2\). To this end, take a sequence \(y^1,y^2,\ldots \) such that \(e^r:=\nabla d(y^r)\rightarrow 0\). By Assumption (g), \(\{x(y^r)\}\) lies in a compact set. Since \(s(y^r)=h(x(y^r))\), the sequence \(\{s(y^r)\}\) also lies in a compact set (cf. Assumption (e)). By passing to a subsequence if necessary, let \((x^{\infty },s^{\infty })\) be a cluster point of \(\{(x(y^r), s(y^r))\}\). In light of the continuity of \(\nabla \ell (\cdot )\), we have \((\nabla \ell (x(y^r)), e^r)\rightarrow (\nabla \ell (x^{\infty }),0)\). Now for all r, \(\{(x(y^r), s(y^r), \nabla \ell (x(y^r)), e^r)\}\) lies in the set

$$\begin{aligned} \left\{ (x, s, d, e)\mid (x,s,y,\lambda )\in \mathcal {M}(d, e)~\hbox {for some}~ (y,\lambda )\right\} \end{aligned}$$

which is a finite union of polyhedral sets and thus closed. Then we can pass to the limit and conclude (cf. Proposition 6.1)

$$\begin{aligned} (x^\infty , s^{\infty }, y^{\infty }, \lambda ^{\infty })\in \mathcal {M}(\nabla \ell (x^{\infty }), 0) \end{aligned}$$

for some \((y^{\infty }, \lambda ^{\infty })\in \mathbb {R}^{m}\times \mathbb {R}^j\). Thus by (6.3) and the discussions that follow, we have \(x^\infty \in X^*\) and \(y^{\infty }\in Y^*\). By Lemma 2.1, we have \(\nabla \ell (x^*)=\nabla \ell (x^\infty )\), which further implies that \(\nabla \ell (x(y^r))\rightarrow \nabla \ell (x^*)\). This shows that the desired \(\delta \) exists.

Then we let \(e= \nabla d(y)\), and suppose \(\Vert e\Vert \le \delta \). From the previous argument we have

$$\begin{aligned} \Vert \nabla \ell (x(y))-\nabla \ell (x^*)\Vert +\Vert e\Vert \le \delta '/2+\delta '/2=\delta '. \end{aligned}$$

Using the results in Proposition 6.1, we have that there exists \((x^*,s^*, y^*, \lambda ^*)\in \mathcal {M}(\nabla \ell (x^*), 0)\) satisfying

$$\begin{aligned} \Vert (x(y), s, y, \lambda )-(x^*,s^*, y^*, \lambda ^*)\Vert&\le \theta \left( \Vert \nabla \ell (x^*)-\nabla \ell (x(y))\Vert +\Vert e\Vert \right) . \end{aligned}$$

Since \((x(y),s,y,\lambda )\in \mathcal {M}(\nabla \ell (x(y)), e)\), it follows from the definition of \(\mathcal {M}\) that

$$\begin{aligned}&E^Ty + C_x^T\lambda =\nabla \ell (x(y)), \end{aligned}$$
(6.6)
$$\begin{aligned}&C^T_s\lambda =1, \end{aligned}$$
(6.7)
$$\begin{aligned}&q-Ex(y)=e, \end{aligned}$$
(6.8)
$$\begin{aligned}&\lambda \ge 0, \ (C_x x(y)+C_s s(y))\ge c, \ \langle C_x x(y)+C_s s(y)-c,\lambda \rangle =0. \end{aligned}$$
(6.9)

Since \((x^*,s^*,y^*,\lambda ^*)\in \mathcal {M}(\nabla \ell (x^*), 0)\), we have from the definition of \(\mathcal {M}\)

$$\begin{aligned}&E^Ty^* + C_x^T\lambda ^* =\nabla \ell (x^*),\end{aligned}$$
(6.10)
$$\begin{aligned}&C^T_s\lambda ^*=1, \end{aligned}$$
(6.11)
$$\begin{aligned}&q-Ex^{*} =0, \end{aligned}$$
(6.12)
$$\begin{aligned}&\lambda ^*\ge 0, \ (C_x x^*+C_s s^*)\ge c, \ \langle C_x x^*+C_s s^*-c,\lambda ^*\rangle =0. \end{aligned}$$
(6.13)

Moreover, we have

$$\begin{aligned}&\sigma _g\Vert A(x(y)-x^*)\Vert ^2+\sigma _p\Vert E(x(y)-x^*)\Vert ^2\\&\qquad \le \langle A^T\nabla g(Ax(y))-A^T\nabla g(Ax(y^*)), x(y)-x(y^*)\rangle \\&\qquad \quad +\langle E^T\nabla p(Ex(y))-E^T\nabla p(Ex(y^*)), x(y)-x(y^*)\rangle \\&\qquad =\langle \nabla \ell (x(y))-\nabla \ell (x(y^*)), x(y)-x(y^*)\rangle \\&\qquad =\langle \lambda -\lambda ^*, C_x x(y)-C_x x^*\rangle +\langle y-y^*, E x(y)-E x^*\rangle \end{aligned}$$

where the first inequality comes from the strong convexity of \(g(\cdot )\) and \(p(\cdot )\); the last equality is from (6.6) and (6.10). Moreover, we have

$$\begin{aligned}&\langle \lambda -\lambda ^*, C_x x(y)-C_x x^*\rangle \nonumber \\&\qquad =\langle \lambda -\lambda ^*, C_x x(y)-C_x x^*\rangle +\langle \lambda -\lambda ^*, C_s s-C_s s^*\rangle \nonumber \\&\qquad =\langle \lambda -\lambda ^*, (C_x x(y)+C_s s)-(C_x x^*+C_s s^*)\rangle \nonumber \\&\qquad =-\langle \lambda ^*, C_x x(y)+C_s s-c\rangle -\langle \lambda , C_x x^*+C_s s^*-c\rangle \le 0 \end{aligned}$$
(6.14)

where in the first equality we have used the fact that \(C^T_s\lambda -C^T_s\lambda ^*=0\) (see (6.7) and (6.11)), and in the third equality and in the last inequality we have used the complementarity conditions (6.13) and (6.9). As a result, we have

$$\begin{aligned}&\sigma _g\Vert A(x(y)-x^*)\Vert ^2+\sigma _p\Vert E(x(y)-x^*)\Vert ^2\nonumber \\&\le \langle y-y^*, (Ex(y)-q)-(Ex^*-q)\rangle \le \Vert y-y^*\Vert \Vert e\Vert , \end{aligned}$$
(6.15)

where the last step is due to the Cauchy–Schwarz inequality together with \(\nabla d(y)=q-Ex(y)=e\) and \(\nabla d(y^*)=q-Ex^{*}=0\); cf. (6.8) and (6.12). Finally, we have from Proposition 6.1

$$\begin{aligned}&\Vert (x(y), s, y, \lambda )-(x^*,s^*, y^*, \lambda ^*)\Vert ^2\\&\qquad \le \theta ^2\left( \Vert \nabla \ell (x^*)-\nabla \ell (x(y))\Vert +\Vert e\Vert \right) ^2\\&\qquad \le \theta ^2\left( 2\Vert \nabla \ell (x^*)-\nabla \ell (x(y))\Vert ^2+2\Vert e\Vert ^2\right) \\&\qquad \le 2\theta ^2\left( 2\Vert A^T\nabla g(Ax^*)-A^T\nabla g(Ax(y))\Vert ^2+2\Vert E^T\nabla p(Ex^*)-E^T\nabla p(Ex(y))\Vert ^2+\Vert e\Vert ^2\right) \\&\qquad \le 2\theta ^2\left( 2L^2_g\Vert A(x(y)-x^*)\Vert ^2+2L^2_p\Vert E(x(y)-x^*)\Vert ^2+\Vert e\Vert ^2\right) \\&\qquad \le 2\theta ^2\max \left( \frac{2L^2_g}{\sigma _g}, \frac{2L^2_p}{\sigma _p},1\right) \left( \sigma _g\Vert A(x(y)-x^*)\Vert ^2 +\sigma _p\Vert E(x(y)-x^*)\Vert ^2+\Vert e\Vert ^2\right) \\&\qquad \le 2\theta ^2\max \left( \frac{2L^2_g}{\sigma _g}, \frac{2L^2_p}{\sigma _p},1\right) \left( \Vert e\Vert \Vert y-y^*\Vert +\Vert e\Vert ^2\right) \\&\qquad \le 2\theta ^2\max \left( \frac{2L^2_g}{\sigma _g}, \frac{2L^2_p}{\sigma _p},1\right) \left( \Vert e\Vert \Vert (x(y), s,y, \lambda )-(x^*, s^*, y^*, \lambda ^*)\Vert +\Vert e\Vert ^2\right) , \end{aligned}$$

where the first inequality restates the bound obtained from Proposition 6.1, the second and third inequalities use the elementary bound \((u+v)^2\le 2u^2+2v^2\) together with \(\nabla \ell (x)=A^T\nabla g(Ax)+E^T\nabla p(Ex)\), the fourth step follows from properties a-3) and (b), the sixth step uses (6.15), and the last step bounds \(\Vert y-y^*\Vert \) by the full distance.

We see that the above inequality is quadratic in the ratio \(\Vert (x(y), s, y, \lambda )-(x^*, s^*, y^*, \lambda ^*)\Vert /\Vert e\Vert \), so we have

$$\begin{aligned} \Vert (x(y),s, y, \lambda )-(x^*, s^*, y^*, \lambda ^*)\Vert /\Vert e\Vert \le \tau \end{aligned}$$

for some scalar \(\tau \) depending on \(\theta \), \(L_g\), \(L_p\), \(\sigma _g\), \(\sigma _p\). It is worth noting that \(\tau \) does not depend on the choice of the coefficients of the linear term s. We conclude \(\mathrm{dist}(y, Y^*)\le \tau \Vert \nabla d(y)\Vert \). \(\square \)
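For completeness, one admissible value of \(\tau \) can be read off from the last chain of inequalities (this explicit formula is an addition for the reader's convenience, not part of the original statement). Writing \(t:=\Vert (x(y),s, y, \lambda )-(x^*, s^*, y^*, \lambda ^*)\Vert /\Vert e\Vert \) and \(a:=2\theta ^2\max \bigl (\frac{2L^2_g}{\sigma _g}, \frac{2L^2_p}{\sigma _p},1\bigr )\), the chain reads \(t^2\le a(t+1)\), and solving this quadratic inequality gives

$$\begin{aligned} t\le \frac{a+\sqrt{a^2+4a}}{2}=:\tau . \end{aligned}$$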


Cite this article

Hong, M., Luo, ZQ. On the linear convergence of the alternating direction method of multipliers. Math. Program. 162, 165–199 (2017). https://doi.org/10.1007/s10107-016-1034-2
