
Open Access 01.12.2019 | Research

A fundamental proof of convergence of alternating direction method of multipliers for weakly convex optimization

Authors: Tao Zhang, Zhengwei Shen

Published in: Journal of Inequalities and Applications | Issue 1/2019


Abstract

The convergence of the alternating direction method of multipliers (ADMM) algorithm for convex/nonconvex combinational optimization has been well established in the literature. Due to the extensive applications of weakly convex functions in signal processing and machine learning, in this paper we investigate the convergence of an ADMM algorithm applied to the strongly and weakly convex combinational optimization (SWCCO) problem. Specifically, we firstly show the convergence of the iterative sequences of the SWCCO-ADMM under a mild regularity condition; then we establish the \(o(1/k)\) sublinear convergence rate of the SWCCO-ADMM algorithm using the same conditions, and the linear convergence rate by imposing a gradient Lipschitz continuity condition on the objective function. The techniques used for the convergence analysis in this paper are fundamental, and we accomplish the global convergence without using the Kurdyka–Łojasiewicz (KL) inequality, which is common but complex in proofs for nonconvex ADMM.

1 Introduction

1.1 Motivation and problem

The alternating direction method of multipliers (ADMM) algorithm, as one of the splitting/decoupling techniques, has been successfully exploited in a wide range of structured sparsity regularization optimization problems in machine learning and inverse problems, such as signal restoration, matrix/tensor completion (or factorization), phase retrieval, compressive sensing, regression analysis and statistical inference. In these applications, one of the most important issues is how to induce a reasonable sparse solution and how to eliminate the disturbance (e.g. noise) while preserving the important features of the data. Incorporating a nonconvex penalty or regularization on a certain prior into the optimization model can efficiently address these issues. However, in practice, it is hard to obtain a closed-form solution for a generic nonconvex penalty optimization model, except for weakly convex (a.k.a. semiconvex) penalties, which include a large class of functions such as difference-of-convex (DC) functions. In addition, the convergence analysis of ADMM based on weakly convex penalties has not yet been addressed in the literature.
Motivated by these observations, in this paper, we consider the following widely used strongly and weakly convex combinational optimization (SWCCO) problem:
$$ \min_{x} F(x)=f(x)+g(Mx), $$
(1)
where \(f(\cdot )\) is proper closed strongly convex and \(g(\cdot )\) is proper closed weakly convex; M is a linear transformation, such as a wavelet/framelet. This problem is equivalent to the following constrained optimization problem:
$$ \min_{x,y} f(x)+g(y)\quad \textit{s.t.} \quad Mx=y. $$
(2)
The corresponding SWCCO-ADMM algorithm for (2) (ADMM was first proposed in [1, 2]) is
$$ \textstyle\begin{cases} x^{k+1}=\arg \min_{x} L_{\rho }(x,y^{k},p^{k}),\\ y^{k+1}= \arg \min_{y}L_{\rho }(x^{k+1},y,p^{k}),\\ p^{k+1}=p^{k}+ \rho (Mx^{k+1}-y^{k+1}), \end{cases} $$
(3)
where \(L_{\rho }(x,y,p)=f(x)+g(y)+\langle p,Mx-y\rangle + \frac{\rho }{2} \Vert Mx-y \Vert ^{2}\) is the augmented Lagrangian function (ALF) with ρ being the penalty parameter and \(p\in {R}^{n}\) being the Lagrangian multiplier. The first-order optimality conditions for (3) are
$$ \textstyle\begin{cases} -\rho M^{T}(y^{k+1}-y^{k}) \in \partial f(x^{k+1})+ M^{T}p^{k+1},\\ 0\in \partial g(y^{k+1})- p^{k+1},\\ \rho (Mx^{k+1}-y^{k+1})=p^{k+1}-p ^{k}, \end{cases} $$
(4)
where \(M^{T}(y^{k+1}-y^{k})\) and \(Mx^{k+1}-y^{k+1}\) (or \(p^{k+1}-p ^{k}\)) are dual and primal residual, respectively.
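To make the update order in (3) and the residuals in (4) concrete, here is a minimal Python sketch of one possible realization of iteration (3). It is not the authors' implementation: the callables `argmin_f` and `prox_g` (the two subproblem solvers), the operator `M`, and all names are our own illustrative assumptions.

```python
import numpy as np

def swcco_admm(argmin_f, prox_g, M, rho, x0, y0, p0, iters=200, tol=1e-8):
    """Sketch of iteration (3).

    argmin_f(v, rho): returns argmin_x  f(x) + (rho/2)*||M x - v||^2
    prox_g(v, rho):   returns argmin_y  g(y) + (rho/2)*||y - v||^2
    Both solvers are assumed to be supplied by the caller (closed form or inner loop).
    """
    x, y, p = x0.copy(), y0.copy(), p0.copy()
    for _ in range(iters):
        y_old = y
        # x-update: minimize L_rho(x, y^k, p^k); completing the square in the ALF
        # turns it into the problem handled by argmin_f with v = y^k - p^k/rho.
        x = argmin_f(y - p / rho, rho)
        # y-update: minimize L_rho(x^{k+1}, y, p^k), i.e. prox of g at M x^{k+1} + p^k/rho.
        y = prox_g(M @ x + p / rho, rho)
        # dual update.
        p = p + rho * (M @ x - y)
        # primal residual M x^{k+1} - y^{k+1} and dual residual M^T(y^{k+1} - y^k), cf. (4).
        primal_res = np.linalg.norm(M @ x - y)
        dual_res = np.linalg.norm(M.T @ (y - y_old))
        if max(primal_res, dual_res) < tol:
            break
    return x, y, p
```

Stopping on the two residuals matches the role they play in the convergence analysis below.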
The main tasks of this manuscript are to study the convergence and sublinear (or linear) convergence rate for the SWCCO-ADMM algorithm (3) under quite general assumptions.
In order to avoid solving a nonconvex optimization problem directly, one might intuitively try to transfer a quadratic term between the strongly convex and the weakly convex parts so as to turn the optimization problem (1) into a convex–convex combination, as follows:
$$\begin{aligned}& \min_{x} \underbrace{ \biggl(f(x)-\frac{\rho _{2}}{2} \Vert Mx \Vert ^{2} \biggr)}_{f_{1}(x)}+\underbrace{ \biggl(g(Mx)+ \frac{\rho _{2}}{2} \Vert Mx \Vert ^{2} \biggr)}_{g_{1}(Mx)}; \\ & \min_{x} \underbrace{ \biggl(f(x)-\frac{\rho _{1}}{2} \Vert x \Vert ^{2} \biggr)}_{f_{2}(x)}+\underbrace{ \biggl(g(Mx)+ \frac{\rho _{1}}{2} \Vert x \Vert ^{2} \biggr)}_{g_{2}(\hat{M}x)}; \\ & \min_{x} \underbrace{ \biggl(f(x)-\frac{\rho _{2}}{2} \Vert x \Vert ^{2} \biggr)}_{f_{3}(x)}+\underbrace{ \biggl(g(Mx)+ \frac{\rho _{2}}{2} \Vert x \Vert ^{2} \biggr)}_{g_{3}(\hat{M}x)}; \\ & \min_{x} \underbrace{ \biggl(f(x)-\frac{\rho _{1}}{2} \Vert Mx \Vert ^{2} \biggr)}_{f_{4}(x)}+\underbrace{ \biggl(g(Mx)+ \frac{\rho _{1}}{2} \Vert Mx \Vert ^{2} \biggr)}_{g_{4}(Mx)}. \end{aligned}$$
Here \(\rho _{1}\) and \(\rho _{2}\) are the strongly convex and weakly convex moduli, respectively; see details in Definition 2.1. However, such a partition and recombination is infeasible in most cases. These two terms in (1) usually play different roles in signal processing, so such a simple separation would completely change the meaning of the original model; moreover, the linear operator involved in these terms makes such a separation difficult, and even when it is possible, the recombined model becomes harder to solve, as shown in [3, 4]. Also, we cannot ensure that the new proximal operators are easily computed.

1.2 Related work

The convergence of the forward–backward splitting (FBS) and the Douglas–Rachford splitting (DRS) methods corresponding to an SWCCO problem with M being an identity map has been established in [3, 5, 6] very recently. Even though using DRS for the dual problem is equivalent to using ADMM for the primal problem, as shown in [7, 8] in the convex case, this is not the case for a nonconvex optimization problem, since the conjugate of a nonconvex function has not been well defined in the literature. Therefore, it is necessary to establish the convergence of the SWCCO-ADMM algorithm.
In the convex case, the convergence of ADMM has been well studied in [9–12]. In the nonconvex case, however, it has only been investigated under certain specific assumptions. This is because the iterative sequences \((x^{k}, y^{k}, p^{k})\) generated by ADMM for a nonconvex optimization problem do not satisfy Féjer monotonicity; moreover, the sufficient decrease condition [13] used to measure the quality of descent methods for a nonconvex objective function \(F(x)\) is also no longer valid. In order to prove convergence of a nonconvex ADMM algorithm, some unverifiable or unreasonable assumptions have to be imposed on the objective functions or on the iterative sequences \((x^{k}, y^{k}, p^{k})\). In addition, the global convergence and convergence rates of a nonconvex optimization problem usually require \(F(x)\) to satisfy the Kurdyka–Łojasiewicz (KL) property at each point, where the KL exponent (i.e. the geometrical property of the objective function around its stationary points) is not easy to determine [13].
In fact, the key steps to establish convergence of a nonconvex ADMM algorithm are to prove that the dual residual \(M^{T}(y^{k+1}-y^{k})\) and the primal residual \((Mx^{k+1}-y^{k+1})\) (or \((p^{k+1}-p^{k})\)) go to zero as \(k\to \infty \). To this end, one common method developed in very recent papers [13–17] is to exploit the monotone nonincrease of a certain type of Lyapunov function (e.g. the ALF [14–16] or its variants [17]) to measure the iterative error. Furthermore, the global convergence of a nonconvex optimization problem can be obtained by using the KL inequality. However, we have to emphasize that using the ALF or its variants as the Lyapunov function causes trouble in handling nonsmooth objective functions, since the Lyapunov function may no longer be monotonically decreasing in this case. Thus, in [14–17], at least one of the objective functions is required to be gradient Lipschitz continuous to guarantee the convergence of the algorithm, which greatly limits the applications of the model (1).
On the other hand, the surjectivity (i.e. full row rank) assumption for the linear mapping M, which is a necessary condition to prove the monotone nonincrease of the ALF in [14–17], excludes many interesting applications. In fact, many well-developed sparse representation operators M do not satisfy the full row rank condition, for example, the framelet or a redundant sparse representation dictionary learned from the given data. Moreover, if we assume \(f(x)\) to be gradient Lipschitz continuous and M to be of full row rank, then problem (2) actually can be solved by block coordinate descent methods (BCDs) [18] and does not need an ADMM algorithm. Specifically, suppose M is of full row rank; then, after a suitable column permutation, M can be partitioned as \((M_{1}, M_{2})\) with \(M_{1}\) nonsingular, so the constraint \(Mx=y\) can be written as \(M_{1}x_{1}+M_{2}x_{2}=y\). That is, \(x_{1}=M_{1}^{-1}(y-M_{2}x_{2})\). In this case, the constrained optimization problem (2) can be reformulated as the following unconstrained optimization problem:
$$ \min_{x_{1}, x_{2},y}f(x_{1},x_{2}) + g(y) = \underbrace{ f \bigl(M_{1}^{-1}(y-M _{2}x_{2}),x_{2} \bigr)}_{h(x_{2},y)}+g(y). $$
(5)
Then we have
$$ \min_{x_{2},y} h(x_{2},y)+\langle 0,x_{2} \rangle +g(y), $$
(6)
which can be solved by BCD. The regularity conditions used in a convergence analysis for nonconvex ADMM are summarized in Table 1.
Table 1
Regularity conditions for the convergence analysis of ADMM for nonconvex optimization problems

| | M | f, g | KL property | Lyapunov function |
|---|---|---|---|---|
| Hong [14] | M = I | f and g gradient Lipschitz continuous | no | ALF |
| Li [15] | full row rank | bounded Hessian of f | yes | ALF |
| Wang [16] | full row rank | f gradient Lipschitz continuous, g strongly convex | yes | variants of ALF |
| Wang [17] | weak full column rank condition | f gradient Lipschitz continuous, g has special structure | yes | ALF |
| Ours | full column rank | f strongly convex, g weakly convex | no | \(H^{k}\) (shown in Sect. 3) |
Recently, works [4, 19] similar to this manuscript have investigated the use of the primal–dual hybrid gradient (PDHG) method to solve (1), and have also established convergence. Although the PDHG (a.k.a. the relaxed alternating minimization algorithm (AMA) [20]) is apparently similar to the ADMM, they are quite different. Actually, the PDHG has a deep relationship with the inexact Uzawa method [19, 20].

1.3 Contributions

In this paper, without requiring gradient Lipschitz continuity of any of the objective functions, we establish the convergence of the SWCCO-ADMM algorithm (3); we merely require M to be of full column rank. We also establish the sublinear convergence rate under the same assumptions as in the convergence analysis, and we establish the linear convergence rate of the SWCCO-ADMM algorithm under some additional regularity assumptions. In addition, we use the nonnegative Lyapunov function defined in [9] to measure the iterative error instead of using the ALF, and the global convergence result can be obtained without using the KL inequality. Thus, our proof is more fundamental than the work in [14–17].
We summarize the contributions of this manuscript as follows.
  • We prove that the iterative sequences \(\{(x^{k},y^{k},p^{k})\}\) of SWCCO-ADMM (3) globally converge to a critical point of (2) under the conditions that the strongly convex term dominates the weakly convex term and the penalty parameter ρ is larger than at least twice the weakly convex modulus. We also give an example showing that the SWCCO-ADMM (3) can diverge if the latter condition is not satisfied. Meanwhile, we prove that the iterative sequence \(\{x^{k}\}\) generated by ADMM (3) converges to an optimal solution of (1).
  • We show a sublinear convergence rate \(o(1/k)\) of the SWCCO-ADMM algorithm using the same regularity conditions as the convergence analysis. To the best of our knowledge, this is the first sublinear convergence rate result for a nonconvex ADMM algorithm.
  • Furthermore, we establish the linear convergence rate for the SWCCO-ADMM algorithm by imposing the gradient Lipschitz continuity on one of the objective functions.
The rest of this paper is organized as follows. In Sect. 2, we list some fundamental definitions which are useful for the following analysis. Then the convergence of ADMM for the SWCCO case is established in Sect. 3. Next, we address the sublinear and linear convergence rates in Sect. 4. We also give a numerical example in Sect. 5. At last, some conclusions are given in Sect. 6.

2 Preliminaries

To start, we list some fundamental definitions and notations for the following analysis.
We denote by \(\langle \cdot ,\cdot \rangle \) and \(\Vert \cdot \Vert =\sqrt{ \langle \cdot ,\cdot \rangle }\) the inner product and the induced norm on the finite-dimensional real vector spaces X and Y, respectively. For a given function \(f: {R}^{n}\rightarrow {R}\cup \{\infty \}\), we denote by \(\operatorname{dom} f= \{x:f(x)<+\infty \}\) its effective domain, assumed to be nonempty. The largest and smallest eigenvalues of a linear operator M are denoted by \(\lambda _{\max }(M)\) and \(\lambda _{\min }(M)\), respectively.
Definition 2.1
The function \(f: {R}^{n}\rightarrow {R}\cup \{\infty \}\) is said to be strongly convex with modulus \(\rho _{1}>0\) if \(f(x)-\frac{\rho _{1}}{2} \Vert x \Vert ^{2}\) is convex; the function \(g: {R}^{n}\rightarrow {R}\cup \{ \infty \}\) is said to be weakly convex with modulus \(\rho _{2}>0\), if \(g(y)+\frac{\rho _{2}}{2} \Vert y \Vert ^{2}\) is convex.
The strongly and weakly convex functions have the following properties [21]. Let \(f: {R}^{n}\rightarrow {R}\cup \{ \infty \}\) be a strongly convex function with modulus \(\rho _{1}\). Then, for \(u_{1}\in \partial f(x_{1})\), \(u_{2}\in \partial f(x_{2})\), we have
$$ \langle u _{1}-u_{2},x_{1}-x_{2} \rangle \geq \rho _{1} \Vert x_{1}-x_{2} \Vert ^{2}. $$
(7)
For a weakly convex function \(g: {R}^{n}\rightarrow {R}\cup \{\infty \}\) with modulus \(\rho _{2}\) and \(v_{1}\in \partial g(y_{1})\), \(v_{2}\in \partial g(y_{2})\), we have
$$ \langle v_{1}-v_{2},y_{1}-y_{2} \rangle \geq -\rho _{2} \Vert y_{1}-y_{2} \Vert ^{2}. $$
(8)
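As a quick sanity check of (7) and (8) (our own illustration, not from the paper), the script below samples random scalars and verifies both inequalities for the strongly convex quadratic \(f(x)=\frac{a}{2}x^{2}\) and the weakly convex penalty \(g(y)=\tau \vert y\vert -\frac{b}{2}y^{2}\); the particular functions and moduli are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

a = 3.0            # f(x) = 0.5*a*x^2 is strongly convex with modulus rho1 = a, f'(x) = a*x
tau, b = 1.0, 2.0  # g(y) = tau*|y| - 0.5*b*y^2 is weakly convex with modulus rho2 = b,
                   # with subgradient tau*sign(y) - b*y

for _ in range(10000):
    x1, x2, y1, y2 = rng.standard_normal(4)
    u1, u2 = a * x1, a * x2
    v1, v2 = tau * np.sign(y1) - b * y1, tau * np.sign(y2) - b * y2
    assert (u1 - u2) * (x1 - x2) >= a * (x1 - x2) ** 2 - 1e-12        # inequality (7)
    assert (v1 - v2) * (y1 - y2) >= -b * (y1 - y2) ** 2 - 1e-12       # inequality (8)
print("inequalities (7) and (8) hold on all samples")
```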
Definition 2.2
([22])
Let \(f: {R}^{n}\rightarrow {R} \cup \{\infty \}\) be a proper lower semi-continuous function being finite at \(\bar{x} \in {R}^{n}\).
(i)
The Fréchet subdifferential of f at \(\bar{x}\), written \(\hat{\partial }f(\bar{x})\), is the set
$$ \biggl\{ u \in {R}^{n}:\liminf_{x\rightarrow \bar{x}, x\neq \bar{x}}\frac{f(x)-f(\bar{x})-\langle u,x- \bar{x} \rangle }{ \Vert x-\bar{x} \Vert }\geq 0 \biggr\} . $$
 
(ii)
The limiting subdifferential of f at \(\bar{x}\), written \(\partial f(\bar{x})\), is the set
$$ \bigl\{ u\in {R}^{n}:\exists x^{k}\rightarrow \bar{x}, f \bigl(x^{k} \bigr)\rightarrow f(\bar{x}),u^{k} \in \hat{ \partial }f \bigl(x^{k} \bigr),u^{k}\rightarrow u \bigr\} . $$
 
From this definition, for a proper lower semi-continuous function \(f: {R}^{n}\rightarrow {R}\cup \{\infty \}\), we can obtain the following assertions.
(i)
The subdifferential of f at \(\bar{x}\) is the set
$$ \bigl\{ u\in {R}^{n}:\forall x\in {R}^{n},\langle x-\bar{x},u \rangle +f( \bar{x})\leq f(x) \bigr\} . $$
 
(ii)
The limiting subdifferential is closed, i.e. if \(y^{k}\rightarrow \bar{y}\), \(v^{k}\rightarrow \bar{v}\), \(g(y^{k})\rightarrow g(\bar{y})\) and \(v^{k}\in \partial g(y^{k})\), then \(\bar{v}\in \partial g(\bar{y})\).
 
(iii)
Suppose \(\operatorname{dom}(f)\cap \operatorname{dom}(g\circ M)\) is not empty, then
$$ \partial \bigl(f(x)+g(Mx) \bigr)=\partial f(x)+M^{T}(\partial g) (Mx). $$
 
Definition 2.3
Let the nonempty set S be the set of critical points of the augmented Lagrangian function \(L_{\rho }\). A point \((x^{*},y^{*},p^{*})\in S\) is a critical point if
$$ -M^{T}p^{*}\in \partial f \bigl(x^{*} \bigr),\quad p^{*}\in \partial g \bigl(y^{*} \bigr) \quad \textit{and} \quad Mx^{*}-y^{*}=0. $$
In order to establish the linear convergence rate, we need the following definition of gradient Lipschitz continuity.
Definition 2.4
Let f be a differentiable function, then the gradient ∇f is called Lipschitz continuous with modulus \(L>0\) if
$$ \bigl\Vert \nabla f(x_{1})-\nabla f(x_{2}) \bigr\Vert \leq L \Vert x_{1}-x_{2} \Vert , \quad \forall x_{1}, x_{2} \in \operatorname{dom} {f}. $$
We will often use the following relations, for all vectors \(u, v, w \in {R}^{n}\):
$$\begin{aligned}& 2\langle u-v, w-u\rangle = \Vert v-w \Vert ^{2}- \Vert u-v \Vert ^{2}- \Vert u-w \Vert ^{2}; \end{aligned}$$
(9)
$$\begin{aligned}& \Vert u+v \Vert ^{2}\le \biggl(1+\frac{1}{\gamma } \biggr) \Vert u \Vert ^{2}+(1+\gamma ) \Vert v \Vert ^{2}, \quad \forall \gamma >0. \end{aligned}$$
(10)

3 Convergence analysis

In this section we will study the convergence of the SWCCO-ADMM algorithm (3) under the following mild assumptions.
Assumption 3.1
(i)
Let \(f(x)\) be strongly convex with modulus \(\rho _{1}>0\) and \(g(y)\) be weakly convex with modulus \(\rho _{2}>0\), respectively, and let the set S of critical points of the augmented Lagrangian function be nonempty. For some \(\mu >0\), we require
$$ \rho _{1}-\rho _{2} \Vert M \Vert ^{2}-\mu \geq 0. $$
(11)
 
(ii)
We also suppose that the penalty parameter ρ in the augmented Lagrangian function satisfies, with the same μ as in (11),
$$ \rho >2\rho _{2}+\frac{8\rho ^{2}_{2} \Vert MM^{T} \Vert }{\mu }. $$
(12)
 
Assumption 3.2
\(\operatorname{dom}(f)\cap \operatorname{dom}(g\circ M)\) is not empty.
Remark 1
From (ii) of Assumption 3.1, we see that ρ is larger than \(2\rho _{2}\). This assumption not only ensures that the second subproblem in the ADMM algorithm (3) has a unique solution, as shown later, but is also used to ensure that the Lyapunov function \(H^{k}\) defined in Lemma 3.1 has a sufficient descent property. At the end of this section we also give an example showing that the iterative sequences of the SWCCO-ADMM algorithm (3) can diverge if this assumption is not satisfied.
For the sake of convenience, let \((x^{*},y^{*},p^{*})\) be one of the critical points of the augmented Lagrangian function; \(z^{k+1}=-M^{T}p^{k}- \rho M^{T}(Mx^{k+1}-y^{k})\); \(x^{k+1}_{e}=x^{k+1}-x^{*}\); \(y^{k+1}_{e}=y^{k+1}-y^{*}\); \(p^{k+1}_{e}=p^{k+1}-p^{*}\) and \(z^{k+1}_{e}=z^{k+1}-(-M^{T}p^{*})\) in the following proof; note that \(z^{k+1}\in \partial f(x^{k+1})\) by the first-order optimality condition of the x-subproblem and \(p^{k+1}\in \partial g(y^{k+1})\) by (4). Then by the strong convexity property (7) of \(f(x)\) and the weak convexity property (8) of \(g(y)\) we have
$$ \bigl\langle z^{k+1}_{e},x^{k+1}_{e} \bigr\rangle \geq \rho _{1} \bigl\Vert x^{k+1}_{e} \bigr\Vert ^{2} $$
(13)
and
$$ \bigl\langle p^{k+1}_{e},y^{k+1}_{e} \bigr\rangle \geq -\rho _{2} \bigl\Vert y^{k+1}_{e} \bigr\Vert ^{2}. $$
(14)
By Assumption 3.1, we will obtain the following monotone nonincrease of the nonnegative Lyapunov function \(H^{k}=\frac{\rho }{2} \Vert y^{k}_{e} \Vert ^{2}+\frac{1}{2\rho } \Vert p^{k}_{e} \Vert ^{2}\), which will play an important role in our convergence analysis. Note that \(x_{e}^{k}\) is not included in \(H^{k}\), since the primal variable x can be regarded as an intermediate variable in the iterations of SWCCO-ADMM, while y and p are the essential variables [9].
Lemma 3.1
Let \(H^{k}=\frac{\rho }{2} \Vert y^{k}_{e} \Vert ^{2}+\frac{1}{2\rho } \Vert p^{k} _{e} \Vert ^{2}\). Then the iterative sequences \(\{(x^{k},y^{k},p^{k})\}\) generated by SWCCO-ADMM algorithm (3) satisfy
$$ H^{k}-H^{k+1}\geq \sigma _{1} \bigl\Vert x^{k+1}_{e} \bigr\Vert ^{2}+\sigma _{2} \bigl\Vert y^{k+1}-y ^{k} \bigr\Vert ^{2}+\sigma _{3} \bigl\Vert p^{k+1}-p^{k} \bigr\Vert ^{2} , $$
(15)
where \(\sigma _{1}=\rho _{1}-\rho _{2} \Vert M \Vert ^{2}-\mu \geq 0\), \(\sigma _{2}=\frac{ \rho }{2}-\rho _{2}>0\) and \(\sigma _{3}=\frac{1}{2\rho }-\frac{\rho _{2}}{ \rho ^{2}}-\frac{4\rho ^{2}_{2} \Vert MM^{T} \Vert }{\rho ^{2}\mu }>0\).
Proof
By the optimality condition of (4), we have
$$ \textstyle\begin{cases} -\rho M^{T}(y^{k+1}_{e}-y^{k}_{e})- M^{T}p^{k+1}_{e} \in \partial f(x^{k+1})-\partial f(x^{*});\\ p^{k+1}_{e}\in \partial g(y^{k+1})- \partial g(y^{*}). \end{cases} $$
(16)
Then, by (13) and (14), it follows that
$$\begin{aligned}& \bigl\langle -M^{T}p^{k+1}_{e}-\rho M^{T} \bigl(y^{k+1}_{e}-y^{k}_{e} \bigr),x^{k+1} _{e} \bigr\rangle \geq \rho _{1} \bigl\Vert x^{k+1}_{e} \bigr\Vert ^{2}; \\& \bigl\langle p^{k+1}_{e},y^{k+1}_{e} \bigr\rangle \geq -\rho _{2} \bigl\Vert y^{k+1}_{e} \bigr\Vert ^{2}. \end{aligned}$$
Adding these two inequalities, then
$$\begin{aligned}& \bigl\langle -M^{T}p^{k+1}_{e}-\rho M^{T} \bigl(y^{k+1}_{e}-y^{k}_{e} \bigr),x^{k+1} _{e} \bigr\rangle + \bigl\langle p^{k+1}_{e},y^{k+1}_{e} \bigr\rangle \geq \rho _{1} \bigl\Vert {x}^{k+1}_{e} \bigr\Vert ^{2}- \rho _{2} \bigl\Vert {y}^{k+1}_{e} \bigr\Vert ^{2}. \end{aligned}$$
(17)
By rearranging the left side of inequality (17), we have
$$\begin{aligned}& \bigl\langle -M^{T}p^{k+1}_{e}-\rho M^{T} \bigl(y^{k+1}_{e}-y^{k}_{e} \bigr),x^{k+1}_{e} \bigr\rangle + \bigl\langle p^{k+1}_{e},y^{k+1}_{e} \bigr\rangle \\& \quad =- \bigl\langle p^{k+1}_{e},Mx^{k+1}_{e} \bigr\rangle -\rho \bigl\langle y^{k+1}_{e}-y^{k}_{e},Mx^{k+1}_{e} \bigr\rangle + \bigl\langle p^{k+1}_{e},y^{k+1}_{e} \bigr\rangle \\& \quad = -\frac{1}{\rho } \bigl\langle p^{k+1}_{e},p^{k+1}_{e}-p^{k}_{e} \bigr\rangle - \rho \bigl\langle y^{k+1}_{e},Mx^{k+1}_{e} \bigr\rangle +\rho \bigl\langle y^{k}_{e},Mx^{k+1}_{e} \bigr\rangle \\& \quad =-\frac{1}{2\rho } \bigl( \bigl\Vert {p}^{k+1}_{e} \bigr\Vert ^{2}+ \bigl\Vert {p}^{k+1}_{e}-{p}^{k}_{e} \bigr\Vert ^{2}- \bigl\Vert {p}^{k}_{e} \bigr\Vert ^{2} \bigr) \\& \quad \quad{}-\frac{\rho }{2} \bigl( \bigl\Vert {y}^{k+1}_{e} \bigr\Vert ^{2}+ \bigl\Vert M{x}^{k+1}_{e} \bigr\Vert ^{2}- \bigl\Vert M{x}^{k+1}_{e}-{y}^{k+1}_{e} \bigr\Vert ^{2} \bigr) \\& \quad \quad{} +\frac{\rho }{2} \bigl( \bigl\Vert {y}^{k}_{e} \bigr\Vert ^{2}+ \bigl\Vert M{x}^{k+1}_{e} \bigr\Vert ^{2}- \bigl\Vert M{x}^{k+1}_{e}-{y}^{k}_{e} \bigr\Vert ^{2} \bigr) \\& \quad =\frac{1}{2\rho } \bigl( \bigl\Vert {p}^{k}_{e} \bigr\Vert ^{2}- \bigl\Vert {p}^{k+1}_{e} \bigr\Vert ^{2} \bigr)+ \frac{\rho }{2} \bigl( \bigl\Vert {y}^{k}_{e} \bigr\Vert ^{2}- \bigl\Vert {y}^{k+1}_{e} \bigr\Vert ^{2} \bigr)- \frac{\rho }{2} \bigl\Vert M{x}^{k+1}_{e}-{y}^{k}_{e} \bigr\Vert ^{2} , \end{aligned}$$
(18)
where the third equation follows from the cosine rule (9). Then, combining (17) and (18), we have
$$ H^{k}-H^{k+1}\geq \rho _{1} \bigl\Vert x^{k+1}_{e} \bigr\Vert ^{2}-\rho _{2} \bigl\Vert y^{k+1}_{e} \bigr\Vert ^{2}+ \frac{\rho }{2} \bigl\Vert Mx^{k+1}_{e}-y^{k}_{e} \bigr\Vert ^{2}. $$
(19)
Notice that (19) does not imply the nonincreasing property of \(H^{k}\) since we have the second negative term \(-\rho _{2} \Vert y^{k+1} _{e} \Vert ^{2}\) on the right side. Next, we will deal with this term using strong and weak convexity of the functions \(f(x)\) and \(g(y)\), respectively.
Firstly, the third term \(\frac{\rho }{2} \Vert Mx^{k+1}_{e}-y^{k}_{e} \Vert ^{2}\) on the right side of (19) can be rewritten as follows:
$$\begin{aligned} \begin{aligned}[b] \frac{\rho }{2} \bigl\Vert Mx^{k+1}_{e}-y^{k}_{e} \bigr\Vert ^{2} &=\frac{\rho }{2} \bigl\Vert Mx^{k+1}_{e}-y^{k+1}_{e}+y^{k+1}_{e}-y^{k}_{e} \bigr\Vert ^{2} \\ &= \frac{\rho }{2} \bigl\Vert Mx^{k+1}_{e}-y^{k+1}_{e} \bigr\Vert ^{2}+\frac{\rho }{2} \bigl\Vert y^{k+1}_{e}-y^{k}_{e} \bigr\Vert ^{2}+\rho \bigl\langle Mx^{k+1}_{e}-y^{k+1}_{e},y^{k+1}_{e}-y^{k}_{e} \bigr\rangle \\ &=\frac{1}{2\rho } \bigl\Vert p^{k+1}-p^{k} \bigr\Vert ^{2}+ \frac{\rho }{2} \bigl\Vert y^{k+1}-y^{k} \bigr\Vert ^{2}+ \bigl\langle p^{k+1}-p^{k},y^{k+1}-y^{k} \bigr\rangle \\ &\geq \frac{1}{2\rho } \bigl\Vert p^{k+1}-p^{k} \bigr\Vert ^{2}+ \biggl(\frac{\rho }{2}-\rho _{2} \biggr) \bigl\Vert y^{k+1}-y^{k} \bigr\Vert ^{2}, \end{aligned} \end{aligned}$$
(20)
where the inequality follows from the weakly convex property (8). Secondly, from the last equation of (3), it follows that
$$\begin{aligned} \rho _{2} \bigl\Vert y^{k+1}_{e} \bigr\Vert ^{2} &= \frac{\rho _{2}}{\rho ^{2}} \bigl\Vert p^{k+1}-p^{k}- \rho Mx^{k+1}_{e} \bigr\Vert ^{2} \\ &=\frac{\rho _{2}}{\rho ^{2}} \bigl\Vert p^{k+1}-p^{k} \bigr\Vert ^{2}+ \rho _{2} \bigl\Vert Mx^{k+1} _{e} \bigr\Vert ^{2}- \frac{2\rho _{2}}{\rho } \bigl\langle p^{k+1}-p^{k},Mx^{k+1}_{e} \bigr\rangle . \end{aligned}$$
(21)
Substituting (20), (21) into (19) and rearranging the terms, we have
$$\begin{aligned} H^{k}-H^{k+1}\geq{} &\rho _{1} \bigl\Vert x^{k+1}_{e} \bigr\Vert ^{2}-\rho _{2} \bigl\Vert Mx^{k+1}_{e} \bigr\Vert ^{2} \\ &{}+ \biggl( \frac{1}{2\rho }- \frac{\rho _{2}}{\rho ^{2}} \biggr) \bigl\Vert p^{k+1}-p^{k} \bigr\Vert ^{2}+ \biggl(\frac{\rho }{2}-\rho _{2} \biggr) \bigl\Vert y^{k+1}-y^{k} \bigr\Vert ^{2} \\ &{}+\frac{2\rho _{2}}{\rho } \bigl\langle p^{k+1}-p^{k},Mx^{k+1}_{e} \bigr\rangle \\ \geq {}&\rho _{1} \bigl\Vert x^{k+1}_{e} \bigr\Vert ^{2}-\rho _{2} \bigl\Vert Mx^{k+1}_{e} \bigr\Vert ^{2}-2\rho _{2}\zeta \bigl\Vert x^{k+1}_{e} \bigr\Vert ^{2} \\ &{}+ \biggl( \frac{1}{2\rho }-\frac{\rho _{2}}{\rho ^{2}}- \frac{2\rho _{2} \Vert MM^{T} \Vert }{\rho ^{2}\zeta } \biggr) \bigl\Vert p^{k+1}-p^{k} \bigr\Vert ^{2} \\ &{}+ \biggl(\frac{\rho }{2}-\rho _{2} \biggr) \bigl\Vert y^{k+1}-y^{k} \bigr\Vert ^{2} \\ \geq {}& \bigl(\rho _{1}-\rho _{2} \Vert M \Vert ^{2}-2\rho _{2}\zeta \bigr) \bigl\Vert x^{k+1}_{e} \bigr\Vert ^{2}+ \biggl(\frac{1}{2 \rho }-\frac{\rho _{2}}{\rho ^{2}}- \frac{2\rho _{2} \Vert MM^{T} \Vert }{\rho ^{2} \zeta } \biggr) \bigl\Vert p^{k+1}-p^{k} \bigr\Vert ^{2} \\ &{}+ \biggl(\frac{\rho }{2}-\rho _{2} \biggr) \bigl\Vert y^{k+1}-y^{k} \bigr\Vert ^{2}, \end{aligned}$$
(22)
where the second inequality follows from the Cauchy–Schwarz inequality together with Young's inequality, i.e. \(\frac{2\rho _{2}}{\rho }\langle p^{k+1}-p^{k},Mx^{k+1}_{e}\rangle \geq -2\rho _{2}\zeta \Vert x^{k+1}_{e} \Vert ^{2}-\frac{2\rho _{2} \Vert MM^{T} \Vert }{ \rho ^{2}\zeta } \Vert p^{k+1}-p^{k} \Vert ^{2}\), \(\forall \zeta >0\). By the arbitrariness of ζ, let \(\mu =2\rho _{2}\zeta \). It follows from (22) that
$$\begin{aligned} H^{k}-H^{k+1}&\geq \bigl(\rho _{1}-\rho _{2} \Vert M \Vert ^{2}-\mu \bigr) \bigl\Vert x^{k+1}_{e} \bigr\Vert ^{2}+ \biggl( \frac{1}{2\rho }-\frac{\rho _{2}}{\rho ^{2}}- \frac{4\rho ^{2}_{2} \Vert MM^{T} \Vert }{\rho ^{2}\mu } \biggr) \bigl\Vert p^{k+1}-p^{k} \bigr\Vert ^{2} \\&\quad{} + \biggl( \frac{\rho }{2}-\rho _{2} \biggr) \bigl\Vert y^{k+1}-y^{k} \bigr\Vert ^{2}. \end{aligned}$$
By Assumption 3.1, we have \(\sigma _{1}=\rho _{1}-\rho _{2} \Vert M \Vert ^{2}-\mu \geq 0\), \(\sigma _{2}=\frac{\rho }{2}-\rho _{2}>0\) and \(\sigma _{3}=\frac{1}{2\rho }-\frac{\rho _{2}}{\rho ^{2}}-\frac{4\rho ^{2}_{2} \Vert MM^{T} \Vert }{\rho ^{2}\mu }>0\). Therefore we complete the proof. □
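The monotone nonincrease of \(H^{k}\) asserted in Lemma 3.1 is easy to observe numerically. The following sketch builds a small synthetic SWCCO instance with \(f(x)=\frac{1}{2}x^{T}Ax-b^{T}x\) and \(g(y)=-\frac{\rho _{2}}{2}\Vert y\Vert ^{2}\) (so that both subproblems and the critical point have closed forms) and checks that \(H^{k}\) never increases; all matrices, moduli, and parameter values are our own assumptions chosen to satisfy Assumption 3.1 (here \(\rho _{1}=1\), \(\rho _{2}=0.1\), \(\Vert M\Vert =1\), \(\mu =0.5\), \(\rho =1\)).

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5, 8
M = rng.standard_normal((m, n))
M /= np.linalg.norm(M, 2)          # rescale so that ||M|| = 1; M has full column rank a.s.

rho1, rho2, rho = 1.0, 0.1, 1.0    # rho1 - rho2*||M||^2 - mu >= 0 with mu = 0.5,
                                   # and rho > 2*rho2 + 8*rho2^2*||MM^T||/mu = 0.36
C = rng.standard_normal((n, n))
A = rho1 * np.eye(n) + C.T @ C     # f(x) = 0.5*x'Ax - b'x, strongly convex with modulus >= rho1
b = rng.standard_normal(n)

# critical point of (2): (A - rho2*M'M) x* = b,  y* = M x*,  p* = grad g(y*) = -rho2*y*
x_star = np.linalg.solve(A - rho2 * M.T @ M, b)
y_star = M @ x_star
p_star = -rho2 * y_star

x, y, p = np.zeros(n), np.zeros(m), np.zeros(m)
H_prev = np.inf
for k in range(300):
    x = np.linalg.solve(A + rho * M.T @ M, b + rho * M.T @ (y - p / rho))  # x-update
    y = rho / (rho - rho2) * (M @ x + p / rho)                             # y-update (closed form)
    p = p + rho * (M @ x - y)                                              # dual update
    H = 0.5 * rho * np.linalg.norm(y - y_star) ** 2 \
        + np.linalg.norm(p - p_star) ** 2 / (2 * rho)
    assert H <= H_prev + 1e-10, "H^k should be nonincreasing under Assumption 3.1"
    H_prev = H
print("final H^k:", H_prev)        # tends to zero, in line with Theorem 3.2
```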
Now, we are ready to prove the main convergence result of the SWCCO-ADMM algorithm (3) under Assumption 3.1.
Theorem 3.2
Under Assumption 3.1, suppose that the iterative sequence \(\{x^{k}\}\) of SWCCO-ADMM (3) is bounded, and let \((x^{*},y^{*},p^{*})\) be a critical point in S. Then:
(i)
The iterative sequences \(\{(y^{k},p^{k})\}\) generated by SWCCO-ADMM (3) converge to \(\{(y^{*},p^{*})\}\) and \(Mx^{k}\rightarrow Mx^{*}\), as \(k\to \infty \).
 
(ii)
Under Assumption 3.2, if one of the following conditions holds, i.e. M is of full column rank or \(\sigma _{1}=\rho _{1}-\rho _{2} \Vert M \Vert ^{2}-\mu >0\), then \(\{x^{k}\}\) converges to an optimal solution of the SWCCO problem (1).
 
Proof
(i) Summing both sides of (15) in Lemma 3.1 from \(k=0\) to \(k=\infty \), it follows that
$$ \sigma _{2}\sum_{k=0}^{\infty } \bigl\Vert y^{k+1}-y^{k} \bigr\Vert ^{2}+\sigma _{3}\sum_{k=0}^{\infty } \bigl\Vert p^{k+1}-p^{k} \bigr\Vert ^{2}\leq H^{0}-H^{\infty }< \infty . $$
Thus,
$$ \bigl\Vert p^{k+1}-p^{k} \bigr\Vert \rightarrow 0 \quad \textit{and} \quad \bigl\Vert y^{k+1}-y^{k} \bigr\Vert \rightarrow 0 \quad \textit{as } k \to \infty . $$
(23)
Since \((x^{*},y^{*},p^{*})\) is a critical point and \(H^{k}\leq H^{0}\) follows from Lemma 3.1, the sequences \(y^{k}\) and \(p^{k}\) are bounded. Next, by the boundedness of \(\{(x^{k},y^{k},p^{k})\}\), let \((\bar{x},\bar{y},\bar{p})\) be an accumulation point of \(\{(x^{k},y^{k},p^{k})\}\); then there exists a subsequence \(\{(x^{k_{j}},y^{k_{j}},p^{k_{j}})\}\) that converges to \((\bar{x},\bar{y},\bar{p})\), i.e.,
$$ \lim_{j\rightarrow \infty } \bigl(x^{k_{j}},y^{k_{j}},p^{k_{j}} \bigr)=( \bar{x},\bar{y},\bar{p}). $$
Also, from (23), we have \(p^{k_{j}+1}-p^{k_{j}}=\rho (Mx^{k_{j}+1}-y^{k_{j}+1})\rightarrow 0\); after passing to the limit, we obtain
$$ M\bar{x}-\bar{y}=0. $$
(24)
Now, we will show that \(\{(\bar{x},\bar{y},\bar{p})\}\) is a critical point in the following. By taking the limit on the first equation of the optimality conditions (4) along with \(\{(x^{k_{j}},y^{k_{j}},p ^{k_{j}})\}\), and from the closeness of ∂f, we obtain
$$ -M^{T}\bar{p}\in \partial f(\bar{x}). $$
(25)
Since \(y^{k_{j}+1}\) is a minimizer of \(L_{\rho }(x^{k_{j}},y,p^{k_{j}})\), we have
$$ L_{\rho } \bigl(x^{k_{j}},y^{{k_{j}+1}},p^{k_{j}} \bigr) \leq L_{\rho } \bigl(x^{k_{j}}, \bar{y},p^{k_{j}} \bigr). $$
Taking the limit of the above inequality, we get
$$ \limsup_{j\rightarrow \infty } L_{\rho } \bigl(x^{k_{j}},y^{{k_{j}+1}},p ^{k_{j}} \bigr)\leq L_{\rho }(\bar{x},\bar{y},\bar{p}). $$
Next, from the lower semicontinuity of \(L_{\rho }\), we also have
$$ \liminf_{j\rightarrow \infty } L_{\rho } \bigl(x^{k_{j}},y^{{k_{j}+1}},p ^{k_{j}} \bigr)\geq L_{\rho }(\bar{x},\bar{y},\bar{p}). $$
Combining the above two inequalities, we conclude that \(L_{\rho } (x^{k_{j}},y^{{k_{j}+1}},p^{k_{j}} )\rightarrow L_{\rho }(\bar{x},\bar{y},\bar{p})\), hence \(g(y^{k_{j}+1})\rightarrow g(\bar{y})\), and from assertion (ii) of Definition 2.2
$$ \bar{p}\in \partial g(\bar{y}). $$
(26)
This, together with (24) and (25), shows that \((\bar{x},\bar{y},\bar{p})\) is a critical point, by the optimality conditions (4).
Without loss of generality, let \((x^{*},y^{*},p^{*})=(\bar{x},\bar{y},\bar{p})\). Again from the nonincreasing monotonicity and boundedness of \(H^{k}\), we know \(H^{k}\) is convergent. Since \(\lim_{j\rightarrow \infty }(x^{k_{j}},y^{{k_{j}}}, p^{k_{j}})=(x^{*},y^{*},p^{*})\), we conclude \(H^{k}\rightarrow 0\), i.e., \(y^{k}\rightarrow y^{*}\) and \(p^{k}\rightarrow p^{*}\). Again, due to \(p^{k+1}=p^{k}+ \rho (Mx^{k+1}-y^{k+1})\), we get \(Mx^{k}\rightarrow Mx^{*}\). This completes the proof that \(\{(y^{k},p^{k})\}\) converges to \((y^{*},p^{*})\) and \(Mx^{k}\rightarrow Mx^{*}\).
(ii) When \(\sigma _{1}=\rho _{1}-\rho _{2} \Vert M \Vert ^{2}-\mu >0\), summing both sides of (15) from \(k=0\) to \(k=\infty \), we get
$$ \sigma _{1}\sum_{k=0}^{\infty } \bigl\Vert x^{k+1}_{e} \bigr\Vert ^{2}\leq H^{0}-H^{\infty }< \infty , $$
hence \(x^{k}\rightarrow x^{*}\).
When M has full column rank, i.e. \(M^{T}M\succeq \gamma I\) for some \(\gamma >0\), then
$$ \sqrt{\gamma } \bigl\Vert x^{k+1}-x^{*} \bigr\Vert \le \bigl\Vert Mx^{k+1}_{e} \bigr\Vert = \biggl\Vert \frac{1}{\rho } \bigl(p^{k+1}-p^{k} \bigr)+y^{k+1}_{e} \biggr\Vert \leq \frac{1}{\rho } \bigl\Vert p^{k+1}-p^{k} \bigr\Vert + \bigl\Vert y^{k+1}_{e} \bigr\Vert . $$
Hence, we also get \(x^{k}\rightarrow x^{*}\), since \(p^{k+1}-p^{k}\rightarrow 0\) and \(y^{k+1}-y^{*}\rightarrow 0\).
Last, due to Assumption 3.2 and the assertion (iii) of Definition 2.2, we have
$$ \begin{aligned} 0= {}&-M^{T}p^{*}+M^{T}p^{*} \\ \in{} & \partial f \bigl(x^{*} \bigr)+M^{T}\partial g \bigl(y^{*} \bigr) \\ ={}&\partial f \bigl(x^{*} \bigr)+M^{T}(\partial g) \bigl(Mx^{*} \bigr) \\ ={}&\partial \bigl(f \bigl(x^{*} \bigr)+g \bigl(Mx^{*} \bigr) \bigr), \end{aligned} $$
i.e., \(x^{*}\) is a critical point of \(f(x)+g(Mx)\). Because of the convexity of \(f(x)+g(Mx)\), \(x^{*}\) is an optimal solution of (1), and therefore \(x^{k}\) converges to an optimal solution of (1). □
Remark 2
The boundedness assumption of the iterative sequence \(x^{k}\) can be verified if one of the following three conditions is satisfied: (a) \(\sigma _{1}=\rho _{1}-\rho _{2} \Vert M \Vert ^{2}-\mu >0\); (b) M has full column rank; (c) f is coercive.
Tuning the penalty parameter ρ in ADMM is an important issue, and we would like the admissible range of ρ to be as wide as possible; however, in most cases (even for the convex ADMM algorithm), we have to accept the fact that only a relatively small range of ρ is available. In general, we have the largest range (i.e. \((0,+\infty )\)) of ρ for two-block convex optimization problems. For multi-block convex optimization problems, ρ has to be chosen larger than zero but less than a given constant, even under the assumption that one function is strongly convex, as shown in [23]. In the nonconvex setting, to ensure the sufficient descent of the selected Lyapunov function, ρ has to be larger than a constant that depends on the given regularity conditions [14–17]. Turning to the SWCCO problem (2) addressed in this paper, we set ρ larger than at least twice the weakly convex modulus, i.e. \(\rho >2\rho _{2}\), in order to guarantee the nonincrease of the Lyapunov function. We claim this condition is necessary, and the following example shows that the iterative sequence can diverge if \(\rho \le 2\rho _{2}\).
Example 3.3
Consider the optimization problem
$$ \min_{x} \frac{a}{2}x^{2}- \frac{b}{2}x^{2}, $$
where \(\rho _{1}=a>b=\rho _{2}>0\). We use ADMM (3) to solve this question, which is equivalent to
$$ \textstyle\begin{cases} \min_{x,y}\frac{a}{2}x^{2}+(-\frac{b}{2})y^{2},\\ \textit{s.t.}\quad x=y. \end{cases} $$
Note that \(\frac{a}{2}x^{2}\) is strongly convex with modulus a and \((-\frac{b}{2})y^{2}\) is weakly convex with modulus b, respectively. Therefore, from ADMM (3), we can get
$$\begin{aligned} &0=ax^{k+1}+p^{k}+\rho \bigl(x^{k+1}-y^{k} \bigr); \\ &0=-by^{k+1}-p^{k+1}; \\ &p ^{k+1}=p^{k}+\rho \bigl(x^{k+1}-y^{k+1} \bigr). \end{aligned}$$
Rearranging the above equations, we get
$$ y^{k+1}=\frac{1}{\rho -b} \biggl(\rho \frac{b+\rho }{a+\rho }-b \biggr)y^{k}. $$
It is obvious that \(y^{k}\) will diverge if \(\vert \frac{1}{\rho -b}(\rho \frac{b+\rho }{a+\rho }-b) \vert >1\). Next, letting \(a\rightarrow \infty \) and \(\rho \in (b,2b)\), we get \(\vert \frac{1}{\rho -b}(\rho \frac{b+\rho }{a+\rho }-b) \vert >1\), so \(y^{k}\) diverges.
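The divergence claimed in Example 3.3 is easy to reproduce numerically; the script below (the values of a, b, and ρ are illustrative assumptions) evaluates the amplification factor of the recursion for a choice \(\rho \in (b,2b)\) with large a and for a choice \(\rho >2b\).

```python
import numpy as np

def factor(a, b, rho):
    # y^{k+1} = c * y^k with c = (rho*(b+rho)/(a+rho) - b) / (rho - b)
    return (rho * (b + rho) / (a + rho) - b) / (rho - b)

a, b = 1e6, 1.0                   # strongly convex modulus a >> weakly convex modulus b
for rho in (1.5 * b, 2.5 * b):    # rho in (b, 2b) vs. rho > 2b
    c = factor(a, b, rho)
    y = 1.0
    for _ in range(50):
        y = c * y
    print(f"rho = {rho}: |c| = {abs(c):.3f}, |y^50| = {abs(y):.3e}")
# |c| is about 2.0 for rho = 1.5 (y^k blows up) and about 0.67 for rho = 2.5 (y^k -> 0),
# consistent with the requirement rho > 2*rho_2 in Assumption 3.1.
```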
Why should the penalty parameter ρ be larger than a given constant in the nonconvex case in order to ensure convergence of the ADMM algorithm? This is an interesting and open problem for further research. Compared with the case where both objective functions are convex, for which the penalty parameter merely needs to be greater than zero, the SWCCO optimization problem is essentially a convex problem, yet the choice of the penalty parameter ρ follows the rule of the nonconvex case, i.e. it must be greater than a given positive constant. The main reason for this is that the SWCCO problem involves a nonconvex term.

4 Sublinear and linear convergence rate analysis

Compared with the large number of convergence results for convex/nonconvex ADMM algorithms, only a limited number of references in the literature [24–30] have investigated the convergence rate for convex optimization problems, let alone for nonconvex optimization problems. Specifically, the worst-case \(O(1/k)\) convergence rate is established in [25] for the classic ADMM. The authors in [24] have shown that the dual objective function of the classic ADMM admits an \(O(1/k)\) convergence rate and an \(O(1/k^{2})\) rate for the accelerated version. Very recently, the authors in [26] have established the \(o(1/k)\) convergence rate for multi-block ADMM, and the linear convergence rate for multi-block ADMM is established in [28, 29], but requiring a strongly convex, gradient Lipschitz continuous objective function or some additional assumptions on the objective function. Without smoothness assumptions on the objective function, the authors in [27] have established Q-linear convergence for the more general convex piecewise linear-quadratic programming problems solved by the ADMM and the linearized ADMM, respectively. Subsequently, this global Q-linear convergence rate has been extended to a general semi-proximal ADMM in [30] for solving convex composite piecewise linear-quadratic programming and quadratic semidefinite programming. In this section, for the SWCCO-ADMM algorithm (3) applied to problem (1), we show the sublinear convergence rate \(o(1/k)\) only under Assumption 3.1 and the linear convergence rate under further assumptions.

4.1 Sublinear convergence rate analysis

In this section, we extend the sublinear convergence rate results of the multi-block convex ADMM in [26] to the SWCCO-ADMM (3) motivated by the preceding observation that the primal residual \(Mx^{k+1}-y^{k+1}\) (or \(p^{k+1}-p^{k}\)) and dual residual \(y^{k+1}-y^{k}\) can be used to measure the optimality of the iterations of the SWCCO-ADMM. For simplicity, we denote \(\bar{x}^{k+1}_{e}=x^{k+1}-x^{k}\), \(\bar{y}^{k+1}_{e}=y^{k+1}-y^{k}\) and \(\bar{p}^{k+1}_{e}=p^{k+1}-p^{k}\). Let us start the proof with a basic lemma.
Lemma 4.1
If a nonnegative sequence \(\{a^{k}\}_{k=0}^{\infty }\subset {R}\) is monotonically nonincreasing and obeys \(\sum_{k=0}^{\infty }a^{k}< \infty \), then \(\{a^{k}\}_{k=0}^{\infty }\) enjoys an \(o(1/k)\) sublinear convergence rate.
Proof
Summing \(a^{k}\) from \(k=t+1\) to \(k=2t\) and using the monotonicity,
$$ \begin{aligned} ta^{2t}\leq \sum _{k=t+1}^{2t}a^{k}=\sum _{k=0}^{2t}a^{k}-\sum _{k=0}^{t}a ^{k}. \end{aligned} $$
Then, letting \(t\rightarrow \infty \), we have \(\sum_{k=0}^{2t}a^{k}- \sum_{k=0}^{t}a^{k}\rightarrow 0\); hence \(ta^{2t}\rightarrow 0\) and, by monotonicity, \(a^{k}=o(1/k)\). □
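For intuition (our own illustration), take \(a^{k}=\frac{1}{(k+2)\log ^{2}(k+2)}\): it is nonnegative, nonincreasing, and summable, and \(k a^{k}\rightarrow 0\), exactly the \(o(1/k)\) behaviour Lemma 4.1 guarantees.

```python
import numpy as np

k = np.arange(1, 10**6 + 1)
a = 1.0 / ((k + 2) * np.log(k + 2) ** 2)   # nonincreasing and summable (integral test)
print("partial sum:", a.sum())              # stays bounded as more terms are added
print("k*a^k at k = 10^3 and 10^6:", k[999] * a[999], k[-1] * a[-1])   # slowly tends to 0
```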
Thus, the key step in proving the sublinear convergence of SWCCO-ADMM is to verify that \(h^{k}=\frac{\rho }{2} \Vert y^{k+1}-y^{k} \Vert ^{2}+\frac{1}{2\rho } \Vert p^{k+1}-p^{k} \Vert ^{2}\) is monotonically nonincreasing and that \(\sum_{k=0}^{\infty } h^{k}\) is bounded.
Theorem 4.2
Suppose Assumption 3.1 holds; then the iterative sequences \(\{(x^{k},y^{k},p^{k})\}\) generated by SWCCO-ADMM (3) admit
$$ \frac{\rho }{2} \bigl\Vert y^{k+1}-y^{k} \bigr\Vert ^{2}+ \frac{1}{2\rho } \bigl\Vert p^{k+1}-p^{k} \bigr\Vert ^{2}=o(1/k). $$
(27)
Proof
Firstly, we will prove \(\frac{\rho }{2} \Vert y^{k+1}-y^{k} \Vert ^{2}+\frac{1}{2 \rho } \Vert p^{k+1}-p^{k} \Vert ^{2}\) is monotonically nonincreasing. By the optimality condition (4), we get \(-M^{T}p^{k+1}-\rho M^{T}(y ^{k+1}-y^{k})\in \partial f(x^{k+1})\) and \(p^{k+1}\in \partial g(y ^{k+1})\). From the strong convexity property (7) of f and the weakly convex property (8) of g, we obtain
$$ \begin{aligned} & \bigl\langle -M^{T}\bar{p}^{k+1}_{e}- \rho M^{T} \bigl(\bar{y}^{k+1}_{e}-\bar{y} ^{k}_{e} \bigr),\bar{x}^{k+1}_{e} \bigr\rangle = \bigl\langle -\bar{p}^{k+1}_{e}-\rho \bigl( \bar{y}^{k+1}_{e}-\bar{y}^{k}_{e} \bigr),M \bar{x}^{k+1}_{e} \bigr\rangle \geq \rho _{1} \bigl\Vert \bar{x}^{k+1}_{e} \bigr\Vert ^{2}; \\ & \bigl\langle \bar{p}^{k+1}_{e},\bar{y} ^{k+1}_{e} \bigr\rangle \geq -\rho _{2} \bigl\Vert \bar{y}^{k+1}_{e} \bigr\Vert ^{2}. \end{aligned} $$
Adding the above two relations and rearranging them, we get
$$\begin{aligned}& - \bigl\langle \bar{p}^{k+1}_{e},M \bar{x}^{k+1}_{e}- \bar{y}^{k+1}_{e} \bigr\rangle -\rho \bigl\langle \bar{y}^{k+1}_{e}- \bar{y}^{k}_{e},M \bar{x}^{k+1} _{e} \bigr\rangle \\& \quad = -\frac{1}{\rho } \bigl\langle \bar{p}^{k+1}_{e}, \bar{p} ^{k+1}_{e}-\bar{p}^{k}_{e} \bigr\rangle - \rho \bigl\langle \bar{y}^{k+1}_{e},M \bar{x}^{k+1}_{e} \bigr\rangle +\rho \bigl\langle \bar{y}^{k}_{e},M \bar{x}^{k+1} _{e} \bigr\rangle \\& \quad = -\frac{1}{2\rho } \bigl( \bigl\Vert \bar{p}^{k+1}_{e} \bigr\Vert ^{2}+ \bigl\Vert \bar{p}^{k+1}_{e}- \bar{p}^{k}_{e} \bigr\Vert ^{2}- \bigl\Vert \bar{p}^{k}_{e} \bigr\Vert ^{2} \bigr) \\& \quad\quad{} - \frac{ \rho }{2} \bigl( \bigl\Vert \bar{y}^{k+1}_{e} \bigr\Vert ^{2}+ \bigl\Vert M\bar{x}^{k+1}_{e} \bigr\Vert ^{2}- \bigl\Vert M \bar{x}^{k+1}_{e}- \bar{y}^{k+1}_{e} \bigr\Vert ^{2} \bigr) \\& \quad \quad{} +\frac{\rho }{2} \bigl( \bigl\Vert \bar{y}^{k}_{e} \bigr\Vert ^{2}+ \bigl\Vert M\bar{x}^{k+1}_{e} \bigr\Vert ^{2}- \bigl\Vert M\bar{x}^{k+1}_{e}- \bar{y}^{k}_{e} \bigr\Vert ^{2} \bigr) \\& \quad = \frac{1}{2\rho } \bigl( \bigl\Vert \bar{p}^{k}_{e} \bigr\Vert ^{2}- \bigl\Vert \bar{p}^{k+1}_{e} \bigr\Vert ^{2} \bigr)+\frac{\rho }{2} \bigl( \bigl\Vert \bar{y}^{k}_{e} \bigr\Vert ^{2}- \bigl\Vert \bar{y}^{k+1}_{e} \bigr\Vert ^{2} \bigr)- \frac{\rho }{2} \bigl\Vert M\bar{x}^{k+1}_{e}-\bar{y} ^{k}_{e} \bigr\Vert ^{2} \\& \quad \geq \rho _{1} \bigl\Vert \bar{x}^{k+1}_{e} \bigr\Vert ^{2}-\rho _{2} \bigl\Vert \bar{y}^{k+1}_{e} \bigr\Vert ^{2}, \end{aligned}$$
where the first and third equations follow from the relation \(p^{k+1}=p^{k}+\rho (Mx^{k+1}-y^{k+1})\); the second equation follows from the relation \(2\langle a,b\rangle = \Vert a \Vert ^{2}+ \Vert b \Vert ^{2}- \Vert a-b \Vert ^{2}\).
Let \(h^{k}=\frac{1}{2\rho } \Vert \bar{p}^{k}_{e} \Vert ^{2}+\frac{\rho }{2} \Vert \bar{y}^{k}_{e} \Vert ^{2}\). From the above inequality, we get
$$ h^{k}-h^{k+1}\geq \rho _{1} \bigl\Vert \bar{x}^{k+1}_{e} \bigr\Vert ^{2}-\rho _{2} \bigl\Vert \bar{y} ^{k+1}_{e} \bigr\Vert ^{2}+\frac{\rho }{2} \bigl\Vert M \bar{x}^{k+1}_{e}- \bar{y}^{k}_{e} \bigr\Vert ^{2}. $$
Using similar arguments to those in the proof of Lemma 3.1, it follows that
$$ h^{k}-h^{k+1}\geq \sigma _{1} \bigl\Vert \bar{x}^{k+1}_{e} \bigr\Vert ^{2}+\sigma _{2} \bigl\Vert \bar{y}^{k+1}_{e}- \bar{y}^{k}_{e} \bigr\Vert ^{2}+\sigma _{3} \bigl\Vert \bar{p}^{k+1}_{e}- \bar{p}^{k}_{e} \bigr\Vert ^{2}. $$
Since the right side of the above inequality is nonnegative, \(h^{k}\) is monotonically nonincreasing.
Next, we verify the boundedness of \(\sum_{k=0}^{\infty }h^{k}\). From Lemma 3.1, we know
$$ \sigma _{2}\sum_{k=0}^{\infty } \bigl\Vert y^{k+1}-y^{k} \bigr\Vert ^{2}+\sigma _{3}\sum_{k=0}^{\infty } \bigl\Vert p^{k+1}-p^{k} \bigr\Vert ^{2}< \infty . $$
Therefore,
$$ \sum_{k=0}^{\infty }\frac{\rho }{2} \bigl\Vert y^{k+1}-y^{k} \bigr\Vert ^{2}< \infty \quad \textit{and} \quad \sum_{k=0}^{\infty } \frac{1}{2\rho } \bigl\Vert p^{k+1}-p^{k} \bigr\Vert ^{2}< \infty . $$
Adding the above two relations, we have
$$ \sum_{k=0}^{\infty }\frac{\rho }{2} \bigl\Vert y^{k+1}-y^{k} \bigr\Vert ^{2}+ \frac{1}{2 \rho } \bigl\Vert p^{k+1}-p^{k} \bigr\Vert ^{2}< \infty . $$
Hence we get the boundedness of \(\sum_{k=0}^{\infty }h^{k}\), and the \(o(1/k)\) sublinear convergence rate follows from Lemma 4.1. □

4.2 Linear convergence rate

For the convex generalized ADMM algorithm, the linear convergence rate has been established in [27, 29, 30] when appropriate regularity conditions are satisfied. In this section, we show that the SWCCO-ADMM algorithm also admits a linear convergence rate under some mild regularity conditions. The main idea in proving the linear convergence rate of the SWCCO-ADMM algorithm is to set up the relation
$$ H^{k}\geq (1+\tau )H^{k+1}, $$
(28)
where the parameter \(\tau >0\) and \(H^{k}\) is the Lyapunov function defined in Lemma 3.1. We first list the assumptions used to establish the linear convergence rate as follows.
Assumption 4.1
M is full column rank and \(g(\cdot )\) is gradient Lipschitz continuous with modulus \(L_{g}\); Assumption 3.1 holds, but with \(\sigma _{1}=\rho _{1}-\rho _{2} \Vert M \Vert ^{2}-\mu >0\) (not ≥0).
Assumption 4.2
M is invertible, \(f(x)\) is gradient Lipschitz continuous with modulus \(L_{f}\); Assumption 3.1 holds, but with \(\sigma _{1}=\rho _{1}- \rho _{2} \Vert M \Vert ^{2}-\mu >0\) (not ≥0).
Remark 3
When M is a wavelet transform and \(g(x)=\log _{2}(1+x^{2})\), Assumption 4.1 is easily satisfied. When M is the identity matrix and \(f(x)=\frac{1}{2} \Vert v-Hx \Vert ^{2}\), Assumption 4.2 is easily satisfied. In fact, we can obtain a linear convergence rate if either Assumption 4.1 or Assumption 4.2 is satisfied in Theorem 4.5. Assumption 3.1 ensures the convergence of the SWCCO-ADMM algorithm. Because \(\Vert x^{k}_{e} \Vert ^{2}\) is needed (together with \(\Vert p^{k}-p^{k-1} \Vert ^{2}\)) to bound \(\Vert y^{k}_{e} \Vert ^{2}\), we assume the condition \(\sigma _{1}>0\) to ensure \(\sigma _{1} \Vert x^{k}_{e} \Vert ^{2}>0\). From Theorem 4.5, it can be seen that if \(\sigma _{1}=0\), then \(H^{k}\geq (1+\tau )H^{k+1}\) degenerates to \(H^{k}\geq H^{k+1}\), in which case we do not obtain a linear convergence rate. Meanwhile, as shown in [29], the full rank of M and the gradient Lipschitz property of f and g are used to show a linear convergence rate for a convex generalized ADMM algorithm.
From Lemma 3.1, we first note that the nonincreasing inequality (15) for \(H^{k}\) holds under Assumption 4.1 or 4.2. In order to establish Eq. (28), we just need to prove \(H^{k}-H^{k+1}\geq \tau H^{k+1}\). By using inequality (10) with \(\gamma =1\), we first have the following relation:
$$\begin{aligned} \bigl\Vert y^{k+1}_{e} \bigr\Vert ^{2}={} & \biggl\Vert Mx^{k+1}_{e}+\frac{1}{\rho } \bigl(p^{k}-p^{k+1} \bigr) \biggr\Vert ^{2} \\ \leq{} &2 \bigl\Vert Mx^{k+1}_{e} \bigr\Vert ^{2}+ \frac{2}{\rho ^{2}} \bigl\Vert p^{k+1}-p^{k} \bigr\Vert ^{2} \\ \leq{} &2 \Vert M \Vert ^{2} \bigl\Vert x^{k+1}_{e} \bigr\Vert ^{2}+\frac{2}{\rho ^{2}} \bigl\Vert p^{k+1}-p^{k} \bigr\Vert ^{2}. \end{aligned}$$
(29)
To show the linear convergence rate, we need the following lemma.
Lemma 4.3
Suppose \(g(y)\) is gradient Lipschitz continuous with modulus \(L_{g}\), then the iterative sequences generated by SWCCO-ADMM (3) satisfy
$$ \bigl\Vert p^{k+1}-p^{*} \bigr\Vert ^{2}\leq L^{2}_{g} \bigl\Vert y^{k+1}-y^{*} \bigr\Vert ^{2}\quad \bigl(\textit{i.e.}, \bigl\Vert p^{k+1}_{e} \bigr\Vert ^{2}\leq L^{2}_{g} \bigl\Vert y^{k+1}_{e} \bigr\Vert ^{2} \bigr). $$
Proof
According to the optimality condition (4), we have \(p^{k+1}=\nabla g(y^{k+1})\). Hence, noticing that \(p^{*}=\nabla g(y ^{*})\), we obtain
$$ \bigl\Vert p^{k+1}-p^{*} \bigr\Vert ^{2}= \bigl\Vert \nabla g \bigl(y^{k+1} \bigr)-\nabla g \bigl(y^{*} \bigr) \bigr\Vert ^{2}\leq L ^{2}_{g} \bigl\Vert y^{k+1}-y^{*} \bigr\Vert ^{2}. $$
 □
Lemma 4.4
Suppose M is full row rank and \(f(x)\) is gradient Lipschitz continuous with modulus \(L_{f}\), then the iterative sequences generated by SWCCO-ADMM (3) satisfy
$$ \bigl\Vert p^{k+1}-p^{*} \bigr\Vert ^{2}\leq \frac{2L^{2}_{f}}{\lambda _{\min }(MM^{T})} \bigl\Vert x ^{k+1}-x^{*} \bigr\Vert ^{2}+\frac{2\rho ^{2}\lambda _{\max }(MM^{T})}{\lambda _{\min }(MM ^{T})} \bigl\Vert y^{k+1}-y^{k} \bigr\Vert ^{2}. $$
Proof
According to the optimality condition (4), we have \(0=\nabla f(x^{k+1})+M^{T}p^{k+1}+\rho M^{T}(y^{k+1}-y^{k})\) and \(0=\nabla f(x^{*})+M^{T}p^{*}\). Hence, we have
$$\begin{aligned} \lambda _{\min } \bigl(MM^{T} \bigr) \bigl\Vert p^{k+1}-p^{*} \bigr\Vert ^{2}\leq {}& \bigl\Vert M^{T}p^{k+1}-M^{T}p ^{*} \bigr\Vert ^{2} \\ ={}& \bigl\Vert \nabla f \bigl(x^{k+1} \bigr)-\nabla f \bigl(x^{*} \bigr)+ \rho M^{T} \bigl(y^{k+1}-y ^{k} \bigr) \bigr\Vert ^{2} \\ \leq {}&2 \bigl\Vert \nabla f \bigl(x^{k+1} \bigr)-\nabla f \bigl(x^{*} \bigr) \bigr\Vert ^{2}+2\rho ^{2} \bigl\Vert M^{T} \bigl(y^{k+1}-y^{k} \bigr) \bigr\Vert ^{2} \\ \leq {}&2L^{2}_{f} \bigl\Vert x^{k+1}-x^{*} \bigr\Vert ^{2}+2\rho ^{2}\lambda _{\max } \bigl(MM^{T} \bigr) \bigl\Vert y^{k+1}-y^{k} \bigr\Vert ^{2}, \end{aligned}$$
where the second inequality follows from Eq. (10) with \(\gamma =1\) and the last inequality follows from the fact that \(\nabla f(x)\) is Lipschitz continuous with constant \(L_{f}\). Since M has full row rank, \(\lambda _{\max }(MM^{T})\geq \lambda _{\min }(MM^{T})>0\). Then, dividing both sides of the above inequality by \(\lambda _{\min }(MM^{T})\), we complete the proof. □
Next, we state the linear convergence rate result of the SWCCO-ADMM algorithm in the following theorem.
Theorem 4.5
Suppose Assumption 4.1 or Assumption 4.2 holds. Then the iterative sequences generated by SWCCO-ADMM (3) converge linearly to a critical point.
Proof
On the one hand, if Assumption 4.1 holds, we have \(\Vert M \Vert ^{2}>0\) since M has full column rank. Then
$$\begin{aligned}& \frac{1}{2} \bigl\{ \sigma _{1} \bigl\Vert x^{k+1}_{e} \bigr\Vert ^{2}+\sigma _{2} \bigl\Vert y^{k+1}-y ^{k} \bigr\Vert ^{2}+\sigma _{3} \bigl\Vert p^{k+1}-p^{k} \bigr\Vert ^{2} \bigr\} \\& \quad \geq \frac{1}{2} \bigl\{ \sigma _{1} \bigl\Vert x^{k+1}_{e} \bigr\Vert ^{2}+\sigma _{3} \bigl\Vert p ^{k+1}-p^{k} \bigr\Vert ^{2} \bigr\} \\& \quad = \frac{1}{2} \biggl\{ \biggl( \frac{\sigma _{1}}{2 \Vert M \Vert ^{2}}\frac{2}{\rho } \biggr) \biggl(2 \Vert M \Vert ^{2}\frac{\rho }{2} \biggr) \bigl\Vert x^{k+1}_{e} \bigr\Vert ^{2}+ \biggl(\sigma _{3} \frac{\rho ^{2}}{2}\frac{2}{\rho } \biggr) \biggl( \frac{2}{\rho ^{2}} \frac{\rho }{2} \biggr) \bigl\Vert p^{k+1}-p ^{k} \bigr\Vert ^{2} \biggr\} \\& \quad \geq \frac{\min \{\frac{\sigma _{1}}{2 \Vert M \Vert ^{2}}\frac{2}{\rho },\sigma _{3}\frac{\rho ^{2}}{2}\frac{2}{\rho }\}}{2} \biggl\{ \biggl(2 \Vert M \Vert ^{2} \frac{ \rho }{2} \biggr) \bigl\Vert x^{k+1}_{e} \bigr\Vert ^{2}+ \biggl(\frac{2}{\rho ^{2}}\frac{\rho }{2} \biggr) \bigl\Vert p ^{k+1}-p^{k} \bigr\Vert ^{2} \biggr\} \\& \quad \geq \tau _{1}\frac{\rho }{2} \bigl\Vert y^{k+1}_{e} \bigr\Vert ^{2}, \end{aligned}$$
(30)
where \(\tau _{1}=\frac{\min \{\frac{\sigma _{1}}{2 \Vert M \Vert ^{2}}\frac{2}{ \rho },\sigma _{3}\frac{\rho ^{2}}{2}\frac{2}{\rho }\}}{2}\) and the last inequality follows from Eq. (29). Since \(\nabla g(y)\) is Lipschitz continuous with constant \(L_{g}\), and by Lemma 4.3, we have
$$\begin{aligned} \frac{1}{2} \bigl\{ \sigma _{1} \bigl\Vert x^{k+1}_{e} \bigr\Vert ^{2}+\sigma _{2} \bigl\Vert y^{k+1}-y ^{k} \bigr\Vert ^{2}+\sigma _{3} \bigl\Vert p^{k+1}-p^{k} \bigr\Vert ^{2} \bigr\} \geq{} &\tau _{1}\frac{ \rho }{2} \bigl\Vert y^{k+1}_{e} \bigr\Vert ^{2} \\ \geq{} &\tau _{1}\frac{\rho }{2}\frac{1}{L^{2}_{g}}2\rho \frac{1}{2 \rho } \bigl\Vert p^{k+1}_{e} \bigr\Vert ^{2} \\ ={}&\tau _{2}\frac{1}{2\rho } \bigl\Vert p^{k+1}_{e} \bigr\Vert ^{2} , \end{aligned}$$
(31)
where \(\tau _{2}=\tau _{1}\frac{\rho }{2}\frac{1}{L^{2}_{g}}2\rho \). Adding (30) and (31), we have
$$ \begin{aligned} &\sigma _{1} \bigl\Vert x^{k+1}_{e} \bigr\Vert ^{2}+\sigma _{2} \bigl\Vert y^{k+1}-y^{k} \bigr\Vert ^{2}+\sigma _{3} \bigl\Vert p^{k+1}-p^{k} \bigr\Vert ^{2} \\ &\quad =2\frac{1}{2} \bigl\{ \sigma _{1} \bigl\Vert x^{k+1}_{e} \bigr\Vert ^{2}+\sigma _{2} \bigl\Vert y^{k+1}-y^{k} \bigr\Vert ^{2}+\sigma _{3} \bigl\Vert p^{k+1}-p^{k} \bigr\Vert ^{2} \bigr\} \\ &\quad\geq \tau _{1}\frac{\rho }{2} \bigl\Vert y^{k+1}_{e} \bigr\Vert ^{2}+\tau _{2} \frac{1}{2\rho } \bigl\Vert p^{k+1}_{e} \bigr\Vert ^{2} \\ &\quad\geq \tau ' \biggl(\frac{\rho }{2} \bigl\Vert y^{k+1}_{e} \bigr\Vert ^{2}+\frac{1}{2\rho } \bigl\Vert p^{k+1}_{e} \bigr\Vert ^{2} \biggr)= \tau ' H^{k+1}, \end{aligned} $$
where \(\tau '=\min \{\tau _{1},\tau _{2}\}\). So according to Lemma 3.1, we have
$$ \begin{aligned} H^{k}-H^{k+1}\geq \sigma _{1} \bigl\Vert x^{k+1}_{e} \bigr\Vert ^{2}+\sigma _{2} \bigl\Vert y^{k+1}-y ^{k} \bigr\Vert ^{2}+\sigma _{3} \bigl\Vert p^{k+1}-p^{k} \bigr\Vert ^{2}\geq \tau ' H^{k+1}. \end{aligned} $$
Therefore,
$$ H^{k}\geq \bigl(1+\tau ' \bigr)H^{k+1}, $$
which implies that the iterative sequences of SWCCO-ADMM (3) converge linearly to a critical point under Assumption 4.1.
On the other hand, if Assumption 4.2 holds, then
$$\begin{aligned}& \frac{1}{2} \bigl\{ \sigma _{1} \bigl\Vert x^{k+1}_{e} \bigr\Vert ^{2}+\sigma _{2} \bigl\Vert y^{k+1}-y^{k} \bigr\Vert ^{2}+\sigma _{3} \bigl\Vert p^{k+1}-p^{k} \bigr\Vert ^{2} \bigr\} \\& \quad \geq \frac{1}{2} \bigl\{ \sigma _{1} \bigl\Vert x^{k+1}_{e} \bigr\Vert ^{2}+\sigma _{2} \bigl\Vert y^{k+1}-y^{k} \bigr\Vert ^{2} \bigr\} \\& \quad = \frac{1}{2} \biggl\{ \biggl(\sigma _{1} \frac{\lambda _{\min }(MM^{T})}{2L^{2}_{f}}2\rho \biggr) \biggl(\frac{2L^{2}_{f}}{\lambda _{\min }(MM^{T})}\frac{1}{2\rho } \biggr) \bigl\Vert x^{k+1}-x^{*} \bigr\Vert ^{2} \\& \quad\quad{} + \biggl(\sigma _{2}\frac{\lambda _{\min }(MM^{T})}{2\rho ^{2}\lambda _{\max }(MM^{T})}2\rho \biggr) \biggl(\frac{2\rho ^{2}\lambda _{\max }(MM^{T})}{\lambda _{\min }(MM^{T})}\frac{1}{2\rho } \biggr) \bigl\Vert y^{k+1}-y^{k} \bigr\Vert ^{2} \biggr\} \\& \quad \geq \frac{\min \{\sigma _{1}\frac{\lambda _{\min }(MM^{T})}{2L^{2}_{f}}2\rho ,\sigma _{2}\frac{\lambda _{\min }(MM^{T})}{2\rho ^{2}\lambda _{\max }(MM^{T})}2\rho \}}{2} \biggl\{ \frac{2L^{2}_{f}}{\lambda _{\min }(MM^{T})}\frac{1}{2\rho } \bigl\Vert x^{k+1}-x^{*} \bigr\Vert ^{2} \\& \quad \quad{} +\frac{2\rho ^{2}\lambda _{\max }(MM^{T})}{\lambda _{\min }(MM^{T})}\frac{1}{2\rho } \bigl\Vert y^{k+1}-y^{k} \bigr\Vert ^{2} \biggr\} \\& \quad \geq \tau _{3}\frac{1}{2\rho } \bigl\Vert p^{k+1}-p^{*} \bigr\Vert ^{2}=\tau _{3} \frac{1}{2\rho } \bigl\Vert p^{k+1}_{e} \bigr\Vert ^{2}, \end{aligned}$$
(32)
where \(\tau _{3}=\frac{\min \{\sigma _{1}\frac{\lambda _{\min }(MM^{T})}{2L^{2}_{f}}2\rho ,\sigma _{2}\frac{\lambda _{\min }(MM^{T})}{2\rho ^{2}\lambda _{\max }(MM^{T})}2\rho \}}{2}\) and the last inequality follows from Lemma 4.4. Then, since M has full column rank, (30) also holds. Next, adding (30) and (32), we have
$$\begin{aligned} &\sigma _{1} \bigl\Vert x^{k+1}_{e} \bigr\Vert ^{2}+\sigma _{2} \bigl\Vert y^{k+1}-y^{k} \bigr\Vert ^{2}+\sigma _{3} \bigl\Vert p^{k+1}-p^{k} \bigr\Vert ^{2} \\ &\quad=2\frac{1}{2} \bigl\{ \sigma _{1} \bigl\Vert x^{k+1}_{e} \bigr\Vert ^{2}+\sigma _{2} \bigl\Vert y^{k+1}-y^{k} \bigr\Vert ^{2}+\sigma _{3} \bigl\Vert p^{k+1}-p^{k} \bigr\Vert ^{2} \bigr\} \\ &\quad\geq \tau _{1}\frac{\rho }{2} \bigl\Vert y^{k+1}_{e} \bigr\Vert ^{2}+\tau _{3} \frac{1}{2 \rho } \bigl\Vert p^{k+1}_{e} \bigr\Vert ^{2} \\ &\quad\geq \tau '' \biggl(\frac{\rho }{2} \bigl\Vert y^{k+1}_{e} \bigr\Vert ^{2}+ \frac{1}{2\rho } \bigl\Vert p^{k+1}_{e} \bigr\Vert ^{2} \biggr)=\tau ''H^{k+1}, \end{aligned}$$
where \(\tau ''=\min \{\tau _{1},\tau _{3}\}\). Using Lemma 3.1 again, we have
$$ H^{k}\geq \bigl(1+\tau '' \bigr) H^{k+1}. $$
Hence, the iterative sequences of SWCCO-ADMM (3) converge linearly to a critical point. □
Remark 4
If \(\sigma _{1}=0\), then \(\tau '=0\) and \(\tau ''=0\).
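The contraction (28) can be observed on a one-dimensional instance satisfying Assumption 4.1 (with \(M=1\), \(f(x)=\frac{a}{2}(x-1)^{2}\) and \(g(y)=-\frac{c}{2}y^{2}\), which is gradient Lipschitz with \(L_{g}=c\)); all numerical values below are our own assumptions. The printed ratios \(H^{k+1}/H^{k}\) stay bounded away from 1, i.e. \(H^{k}\geq (1+\tau )H^{k+1}\).

```python
import numpy as np

a, c = 2.0, 0.25      # strong / weak convexity moduli; with M = 1 and mu = 1,
rho = 2.0             # sigma_1 = a - c - mu = 0.75 > 0 and rho > 2c + 8c^2/mu = 1.0

x_star = a / (a - c)  # minimizer of (a/2)(x-1)^2 - (c/2)x^2
y_star, p_star = x_star, -c * x_star

x = y = p = 0.0
H = []
for _ in range(40):
    x = (a - p + rho * y) / (a + rho)   # x-update
    y = (p + rho * x) / (rho - c)       # y-update
    p = p + rho * (x - y)               # dual update
    H.append(0.5 * rho * (y - y_star) ** 2 + (p - p_star) ** 2 / (2 * rho))

print([H[k + 1] / H[k] for k in range(10) if H[k] > 0])   # ratios stay below some q < 1
```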

5 Numerical example

In this section, in order to confirm the convergence and the efficiency of the proposed SWCCO-ADMM algorithm, we carry out a numerical experiment on a realistic optimization problem, image deblurring, comparing it with the classical ADMM with the \(\ell _{1}\) penalty function. Mathematically, the problem of image deblurring can be formulated as
$$ f = Hu + \eta, $$
(33)
where H is a certain type of blur kernel, such as a Gaussian or motion blur, and η is a certain kind of additive disturbance, such as Gaussian noise with zero mean and variance \(\sigma ^{2}\). The aim of image deblurring is to recover the latent high-quality image u from the degraded (i.e. noisy and blurred) image f.
The common sparse regularized based model to (33) is
$$ \min_{u} \frac{1}{2} \Vert Hu-f \Vert _{2} ^{2}+\lambda \varPhi (Wu), \quad \lambda >0, $$
(34)
where W is a wavelet frame transform satisfying \({W^{T}}W = I\) and \(\varPhi (\cdot )\) is the regularizer aiming to induce sparse solutions, such as the classical convex \(\ell _{1}\)-norm \(\Vert \cdot \Vert _{1}\). In this paper, we exploit the following nonconvex sparsity-inducing regularizer, i.e. the following a-weakly convex function \(g(x)\) (a.k.a. the generalized minimax-concave penalty function):
$$ g(x)= \textstyle\begin{cases} \tau \vert x \vert -\frac{a}{2}x^{2} &\textit{if } \vert x \vert < \frac{\tau }{a},\\ \frac{\tau ^{2}}{2a} &\textit{if } \vert x \vert \geq \frac{\tau }{a}, \end{cases} $$
(35)
considered in [5] to induce a more unbiased sparse solution. The corresponding proximal mapping (a.k.a. firm shrinkage) to this weakly convex function can be written as
$$ \operatorname{prox}_{g}(x)= \textstyle\begin{cases} 0, & \vert x \vert < \lambda \tau , \\ (1-\lambda a)^{-1}(x-\lambda \tau \operatorname{sign}(x)), & \lambda \tau \le \vert x \vert < \frac{\tau }{a} , \\ x, & \frac{\tau }{a}\le \vert x \vert , \end{cases} $$
(36)
where λ is the shrinkage parameter as set in the formulation of Eq. (34).
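A direct vectorized implementation of the penalty (35) and the firm shrinkage (36) might look as follows (function and parameter names are ours); it requires \(\lambda a<1\), so that the proximal subproblem is strictly convex and single valued.

```python
import numpy as np

def gmc_penalty(x, tau, a):
    """Weakly convex penalty (35): tau*|x| - (a/2)*x^2 for |x| < tau/a, else tau^2/(2a)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < tau / a,
                    tau * np.abs(x) - 0.5 * a * x**2,
                    tau**2 / (2 * a))

def firm_shrinkage(x, lam, tau, a):
    """Proximal mapping (36) of lam*g; needs lam*a < 1 so the subproblem is strictly convex."""
    assert lam * a < 1
    x = np.asarray(x, dtype=float)
    absx, s = np.abs(x), np.sign(x)
    return np.where(absx < lam * tau, 0.0,
           np.where(absx < tau / a, s * (absx - lam * tau) / (1 - lam * a), x))

# quick look at the thresholding behaviour (illustrative values)
v = np.linspace(-3.0, 3.0, 7)
print(firm_shrinkage(v, lam=0.5, tau=1.0, a=0.5))
# approximately [-3, -2, -0.67, 0, 0.67, 2, 3]: small values are set to zero,
# large values pass through unchanged (no bias on strong features).
```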
From Fig. 1, which shows the regularizers and the associated proximal mappings, we can see that the firm shrinkage does not underestimate large magnitudes, i.e. the features (edges) of the solution u.
In the experiments, we set the standard deviation of the Gaussian blurring kernel H to 2 and the variance \(\sigma _{n}^{2}\) of the Gaussian white noise η to 2. In addition, we compare the numerical results of the proposed SWCCO-ADMM (3) with the classical \(\ell _{1}\)-norm ADMM algorithm. Both algorithms are implemented in Matlab 2014a with the same parameter settings in the ADMM framework, and we set the number of iterations to 20. The performance of the proposed algorithms is quantitatively measured by means of the peak signal-to-noise ratio (PSNR).
From Fig. 2, it can be seen that the proposed SWCCO-ADMM achieves a better PSNR than the classical ADMM under the same parameter settings.
The dependence of the objective function value on the iteration number for the Barbara image is shown in Fig. 3, which indicates that the proposed SWCCO-ADMM algorithm converges in practice and does so faster than the classical ADMM.

6 Conclusions

In this paper, we have investigated the convergence of the alternating direction method of multipliers for a combinational optimization problem consisting of a strongly convex function and a weakly convex function. We have also established, for the first time, the sublinear (\(o(1/k)\)) and linear convergence rates of this SWCCO-ADMM algorithm. To guarantee convergence, the proof requires the penalty parameter ρ to be chosen at least twice as large as the weakly convex modulus; this requirement stems from the presence of the weakly convex term in the objective function. Extending SWCCO-ADMM to multi-block compositions of strongly and weakly convex functions will be an interesting topic for future research. The convergence analysis in this paper relies on the fact that the strong convexity dominates the weak convexity, i.e. the overall objective function is strongly convex; a convergence analysis of the SWCCO-ADMM algorithm under weaker assumptions is left for future work.

Acknowledgements

Not applicable.

Availability of data and materials

Not applicable.

Competing interests

The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)
2. Glowinski, R., Marrocco, A.: Sur l’approximation, par éléments finis d’ordre 1, et la résolution, par pénalisation-dualité, d’une classe de problèmes de Dirichlet non linéaires. Rev. Fr. Autom. Inform. Rech. Opér., Anal. Numér. 9(R2), 41–76 (1975)
3. Bayram, I., Selesnick, I.W.: The Douglas–Rachford algorithm for weakly convex penalties. Oper. Res. Lett. 44(3), 379–382 (2015)
4. Möllenhoff, T., Strekalovskiy, E., Moeller, M., Cremers, D.: The primal-dual hybrid gradient method for semiconvex splittings. SIAM J. Imaging Sci. 8(2), 827–857 (2015)
5. Bayram, I.: On the convergence of the iterative shrinkage/thresholding algorithm with a weakly convex penalty. IEEE Trans. Signal Process. 64(6), 1597–1608 (2016)
6. Guo, K., Han, D.R., Yuan, X.M.: Convergence analysis of Douglas–Rachford splitting method for “strongly + weakly” convex programming. SIAM J. Numer. Anal. 55(4), 1549–1577 (2017)
7. Eckstein, J.: Splitting methods for monotone operators with applications to parallel optimization. Ph.D. thesis, Massachusetts Institute of Technology (1989)
8. Eckstein, J., Wang, Y.: Understanding the convergence of the alternating direction method of multipliers: theoretical and computational perspectives. Pac. J. Optim. 11(4), 619–644 (2015)
9. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
10. Chen, L., Sun, D.F., Toh, K.C.: A note on the convergence of ADMM for linearly constrained convex optimization problems. Comput. Optim. Appl. 66(2), 1–17 (2016)
11. Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55(1–3), 293–318 (1992)
12. Fazel, M., Pong, T.K., Sun, D.F., Tseng, P.: Hankel matrix rank minimization with applications to system identification and realization. SIAM J. Matrix Anal. Appl. 34(3), 946–977 (2013)
13. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1–2), 91–129 (2013)
14. Hong, M.Y., Luo, Z.Q., Razaviyayn, M.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3836–3840 (2015)
15. Li, G.Y., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization. SIAM J. Optim. 25(4), 2434–2460 (2015)
16. Wang, F.H.: Convergence of Bregman alternating direction method with multipliers for nonconvex composite problems. arXiv preprint arXiv:1410.8625 (2014)
17. Wang, Y., Yin, W.T., Zeng, J.S.: Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput., 1–35 (2018)
18. Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(4), 2037–2060 (2013)
19. Sun, T., Barrio, R., Cheng, L., Jiang, H.: Precompact convergence of the nonconvex primal-dual hybrid gradient algorithm. J. Comput. Appl. Math. 330, 15–27 (2018)
20. Esser, E., Zhang, X., Chan, T.F.: A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science. SIAM J. Imaging Sci. 3(4), 1015–1046 (2010)
21. Rockafellar, R.T.: Convex Analysis, pp. 5–101. Princeton University Press, Princeton (1970)
22. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Springer, Berlin (1998)
23. Cai, X.J., Han, D.R., Yuan, X.M.: On the convergence of the direct extension of ADMM for three-block separable convex minimization models with one strongly convex function. Comput. Optim. Appl. 66(1), 39–73 (2017)
24. Goldstein, T., O’Donoghue, B., Setzer, S.: Fast alternating direction optimization methods. SIAM J. Imaging Sci. 7(3), 225–231 (2014)
25. He, B.S., Yuan, X.M.: On the \(O(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)
26. Deng, W., Lai, M.J., Peng, Z.M., Yin, W.T.: Parallel multi-block ADMM with \(o(1/k)\) convergence. J. Sci. Comput. 71(2), 712–736 (2017)
27. Yang, W.H., Han, D.R.: Linear convergence of the alternating direction method of multipliers for a class of convex optimization problems. SIAM J. Numer. Anal. 54(2), 625–640 (2016)
28. Hong, M.Y., Luo, Z.Q.: On the linear convergence of the alternating direction method of multipliers. Math. Program. 162(1–2), 165–199 (2017)
29. Deng, W., Yin, W.T.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2016)
30. Han, D.R., Sun, D.F., Zhang, L.: Linear rate convergence of the alternating direction method of multipliers for convex composite programming. Math. Oper. Res. 43(2), 622–637 (2018)