Randomized Primal–Dual Proximal Block Coordinate Updates

Gao, Xiang; Xu, Yang-Yang; Zhang, Shu-Zhong

doi:10.1007/s40305-018-0232-4

Randomized Primal–Dual Proximal Block Coordinate Updates

Published: 19 December 2018

Volume 7, pages 205–250, (2019)
Cite this article

Journal of the Operations Research Society of China Aims and scope Submit manuscript

598 Accesses
14 Citations
Explore all metrics

Abstract

In this paper, we propose a randomized primal–dual proximal block coordinate updating framework for a general multi-block convex optimization model with coupled objective function and linear constraints. Assuming mere convexity, we establish its O(1 / t) convergence rate in terms of the objective value and feasibility measure. The framework includes several existing algorithms as special cases such as a primal–dual method for bilinear saddle-point problems (PD-S), the proximal Jacobian alternating direction method of multipliers (Prox-JADMM) and a randomized variant of the ADMM for multi-block convex optimization. Our analysis recovers and/or strengthens the convergence properties of several existing algorithms. For example, for PD-S our result leads to the same order of convergence rate without the previously assumed boundedness condition on the constraint sets, and for Prox-JADMM the new result provides convergence rate in terms of the objective value and the feasibility violation. It is well known that the original ADMM may fail to converge when the number of blocks exceeds two. Our result shows that if an appropriate randomization procedure is invoked to select the updating blocks, then a sublinear rate of convergence in expectation can be guaranteed for multi-block ADMM, without assuming any strong convexity. The new approach is also extended to solve problems where only a stochastic approximation of the subgradient of the objective is available, and we establish an $O(1/\sqrt{t})$ convergence rate of the extended approach for solving stochastic programming.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Accelerated primal–dual proximal block coordinate updating methods for constrained convex optimization

Article 28 November 2017

Extended ADMM and BCD for nonseparable convex minimization models with quadratic coupling terms: convergence analysis and insights

Article 08 November 2017

Inexact Variable Metric Stochastic Block-Coordinate Descent for Regularized Optimization

Article 29 February 2020

Notes

Actually, [36] presented its algorithm in a more general way with the parameters adaptive to the iteration. However, its convergence result assumes constant values of these parameters for the convex case.

References

James, G.M., Paulson, C., Rusmevichientong, P.: The constrained lasso. Technical report. University of Southern California (2013)
Rockafellar, R.T.: Large-scale extended linear-quadratic programming and multistage optimization. Advances in Numerical Partial Differential Equations and Optimization: In: Proceedings of the Fifth Mexico-United States Workshop, vol. 2, pp. 247–261 (1991)
Hong, M., Chang, T.-H., Wang, X., Razaviyayn, M., Ma, S., Luo, Z.-Q.: A block successive upper bound minimization method of multipliers for linearly constrained convex optimization (2014). arXiv:1401.7079
Cui, Y., Li, X., Sun, D., Toh, K.-C.: On the convergence properties of a majorized ADMM for linearly constrained convex optimization problems with coupled objective functions. J. Optim. Theory Appl. 169(3), 1013–1041 (2016)
Article MathSciNet MATH Google Scholar
Chen, C., Li, M., Liu, X., Ye, Y.: On the convergence of multi-block alternating direction method of multipliers and block coordinate descent method (2015). arXiv:1508.00193
Gao, X., Zhang, S.: First-order algorithms for convex optimization with nonseparate objective and coupled constraints. J. Oper. Res. Soc. China 5(2), 131–159 (2017)
Article MathSciNet MATH Google Scholar
Glowinski, R., Marrocco, A.: Sur l’approximation, par eléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de dirichlet non linéaires. ESAIM Math. Model. Numer. Anal. 9(R2), 41–76 (1975)
MATH Google Scholar
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)
Article MATH Google Scholar
Glowinski, R., Le Tallec, P.: Augmented Lagrangian and Operator-Splitting Methods in Nonlinear Mechanics, vol. 9. SIAM, Philadelphia (1989)
Book MATH Google Scholar
Eckstein, J., Bertsekas, D.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55(1–3), 293–318 (1992)
Article MathSciNet MATH Google Scholar
He, B., Yuan, X.: On the $O(1/n)$ convergence rate of the Douglas-Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)
Article MathSciNet MATH Google Scholar
Monteiro, R.D., Svaiter, B.F.: Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM J. Optim. 23(1), 475–507 (2013)
Article MathSciNet MATH Google Scholar
Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2015)
Article MathSciNet MATH Google Scholar
Lin, T., Ma, S., Zhang, S.: On the sublinear convergence rate of multi-block admm. J. Oper. Res. Soc. China 3(3), 251–274 (2015)
Article MathSciNet MATH Google Scholar
Hong, M., Luo, Z.: On the linear convergence of the alternating direction method of multipliers. Math. Program. 162, 165–199 (2017)
Article MathSciNet MATH Google Scholar
Boley, D.: Local linear convergence of the alternating direction method of multipliers on quadratic or linear programs. SIAM J. Optim. 23(4), 2183–2207 (2013)
Article MathSciNet MATH Google Scholar
Peng, Y., Ganesh, A., Wright, J., Xu, W., Ma, Y.: Rasl: Robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2233–2246 (2012)
Article Google Scholar
Tao, M., Yuan, X.: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 21(1), 57–81 (2011)
Article MathSciNet MATH Google Scholar
Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155(1–2), 57–79 (2016)
Article MathSciNet MATH Google Scholar
Deng, W., Lai, M.-J., Peng, Z., Yin, W.: Parallel multi-block ADMM with $o(1/k)$ convergence. J. Sci. Comput. 71, 1–25 (2016)
MathSciNet MATH Google Scholar
He, B., Tao, M., Yuan, X.: Convergence rate and iteration complexity on the alternating direction method of multipliers with a substitution procedure for separable convex programming. Math. Oper. Res. 42(3), 662–691 (2017)
Article MathSciNet MATH Google Scholar
He, B., Hou, L., Yuan, X.: On full Jacobian decomposition of the augmented Lagrangian method for separable convex programming. SIAM J. Optim. 25(4), 2274–2312 (2015)
Article MathSciNet MATH Google Scholar
He, B., Tao, M., Yuan, X.: Alternating direction method with gaussian back substitution for separable convex programming. SIAM J. Optim. 22(2), 313–340 (2012)
Article MathSciNet MATH Google Scholar
Xu, Y.: Hybrid Jacobian and Gauss–Seidel proximal block coordinate update methods for linearly constrained convex programming. SIAM J. Optim. 28(1), 646–670 (2018)
Article MathSciNet MATH Google Scholar
Chen, C., Shen, Y., You, Y.: On the convergence analysis of the alternating direction method of multipliers with three blocks. In: Abstract and Applied Analysis, vol. 2013. Hindawi Publishing Corporation (2013)
Cai, X., Han, D., Yuan, X.: On the convergence of the direct extension of ADMM for three-block separable convex minimization models with one strongly convex function. Comput. Optim. Appl. 66(1), 39–73 (2017)
Article MathSciNet MATH Google Scholar
Lin, T., Ma, S., Zhang, S.: On the global linear convergence of the admm with multiblock variables. SIAM J. Optim. 25(3), 1478–1497 (2015)
Article MathSciNet MATH Google Scholar
Li, M., Sun, D., Toh, K.-C.: A convergent 3-block semi-proximal ADMM for convex minimization problems with one strongly convex block. Asia-Pac. J. Oper. Res. 32(04), 1550024 (2015)
Article MathSciNet MATH Google Scholar
Han, D., Yuan, X.: A note on the alternating direction method of multipliers. J. Optim. Theory Appl. 155(1), 227–238 (2012)
Article MathSciNet MATH Google Scholar
Wang, Y., Yin, W., Zeng, J.: Global convergence of ADMM in nonconvex nonsmooth optimization (2015). arXiv:1511.06324
Lin, T., Ma, S., Zhang, S.: Iteration complexity analysis of multi-block admm for a family of convex minimization without strong convexity. J. Sci. Comput. 69, 1–30 (2016)
Article MathSciNet MATH Google Scholar
Chen, L., Sun, D., Toh, K.-C.: An efficient inexact symmetric gauss-seidel based majorized ADMM for high-dimensional convex composite conic programming. Math. Program. 161, 237–270 (2017)
Article MathSciNet MATH Google Scholar
Li, X., Sun, D., Toh, K.-C.: A schur complement based semi-proximal ADMM for convex quadratic conic programming and extensions. Math. Program. 155(1–2), 333–373 (2016)
Article MathSciNet MATH Google Scholar
Sun, D., Toh, K.-C., Yang, L.: A convergent 3-block semiproximal alternating direction method of multipliers for conic programming with 4-type constraints. SIAM J. Optim. 25(2), 882–915 (2015)
Article MathSciNet MATH Google Scholar
Sun, R., Luo, Z.-Q., Ye, Y.: On the expected convergence of randomly permuted ADMM (2015). arXiv:1503.06387
Dang, C., Lan, G.: Randomized first-order methods for saddle point optimization (2014). arXiv:1409.8625
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
Article MathSciNet MATH Google Scholar
Peng, Z., Wu, T., Xu, Y., Yan, M., Yin, W.: Coordinate friendly structures, algorithms and applications. Ann. Math. Sci. Appl. 1(1), 57–119 (2016)
MathSciNet MATH Google Scholar
Gao, X., Jiang, B., Zhang, S.: On the information-adaptive variants of the ADMM: an iteration complexity perspective. J. Sci. Comput. 76(1), 327–363 (2018)
Article MathSciNet MATH Google Scholar
Xu, Y., Zhang, S.: Accelerated primal-dual proximal block coordinate updating methods for constrained convex optimization. Comput. Optim. Appl. 70(1), 91–128 (2018)
Article MathSciNet MATH Google Scholar
Dang, C.D., Lan, G.: Stochastic block mirror descent methods for nonsmooth and stochastic optimization. SIAM J. Optim. 25(2), 856–881 (2015)
Article MathSciNet MATH Google Scholar
Xu, Y., Yin, W.: Block stochastic gradient iteration for convex and nonconvex optimization. SIAM J. Optim. 25(3), 1686–1716 (2015)
Article MathSciNet MATH Google Scholar
Grant, M., Boyd, S., Ye, Y.: CVX: Matlab software for disciplined convex programming, version 2.0 beta. http://cvxr.com/cvx (2013)
Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)
Article MathSciNet MATH Google Scholar
Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1–2), 1–38 (2014)
Article MathSciNet MATH Google Scholar
Lu, Z., Xiao, L.: On the complexity analysis of randomized block-coordinate descent methods. Math. Program. 152(1–2), 615–642 (2015)
Article MathSciNet MATH Google Scholar
Richtárik, P., Takáč, M.: Parallel coordinate descent methods for big data optimization. Math. Program. 156, 1–52 (2015)
MathSciNet MATH Google Scholar

Download references

Author information

Authors and Affiliations

Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, USA
Xiang Gao & Shu-Zhong Zhang
Department of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, USA
Yang-Yang Xu

Authors

Xiang Gao
View author publications
You can also search for this author in PubMed Google Scholar
Yang-Yang Xu
View author publications
You can also search for this author in PubMed Google Scholar
Shu-Zhong Zhang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yang-Yang Xu.

Additional information

This work is partly supported by the National Science Foundation (Nos. DMS-1719549 and CMMI-1462408).

Appendices

Proofs of Lemmas

We give proofs of several lemmas that are used to show our main results.

1.1 Proof of Lemma 3.1

We prove (3.2), and (3.3) can be shown by the same arguments. By the optimality of $x_{I_k}^{k+1}$ from (2.1) or equivalently (2.16), we have for any $x_{I_k}\in {\mathcal {X}}_{I_k}$,

$$\begin{aligned}&(x_{I_k}-x_{I_k}^{k+1})^\top \left( \nabla _{I_k}f(x^k)-A_{I_k}^\top \lambda ^k +\rho _xA_{I_k}^\top r^{k} +\tilde{\nabla } u_{I_k}(x_{I_k}^{k+1})+\hat{P}_{I_k}(x_{I_k}^{k+1}-x_{I_k}^k)\right) \nonumber \\&\geqslant 0, \end{aligned}$$

(A.1)

where $\tilde{\nabla } u_{I_k}(x_{I_k}^{k+1})$ is a subgradient of $u_{I_k}$ at $x_{I_k}^{k+1}$, and we have used the formula of $r^{k+\frac{1}{2}}$ given in (2.2). We compute the expectation of each term in (A.1) as follows. First, we have

$$\begin{aligned}&{E}_{I_k}(x_{I_k}-x_{I_k}^{k+1})^\top \nabla _{I_k}f(x^k)\nonumber \\= & {} {E}_{I_k}\left( x_{I_k}-x_{I_k}^k\right) ^\top \nabla _{I_k}f(x^k) + {E}_{I_k}(x_{I_k}^k-x_{I_k}^{k+1})^\top \nabla _{I_k}f(x^k)\nonumber \\= & {} \frac{n}{N}\left( x-x^k\right) ^\top \nabla f(x^k) + {E}_{I_k}(x^k-x^{k+1})^\top \nabla f(x^k) \end{aligned}$$

(A.2)

$$\begin{aligned}\leqslant & {} \frac{n}{N}\left( f(x)-f(x^k)\right) + {E}_{I_k}\left[ f(x^k)-f(x^{k+1})+\frac{L_f}{2}\Vert x^k-x^{k+1}\Vert ^2\right] \nonumber \\= & {} \frac{n-N}{N}\left( f(x)-f(x^k)\right) \nonumber \\&+\,{E}_{I_k}\left[ f(x)-f(x^{k+1})+\frac{L_f}{2}\Vert x^k-x^{k+1}\Vert ^2\right] , \end{aligned}$$

(A.3)

where the last inequality is from the convexity of f and the Lipschitz continuity of $\nabla _{I_k} f(x)$. Secondly,

$$\begin{aligned}&{E}_{I_k}(x_{I_k}-x_{I_k}^{k+1})^\top (-A_{I_k}^\top \lambda ^k)\nonumber \\= & {} {E}_{I_k}(x_{I_k}-x_{I_k}^{k})^\top (-A_{I_k}^\top \lambda ^k)+{E}_{I_k}(x_{I_k}^k-x_{I_k}^{k+1})^\top (-A_{I_k}^\top \lambda ^k)\nonumber \\= & {} \frac{n}{N}\left( x-x^k\right) ^\top (-A^\top \lambda ^k)+{E}_{I_k}(x^k-x^{k+1})^\top (-A^\top \lambda ^k)\nonumber \\= & {} \frac{n-N}{N}\left( x-x^k\right) ^\top (-A^\top \lambda ^k)+{E}_{I_k}(x-x^{k+1})^\top (-A^\top \lambda ^k). \end{aligned}$$

(A.4)

Similarly,

$$\begin{aligned} \rho _x {E}_{I_k}(x_{I_k}-x_{I_k}^{k+1})^\top A_{I_k}^\top r^{k}= & {} \frac{n-N}{N}\rho _x(x-x^k)^\top A^\top r^k\nonumber \\&+\,\rho _x{E}_{I_k}(x-x^{k+1})^\top A^\top r^{k}. \end{aligned}$$

(A.5)

For the fourth term of (A.1), we have

$$\begin{aligned}&{E}_{I_k}(x_{I_k}-x_{I_k}^{k+1})^\top \tilde{\nabla } u_{I_k}(x_{I_k}^{k+1})\nonumber \\\leqslant & {} {E}_{I_k}\left[ u_{I_k}(x_{I_k})-u_{I_k}(x_{I_k}^{k+1})\right] \nonumber \\= & {} \frac{n}{N}\, u(x)-{E}_{I_k}[u(x^{k+1})-u(x^k)+u_{I_k}(x_{I_k}^k)]\nonumber \\= & {} \frac{n}{N}\left[ u(x)-u(x^k)\right] +{E}_{I_k}[u(x^k)-u(x^{k+1})]\nonumber \\= & {} \frac{n-N}{N}\left[ u(x)-u(x^k)\right] +{E}_{I_k}[u(x)-u(x^{k+1})], \end{aligned}$$

(A.6)

where the inequality is from the convexity of $u_{I_k}$. Finally, we have

$$\begin{aligned} {E}_{I_k}(x_{I_k}-x_{I_k}^{k+1})^\top \hat{P}_{I_k}(x_{I_k}^{k+1}-x_{I_k}^k)={E}_{I_k}(x-x^{k+1})^\top \hat{P}(x^{k+1}-x^k). \end{aligned}$$

(A.7)

Plugging (A.3) through (A.7) into (A.1) and recalling $F(x)=f(x)+u(x)$, by rearranging terms we have

$$\begin{aligned}&{E}_{I_k}\left[ F(x^{k+1})-F(x)+(x^{k+1}-x)^\top (-A^\top \lambda ^k)+ \rho _x(x^{k+1}-x)^\top A^\top r^{k}\right] \nonumber \\&\quad +\,{E}_{I_k}\left( x^{k+1}-x\right) ^\top \hat{P}(x^{k+1}-x^k)-\frac{L_f}{2}{E}_{I_k}\Vert x^k-x^{k+1}\Vert ^2\nonumber \\&\leqslant \frac{N-n}{N}\left[ F(x^k)-F(x)+(x^{k}-x)^\top (-A^\top \lambda ^k)+ \rho _x(x^{k}-x)^\top A^\top r^{k}\right] . \end{aligned}$$

(A.8)

Note

$$\begin{aligned}&\qquad (x^{k+1}-x)^\top (-A^\top \lambda ^k)+ \rho _x(x^{k+1}-x)^\top A^\top r^{k}\\&\,\,\, =(x^{k+1}-x)^\top (-A^\top \lambda ^k)+\rho _x(x^{k+1}-x)^\top A^\top r^{k+1}\\&\qquad -\,\rho _x(x^{k+1}-x)^\top A^\top A(x^{k+1}-x^k) -\rho _x(x^{k+1}-x)^\top A^\top B(y^{k+1}-y^k)\\&{\overset{(2.5)}{=}}(x^{k+1}-x)^\top (-A^\top \lambda ^{k+1})+(\rho _x-\rho )(x^{k+1}-x)^\top A^\top r^{k+1}\\&\qquad -\,\rho _x(x^{k+1}-x)^\top A^\top A(x^{k+1}-x^k)-\rho _x(x^{k+1}-x)^\top A^\top B(y^{k+1}-y^k). \end{aligned}$$

Hence, we can rewrite (A.8) equivalently into (3.2). Through the same arguments, one can show (3.3), thus completing the proof.

1.2 Proof of Lemma 3.3

Letting $x=x^*,y=y^*$ in (3.8), we have for any $\lambda $ that

$$\begin{aligned}&\qquad E[h(x^*,y^*,\lambda )]\nonumber \\&\quad \,\,\geqslant E\left[ \varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)+(\hat{x}-x^*)^{\top }(-A^{\top }\hat{\lambda }) +(\hat{y}-y^*)^{\top }(-B^{\top }\hat{\lambda })\right. \nonumber \\&\qquad \quad \left. +(\hat{\lambda }-\lambda )^{\top }(A\hat{x}+B\hat{y}-b)\right] \nonumber \\&\quad = E\left[ \varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)+\langle \hat{\lambda },Ax^*+By^*-b\rangle -\langle \lambda ,A\hat{x}+B\hat{y}-b\rangle \right] \nonumber \\&\quad =E\left[ \varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)-\langle \lambda ,A\hat{x}+B\hat{y}-b\rangle \right] , \end{aligned}$$

(A.9)

where the last equality follows from the feasibility of $(x^*,y^*)$. For any $\gamma >0$, restricting $\lambda $ in $\mathcal {B}_\gamma $, we have

$$\begin{aligned} E[h(x^*,y^*,\lambda )]\leqslant \sup \limits _{\lambda \in {\mathcal {B}}_\gamma }h(x^*,y^*,\lambda ). \end{aligned}$$

Hence, letting $\lambda =-\frac{\gamma (A\hat{x}+B\hat{y}-b)}{\Vert A\hat{x}+B\hat{y}-b\Vert }\in \mathcal {B}_\gamma $ in (A.9) gives the desired result.

1.3 Proof of Lemma 3.5

In view of (3.9), we have

$$\begin{aligned}&E\big [(\gamma - \Vert \lambda ^*\Vert )\Vert A\hat{x}+B\hat{y}-b\Vert \big ]\\\leqslant & {} {E}\big [\varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)+\gamma \Vert A\hat{x}+B\hat{y}-b\Vert \big ] \leqslant \varepsilon , \end{aligned}$$

which implies

$$\begin{aligned} E\Vert A\hat{x}+B\hat{y}-b\Vert \leqslant \frac{\varepsilon }{\gamma -\Vert \lambda ^*\Vert }, \text{ and } E\big [\varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)\big ] \leqslant \varepsilon . \end{aligned}$$

(A.10)

Denote $a^-=\max (0,-a)$ for a real number a. Then, from (3.9) and (A.10), it follows that

$$\begin{aligned} E\big (\varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)\big )^- \leqslant \Vert \lambda ^*\Vert \cdot E\Vert A\hat{x}+B\hat{y}-b\Vert \leqslant \frac{\Vert \lambda ^*\Vert \varepsilon }{\gamma -\Vert \lambda ^*\Vert }. \end{aligned}$$

Noting $|a| = a + 2a ^ -$ for any real number a, we have

$$\begin{aligned}&\quad E\big |\varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)\big | \nonumber \\&= E\big [\varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)\big ] + 2E\big (\varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)\big )^- \nonumber \\&\leqslant \left( \frac{2\Vert \lambda ^*\Vert }{\gamma -\Vert \lambda ^*\Vert } + 1\right) \varepsilon . \end{aligned}$$

(A.11)

1.4 Proof of Inequalities (4.9c) and (4.9d) with $\alpha _k=\frac{\alpha _0}{\sqrt{k}}$

We have $\beta _k=\frac{\alpha _0}{\rho \big (\sqrt{k}-(1-\theta )\sqrt{k-1}\big )},$ and

$$\begin{aligned}&\qquad \frac{\alpha _{k-1}}{\alpha _{k}}\beta _{k}+(1-\theta )\beta _{k+1}-\frac{\alpha _k}{\alpha _{k+1}}\beta _{k+1}-(1-\theta )\beta _{k}\\&=\frac{\alpha _0}{\rho }\left[ \left( \frac{\sqrt{k}}{\sqrt{k-1}}-(1-\theta )\right) \frac{1}{\big (\sqrt{k}-(1-\theta )\sqrt{k-1}\big )} \right. \\&\qquad \left. -\left( \frac{\sqrt{k+1}}{\sqrt{k}}-(1-\theta )\right) \frac{1}{\big (\sqrt{k+1}-(1-\theta )\sqrt{k}\big )}\right] \\&=: \frac{\alpha _0}{\rho }[\psi (k)-\psi (k+1)]. \end{aligned}$$

By elementary calculus, we have

$$\begin{aligned} \psi '(k)= & {} \frac{\frac{\sqrt{k-1}}{\sqrt{k}}-\frac{\sqrt{k}}{\sqrt{k-1}}}{2(k-1)}\frac{1}{\big (\sqrt{k}-(1-\theta )\sqrt{k-1}\big )}\\&\qquad +\left( \frac{\sqrt{k}}{\sqrt{k-1}}-(1-\theta )\right) \frac{-1}{2\big (\sqrt{k}-(1-\theta )\sqrt{k-1}\big )^2}\left( \frac{1}{\sqrt{k}}-\frac{1-\theta }{\sqrt{k-1}}\right) \\= & {} \frac{1}{2(k-1)(\sqrt{k}-(1-\theta )\sqrt{k-1})} \\&\qquad \times \left[ \frac{\sqrt{k-1}}{\sqrt{k}}-\frac{\sqrt{k}}{\sqrt{k-1}}-\sqrt{k-1}\left( \frac{1}{\sqrt{k}}-\frac{1-\theta }{\sqrt{k-1}}\right) \right] \\= & {} \frac{1}{2(k-1)(\sqrt{k}-(1-\theta )\sqrt{k-1})}\left( (1-\theta )-\frac{\sqrt{k}}{\sqrt{k-1}}\right) <0. \end{aligned}$$

Hence, $\psi (k)$ is decreasing with respect to k, and thus (4.9c) holds.

When $\alpha _k=\frac{\alpha _0}{\sqrt{k}}$, (4.9d) becomes

$$\begin{aligned} \frac{\alpha _0}{2\rho \sqrt{t}}\geqslant \left| \left( \frac{\sqrt{t}}{\sqrt{t-1}}-(1-\theta )\right) \frac{\alpha _0}{\rho (\sqrt{t}-(1-\theta )\sqrt{t-1})}-\frac{\alpha _0}{\rho \sqrt{t}}\right| , \end{aligned}$$

which is equivalent to

$$\begin{aligned} \frac{1}{2}\geqslant \frac{\sqrt{t}}{\sqrt{t-1}}-1 \Longleftrightarrow t\geqslant \frac{9}{5}. \end{aligned}$$

This completes the proof.

Proofs of Theorems

In this section, we give the technical details for showing all theorems. For simplicity of notation, throughout the proofs of this section, we define $\tilde{P}$ and $\tilde{Q}$ as follows:

$$\begin{aligned} \tilde{P}=\hat{P}-\rho _x A^\top A,\qquad \tilde{Q}=\hat{Q}-\rho _y B^\top B. \end{aligned}$$

(B.1)

1.1 Proof of Theorem 3.6

Taking expectation over both sides of (3.2) and summing it over $k=0$ through t, we have

$$\begin{aligned}&E\big [F(x^{t+1})-F(x)+(x^{t+1}-x)^\top (-A^\top \lambda ^{t+1})\big ]+(1-\theta )\rho _x E(x^{t+1}-x)^\top A^\top r^{t+1}\nonumber \\&\qquad +\theta \sum _{k=0}^{t-1}E\big [F(x^{k+1})-F(x)+(x^{k+1}-x)^\top (-A^\top \lambda ^{k+1})\big ]\nonumber \\&\qquad -\sum _{k=0}^t\rho _x E(x^{k+1}-x)^\top A^\top B(y^{k+1}-y^k)\nonumber \\&\qquad +\sum _{k=0}^t E(x^{k+1}-x)^\top \tilde{P}(x^{k+1}-x^k)-\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2\nonumber \\&\leqslant (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] , \end{aligned}$$

(B.2)

where we have used $\frac{n}{N}=\theta $, the condition in (3.17) and the definition of $\tilde{P}$ in (B.1). Similarly, taking expectation over both sides of (3.3), summing it over $k=0$ through t, we have

$$\begin{aligned}&E\big [G(y^{t+1})-G(y)+(y^{t+1}-y)^\top (-B^\top \lambda ^{t+1})\big ]\nonumber \\&\qquad + (1-\theta )\rho _y E(y^{t+1}-y)^\top B^\top r^{t+1}\nonumber \\&\qquad +\theta \sum _{k=0}^{t-1} E\big [G(y^{k+1})-G(y)+(y^{k+1}-y)^\top (-B^\top \lambda ^{k+1})\big ]\nonumber \\&\qquad +\sum _{k=0}^t E(y^{k+1}-y)^\top \tilde{Q}(y^{k+1}-y^k)-\frac{L_g}{2}\sum _{k=0}^t E\Vert y^k-y^{k+1}\Vert ^2\nonumber \\&\leqslant (1-\theta )\left[ G(y^0)-G(y)+(y^{0}-y)^\top (-B^\top \lambda ^0)+ \rho _y(y^{0}-y)^\top B^\top r^{0}\right] \nonumber \\&\qquad +(1-\theta )\sum _{k=0}^t E\rho _y(y^{k}-y)^\top B^\top A(x^{k+1}-x^k). \end{aligned}$$

(B.3)

Recall $\lambda ^{k+1}=\lambda ^k-\rho r^{k+1}$, thus

$$\begin{aligned} (\lambda ^{k+1}-\lambda )^\top r^{k+1} =-\frac{1}{\rho }(\lambda ^{k+1}-\lambda )^\top (\lambda ^{k+1}-\lambda ^k), \end{aligned}$$

(B.4)

where $\lambda $ is an arbitrary vector and possibly random. Denote $\tilde{\lambda }^{t+1}=\lambda ^t-\rho _xr^{t+1}.$ Then, similar to (B.4), we have

$$\begin{aligned} (\tilde{\lambda }^{t+1}-\lambda )^\top r^{t+1} =-\frac{1}{\rho _x}(\tilde{\lambda }^{t+1}-\lambda )^\top (\tilde{\lambda }^{t+1}-\lambda ^t). \end{aligned}$$

(B.5)

Summing (B.2) and (B.3) together and using (B.4) and (B.5), we have

$$\begin{aligned}&E\left[ \varPhi (x^{t+1},y^{t+1})-\varPhi (x,y)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1}) \right. \left. +\frac{1}{\rho _x}(\tilde{\lambda }^{t+1}-\lambda )^\top (\tilde{\lambda }^{t+1}-\lambda ^t)\right] \nonumber \\&\qquad +\theta \sum _{k=0}^{t-1} E\left[ \varPhi (x^{k+1},y^{k+1})-\varPhi (x,y)+(w^{k+1}-w)^\top H(w^{k+1})\right. \nonumber \\&\qquad \left. +\frac{1}{\rho }(\lambda ^{k+1}-\lambda )^\top (\lambda ^{k+1}-\lambda ^k)\right] \nonumber \\\leqslant & {} (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \nonumber \\&\qquad +(1-\theta )\left[ G(y^0)-G(y)+(y^{0}-y)^\top (-B^\top \lambda ^0)+ \rho _y(y^{0}-y)^\top B^\top r^{0}\right] \nonumber \\&\qquad +\sum _{k=0}^t\rho _x E(x^{k+1}-x)^\top A^\top B(y^{k+1}-y^k) +(1-\theta )\sum _{k=0}^t\rho _y E(y^{k}-y)^\top B^\top A(x^{k+1}-x^k)\nonumber \\&\qquad -\sum _{k=0}^t E(x^{k+1}-x)^\top \tilde{P}(x^{k+1}-x^k)+\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2\nonumber \\&\qquad -\sum _{k=0}^t E(y^{k+1}-y)^\top \tilde{Q}(y^{k+1}-y^k)+\frac{L_g}{2}\sum _{k=0}^t E\Vert y^k-y^{k+1}\Vert ^2, \end{aligned}$$

(B.6)

where we have used $\varPhi (x,y)=F(x)+G(y)$ and the definition of H given in (2.18).

When $B=0$ and $y^k\equiv y^0$, (B.6) reduces to

$$\begin{aligned}&E\left[ F(x^{t+1})-F(x)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})+\frac{1}{\rho _x}(\tilde{\lambda }^{t+1}-\lambda )^\top (\tilde{\lambda }^{t+1}-\lambda ^t)\right] \\&\qquad +\theta \sum _{k=0}^{t-1} E\left[ F(x^{k+1})-F(x)+(w^{k+1}-w)^\top H(w^{k+1})\right. \\&\qquad \left. +\frac{1}{\rho }(\lambda ^{k+1}-\lambda )^\top (\lambda ^{k+1}-\lambda ^k)\right] \\&\leqslant (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \\&\qquad -\sum _{k=0}^t E(x^{k+1}-x)^\top \tilde{P}(x^{k+1}-x^k)+\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2. \end{aligned}$$

Using (2.21) and noting $\theta =\frac{\rho }{\rho _x}$, from the above inequality after canceling terms we have

$$\begin{aligned}&E\left[ F(x^{t+1})-F(x)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \nonumber \\&+\theta \sum _{k=0}^{t-1} E\left[ F(x^{k+1})-F(x)+(w^{k+1}-w)^\top H(w^{k+1})\right] \nonumber \\&+\frac{1}{2\rho _x} E\left[ \Vert \tilde{\lambda }^{t+1}-\lambda \Vert ^2-\Vert \lambda ^0-\lambda \Vert ^2+\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2+\sum _{k=0}^{t-1}\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\right] \nonumber \\&\leqslant (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \nonumber \\&-\frac{1}{2}E\left[ \Vert x^{t+1}-x\Vert _{\tilde{P}}^2-\Vert x^0-x\Vert _{\tilde{P}}^2+\sum _{k=0}^t\Vert x^{k+1}-x^k\Vert _{\tilde{P}}^2\right] \nonumber \\&+\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2. \end{aligned}$$

(B.7)

For any feasible x, we note $\tilde{\lambda }^{t+1}-\lambda ^t=\rho _xA(x^{t+1}-x)$ and thus

$$\begin{aligned} \frac{1}{\rho _x}\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2=\rho _x\Vert x^{t+1}-x\Vert _{A^\top A}^2. \end{aligned}$$

(B.8)

In addition, since $x^{k+1}$ and $x^k$ differ only on the index set $I_k$, we have by recalling $\tilde{P}=\hat{P}-\rho _x A^\top A$ that

$$\begin{aligned} \Vert x^{k+1}-x^k\Vert _{\tilde{P}}^2-L_f\Vert x^{k+1}-x^k\Vert ^2= & {} \Vert x_{I_k}^{k+1}-x_{I_k}^k\Vert _{\hat{P}_{I_k}}^2-\Vert x_{I_k}^{k+1}-x_{I_k}^k\Vert _{\rho _x A_{I_k}^\top A_{I_k}}^2\nonumber \\&-L_f\Vert x_{I_k}^{k+1}-x_{I_k}^k\Vert ^2. \end{aligned}$$

(B.9)

Plugging (B.8) and (B.9) into (B.7) and using (3.10) lead to

$$\begin{aligned}&E\left[ F(x^{t+1})-F(x)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \\&+\,\theta \sum _{k=0}^{t-1} E\left[ F(x^{k+1})-F(x)+(w^{k+1}-w)^\top H(w^{k+1})\right] \\\leqslant & {} (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \\&+\frac{1}{2\rho _x} E\Vert \lambda ^0-\lambda \Vert ^2+\frac{1}{2}\Vert x^0-x\Vert _{\tilde{P}}^2. \end{aligned}$$

The desired result follows from $\lambda ^0=0$, and Lemmas 3.3 and 3.5 with $\gamma =\max \{0.5+\Vert \lambda ^*\Vert , 3\Vert \lambda ^*\Vert \}$.

1.2 Proof of Theorem 3.7

It follows from (3.3) with $\rho _y=\rho $ and $m=M$ that (recall the definition of $\tilde{Q}$ in (B.1)) for any $y\in {\mathcal {Y}}$,

$$\begin{aligned}&G(y^{k+1})-G(y)-\frac{L_g}{2}\Vert y^k-y^{k+1}\Vert ^2+(y^{k+1}-y)^\top (-B^\top \lambda ^{k+1})\nonumber \\&\quad +(y^{k+1}-y)^\top \tilde{Q}(y^{k+1}-y^k)\leqslant 0. \end{aligned}$$

(B.10)

Similar to (B.10), and recalling the definition of $\tilde{y}^{t+1}$, we have for any $y\in {\mathcal {Y}}$,

$$\begin{aligned}&G(\tilde{y}^{t+1})-G(y)-\frac{L_g}{2}\Vert \tilde{y}^{t+1}-y^t\Vert ^2+(\tilde{y}^{t+1}-y)^\top (-B^\top \tilde{\lambda }^{t+1})\nonumber \\&\quad +\theta (\tilde{y}^{t+1}-y)^\top \tilde{Q}(y^{t+1}-y^t)\leqslant 0, \end{aligned}$$

(B.11)

where

$$\begin{aligned} \tilde{\lambda }^{t+1}=\lambda ^t-\rho _x(Ax^{t+1}+B\tilde{y}^{t+1}-b). \end{aligned}$$

(B.12)

Adding (B.10) and (B.11) to (B.2) and using the formula of $\lambda ^k$ give

$$\begin{aligned}&E\left[ F(x^{t+1})-F(x)+(x^{t+1}-x)^\top (-A^\top \tilde{\lambda }^{t+1})\right] \nonumber \\&\qquad +E\left( \tilde{\lambda }^{t+1}-\lambda \right) ^\top \left( Ax^{t+1}+B\tilde{y}^{t+1}-b+\frac{1}{\rho _x}(\tilde{\lambda }^{t+1}-\lambda ^t)\right) \nonumber \\&\qquad +E\left[ G(\tilde{y}^{t+1})-G(y)+(\tilde{y}^{t+1}-y)^\top (-B^\top \tilde{\lambda }^{t+1}) \right. \nonumber \\&\left. \qquad +\theta (\tilde{y}^{t+1}-y)^\top \tilde{Q}(\tilde{y}^{t+1}-y^k)\right] -\frac{L_g}{2}E\Vert \tilde{y}^{t+1}-y^t\Vert ^2\nonumber \\&\qquad +\theta \sum _{k=0}^{t-1}E\left[ F(x^{k+1})-F(x)+(x^{k+1}-x)^\top (-A^\top \lambda ^{k+1})\right] \nonumber \\&\qquad -\sum _{k=0}^{t-1}\rho _x E(x^{k+1}-x)^\top A^\top B(y^{k+1}-y^k) -\rho _x{E}(x^{t+1}-x)^\top A^\top B(\tilde{y}^{t+1}-y^t)\nonumber \\&\qquad +\theta \sum _{k=0}^{t-1} E\left[ G(y^{k+1})-G(y)-\frac{L_g}{2}\Vert y^k-y^{k+1}\Vert ^2+(y^{k+1}-y)^\top (-B^\top \lambda ^{k+1})\right. \nonumber \\&\qquad \left. +(y^{k+1}-y)^\top \tilde{Q}(y^{k+1}-y^k)\right] \nonumber \\&\qquad +\theta \sum _{k=0}^{t-1}E(\lambda ^{k+1}-\lambda )^\top \left( r^{k+1}+\frac{1}{\rho }(\lambda ^{k+1}-\lambda ^k)\right) \nonumber \\&\leqslant (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \nonumber \\&\qquad -\sum _{k=0}^tE(x^{k+1}-x)^\top \tilde{P}(x^{k+1}-x^k)+\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2. \end{aligned}$$

(B.13)

By the notation in (2.18) and using (3.6), (B.13) can be written into

$$\begin{aligned}&E\left[ \varPhi (x^{t+1},\tilde{y}^{t+1})-\varPhi (x,y)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \\&\qquad +\theta \sum _{k=0}^{t-1}E\left[ \varPhi (x^{k+1},y^{k+1})-\varPhi (x,y)+(w^{k+1}-w)^\top H(w^{k+1})\right] \\&\qquad +\theta \sum _{k=0}^{t-1}E(y^{k+1}-y)^\top \tilde{Q}(y^{k+1}-y^k)+\theta {E}(\tilde{y}^{t+1}-y)^\top \tilde{Q}(\tilde{y}^{t+1}-y^t)\\&\qquad -\sum _{k=0}^{t-1}\rho _xE\left( \frac{1}{\rho }(\lambda ^k-\lambda ^{k+1})^\top B(y^{k+1}-y^k) -(y^{k+1}-y)^\top B^\top B(y^{k+1}-y^k)\right) \\&\qquad -\rho _x E\left( \frac{1}{\rho _x}(\lambda ^t-\tilde{\lambda }^{t+1})^\top B(\tilde{y}^{t+1}-y^t) -(\tilde{y}^{t+1}-y)^\top B^\top B(\tilde{y}^{t+1}-y^t)\right) \\&\qquad +\frac{\theta }{\rho } E(\tilde{\lambda }^{t+1}-\lambda )^\top \left( \tilde{\lambda }^{t+1}-\lambda ^t\right) +\frac{\theta }{\rho }\sum _{k=0}^{t-1}E(\lambda ^{k+1}-\lambda )^\top \left( \lambda ^{k+1}-\lambda ^k\right) \\\leqslant & {} (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \\&\qquad -\sum _{k=0}^t E(x^{k+1}-x)^\top \tilde{P}(x^{k+1}-x^k)+\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2\\&\qquad +\frac{\theta L_g}{2}\sum _{k=0}^{t-1}E\Vert y^k-y^{k+1}\Vert ^2+\frac{L_g}{2} E\Vert \tilde{y}^{t+1}-y^t\Vert ^2. \end{aligned}$$

Now use (2.21) to derive from the above inequality that

$$\begin{aligned}&E\left[ \varPhi (x^{t+1},\tilde{y}^{t+1})-\varPhi (x,y)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \nonumber \\&\qquad +\theta \sum _{k=0}^{t-1}E\left[ \varPhi (x^{k+1},y^{k+1})-\varPhi (x,y)+(w^{k+1}-w)^\top H(w^{k+1})\right] \nonumber \\&\qquad +\frac{\theta }{2}\left( E\Vert \tilde{y}^{t+1}-y\Vert _{\tilde{Q}}^2-\Vert y^0-y\Vert _{\tilde{Q}}^2\right) \nonumber \\&\qquad +\frac{\theta }{2}\sum _{k=0}^{t-1}E\Vert y^{k+1}-y^k\Vert _{\tilde{Q}}^2+\frac{\theta }{2}E\Vert \tilde{y}^{t+1}-y^t\Vert _{\tilde{Q}}^2\nonumber \\&\qquad +\frac{\rho _x}{2}\left( E\Vert \tilde{y}^{t+1}-y\Vert _{B^\top B}^2-\Vert y^0-y\Vert _{B^\top B}^2\right) \nonumber \\&\qquad +\frac{\rho _x}{2}\sum _{k=0}^{t-1}E\Vert y^{k+1}-y^k\Vert _{B^\top B}^2+\frac{\rho _x}{2}E\Vert \tilde{y}^{t+1}-y^t\Vert _{B^\top B}^2\nonumber \\&\qquad -\sum _{k=0}^{t-1}E\frac{\rho _x}{\rho }\left( \lambda ^k-\lambda ^{k+1}\right) ^\top B(y^{k+1}-y^k) -E(\lambda ^t-\tilde{\lambda }^{t+1})^\top B(\tilde{y}^{t+1}-y^t)\nonumber \\&\qquad +\frac{\theta }{2\rho }\left( E\Vert \tilde{\lambda }^{t+1}-\lambda \Vert ^2-\Vert \lambda ^0-\lambda \Vert ^2\right) \nonumber \\&\qquad +\frac{\theta }{2\rho }\sum _{k=0}^{t-1}E\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2+\frac{\theta }{2\rho }E\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2\nonumber \\&\quad \leqslant (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \nonumber \\&\qquad -\frac{1}{2}\left[ E\Vert x^{t+1}-x\Vert ^2_{\tilde{P}}-\Vert x^0-x\Vert ^2_{\tilde{P}}\right. \nonumber \\&\qquad \left. +\sum _{k=0}^tE\Vert x^k-x^{k+1}\Vert ^2_{\tilde{P}}\right] +\frac{L_f}{2}\sum _{k=0}^tE\Vert x^k-x^{k+1}\Vert ^2\nonumber \\&\qquad +\frac{\theta L_g}{2}\sum _{k=0}^{t-1}E\Vert y^k-y^{k+1}\Vert ^2+\frac{L_g}{2}E\Vert \tilde{y}^{t+1}-y^t\Vert ^2. \end{aligned}$$

(B.14)

Note that for $k\leqslant t-1$,

$$\begin{aligned} -\frac{\rho _x}{\rho }(\lambda ^k-\lambda ^{k+1})^\top B(y^{k+1}-y^k)+\frac{\theta }{2\rho }\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\geqslant -\frac{\rho }{2\theta ^3}\Vert y^{k+1}-y^k\Vert _{B^\top B}^2 \end{aligned}$$

and

$$\begin{aligned} -(\lambda ^t-\tilde{\lambda }^{t+1})^\top B(\tilde{y}^{t+1}-y^t)+\frac{\theta }{2\rho }\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2\geqslant -\frac{\rho }{2\theta }\Vert \tilde{y}^{t+1}-y^t\Vert _{B^\top B}^2. \end{aligned}$$

Because $\tilde{P}, {\tilde{Q}}$ and $\rho $ satisfy (3.13), we have from (B.14) that

$$\begin{aligned}&E\left[ \varPhi (x^{t+1},\tilde{y}^{t+1})-\varPhi (x,y)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \\&+\theta \sum _{k=0}^{t-1}E\left[ \varPhi (x^{k+1},y^{k+1})-\varPhi (x,y)+(w^{k+1}-w)^\top H(w^{k+1})\right] \\&\leqslant (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \\&+\frac{1}{2}\Vert x^0-x\Vert ^2_{\tilde{P}}+\frac{\theta }{2}\Vert y^0-y\Vert _{\tilde{Q}}^2+\frac{\rho }{2\theta }\Vert y^0-y\Vert _{B^\top B}^2+\frac{\theta }{2\rho }E\Vert \lambda ^0-\lambda \Vert ^2. \end{aligned}$$

Similar to Theorem 3.8, from the convexity of $\varPhi $ and (2.23), we have

$$\begin{aligned}&(1+\theta t)E\big [\varPhi (\hat{x}^t,\hat{y}^t)-\varPhi (x,y)+(\hat{w}^{t+1}-w)^\top H({w})\big ]\nonumber \\&\leqslant (1-\theta )\big [F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\big ]\nonumber \\&+\frac{1}{2}\Vert x^0-x\Vert ^2_{\tilde{P}}+\frac{\theta }{2}\Vert y^0-y\Vert _{\tilde{Q}}^2+\frac{\rho }{2\theta }\Vert y^0-y\Vert _{B^\top B}^2+\frac{\theta }{2\rho }E\Vert \lambda ^0-\lambda \Vert ^2. \end{aligned}$$

(B.15)

Noting $\lambda ^0=0$ and $(x^{0}-x)^\top A^\top r^{0}\leqslant \frac{1}{2}\big [\Vert x^0-x\Vert _{A^\top A}+\Vert r^0\Vert ^2\big ]$, and using Lemmas 3.3 and 3.5 with $\gamma =\max \{0.5+\Vert \lambda ^*\Vert , 3\Vert \lambda ^*\Vert \}$, we obtain the result (3.16).

1.3 Proof of Theorem 3.8

Using (3.6) and (3.7), applying (2.21) to the cross terms, and also noting the definition of $\tilde{P}$ and $\tilde{Q}$ in (B.1), we have

$$\begin{aligned}&-\frac{\theta }{\rho }E\left[ \sum _{k=0}^{t-1}(\lambda ^{k+1}-\lambda )^\top (\lambda ^{k+1}-\lambda ^k) +(\tilde{\lambda }^{t+1}-\lambda )^\top (\tilde{\lambda }^{t+1}-\lambda ^t)\right] \nonumber \\&\qquad +\sum _{k=0}^t\rho _xE(x^{k+1}-x)^\top A^\top B(y^{k+1}-y^k) +(1-\theta )\sum _{k=0}^t\rho _y E(y^{k}-y)^\top B^\top A(x^{k+1}-x^k)\nonumber \\&\qquad -\sum _{k=0}^tE(x^{k+1}-x)^\top \tilde{P}(x^{k+1}-x^k)+\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2\nonumber \\&\qquad -\sum _{k=0}^tE(y^{k+1}-y)^\top \tilde{Q}(y^{k+1}-y^k)+\frac{L_g}{2}\sum _{k=0}^t E\Vert y^k-y^{k+1}\Vert ^2\nonumber \\&=-\frac{\theta }{2\rho }E\left[ \Vert \tilde{\lambda }^{t+1}-\lambda \Vert ^2-\Vert \lambda ^0-\lambda \Vert ^2+\sum _{k=0}^{t-1}\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2+\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2\right] \nonumber \\&\qquad +\frac{\rho _x}{\rho }\sum _{k=0}^t E(\lambda ^k-\lambda ^{k+1})^\top B(y^{k+1}-y^k) +\frac{(1-\theta )\rho _y}{\rho }\sum _{k=0}^t E(\lambda ^{k-1}-\lambda ^{k})^\top A(x^{k+1}-x^k)\nonumber \\&\qquad -\frac{\theta \rho _y}{2}E\left( \Vert x^0-x\Vert _{A^\top A}^2-\Vert x^{t+1}-x\Vert _{A^\top A}^2\right) +\frac{(2-\theta )\rho _y}{2}\sum _{k=0}^t E\Vert x^{k+1}-x^k\Vert _{A^\top A}^2\nonumber \\&\qquad -\frac{1}{2}E\left( \Vert x^{t+1}-x\Vert _{\hat{P}}^2-\Vert x^0-x\Vert _{\hat{P}}^2+\sum _{k=0}^t\Vert x^{k+1}-x^k\Vert _{\hat{P}}^2\right) \nonumber \\&\qquad +\frac{L_f}{2}\sum _{k=0}^tE\Vert x^k-x^{k+1}\Vert ^2\nonumber \\&\qquad -\frac{1}{2}E\left( \Vert y^{t+1}-y\Vert _{\hat{Q}}^2-\Vert y^0-y\Vert _{\hat{Q}}^2+\sum _{k=0}^t\Vert y^{k+1}-y^k\Vert _{\hat{Q}}^2\right) \nonumber \\&\qquad +\frac{L_g}{2}\sum _{k=0}^t E\Vert y^k-y^{k+1}\Vert ^2, \end{aligned}$$

(B.16)

where we have used the conditions in (3.17). By Young’s inequality, we have that for $0\leqslant k\leqslant t$,

$$\begin{aligned}&\quad \frac{\rho _x}{\rho }(\lambda ^k-\lambda ^{k+1})^\top B(y^{k+1}-y^k)-\frac{\theta }{2\rho }\frac{1}{2-\theta }\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\nonumber \\&\leqslant \frac{\rho }{\theta }\frac{2-\theta }{2}\frac{\rho _x^2}{\rho ^2}\Vert B(y^{k+1}-y^k)\Vert ^2 \overset{(3.17)}{=} \frac{(2-\theta )\rho _y}{2\theta ^2}\Vert y^{k+1}-y^k\Vert _{B^\top B}^2, \end{aligned}$$

(B.17)

and for $1\leqslant k\leqslant t$,

$$\begin{aligned}&\qquad \frac{(1-\theta )\rho _y}{\rho }(\lambda ^{k-1}-\lambda ^{k})^\top A(x^{k+1}-x^k)-\frac{\theta }{2\rho }\frac{1-\theta }{2-\theta }\Vert \lambda ^{k-1}-\lambda ^k\Vert ^2\nonumber \\&\,\,\,\leqslant (1-\theta )\frac{\rho }{\theta }\frac{(2-\theta )\rho _y^2}{2\rho ^2}\Vert A(x^{k+1}-x^k)\Vert ^2\nonumber \\&\overset{(3.17)}{=}\frac{(1-\theta )(2-\theta )}{2\theta ^2}\rho _x\Vert x^{k+1}-x^k\Vert _{A^\top A}^2. \end{aligned}$$

(B.18)

Plugging (B.17) and (B.18) and also noting $\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2\geqslant \Vert \lambda ^{t+1}-\lambda ^t\Vert ^2$, we can upper bound the right-hand side of (B.16) by

$$\begin{aligned}&\quad -\frac{\theta }{2\rho }E\left[ \Vert \tilde{\lambda }^{t+1}-\lambda \Vert ^2-\Vert \lambda ^0-\lambda \Vert ^2\right] -\frac{\theta \rho _y}{2}E\left( \Vert x^0-x\Vert _{A^\top A}^2-\Vert x^{t+1}-x\Vert _{A^\top A}^2\right) \nonumber \\&\qquad +\left( \frac{(1-\theta )(2-\theta )}{2\theta ^2}\rho _x+\frac{(2-\theta )\rho _y}{2}\right) \sum _{k=0}^t E\Vert x^{k+1}-x^k\Vert _{A^\top A}^2\nonumber \\&\qquad +\frac{(2-\theta )\rho _y}{2\theta ^2}\sum _{k=0}^t E\Vert y^{k+1}-y^k\Vert _{B^\top B}^2\nonumber \\&\qquad -\frac{1}{2}E\left( \Vert x^{t+1}-x\Vert _{\hat{P}}^2-\Vert x^0-x\Vert _{\hat{P}}^2\right. \nonumber \\&\qquad \left. +\sum _{k=0}^t\Vert x^{k+1}-x^k\Vert _{\hat{P}}^2\right) +\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2\nonumber \\&\qquad -\frac{1}{2}E\left( \Vert y^{t+1}-y\Vert _{\hat{Q}}^2-\Vert y^0-y\Vert _{\hat{Q}}^2+\sum _{k=0}^t\Vert y^{k+1}-y^k\Vert _{\hat{Q}}^2\right) \nonumber \\&\qquad +\frac{L_g}{2}\sum _{k=0}^t E\Vert y^k-y^{k+1}\Vert ^2\nonumber \\&\overset{(3.18{\mathrm{a}})}{\leqslant }\frac{1}{2}\left( \Vert x^0-x\Vert _{\hat{P}-\theta \rho _xA^\top A}^2+\Vert y^0-y\Vert _{\hat{Q}}^2\right) +\frac{\theta }{2\rho }E\Vert \lambda ^0-\lambda \Vert ^2. \end{aligned}$$

(B.19)

In addition, note that

$$\begin{aligned}&\theta \Vert x^{t+1}-x\Vert _{A^\top A}^2=\frac{n}{N}\left\| \sum _{i=1}^N A_i(x_i^{t+1}-x_i)\right\| ^2\leqslant n\sum _{i=1}^N\Vert x_i^{t+1}-x_i\Vert _{A_i^\top A_i}^2,\\&\Vert x^k-x^{k+1}\Vert _{A^\top A}^2=\left\| \sum _{i\in I_k}A_i(x_i^k-x_i^{k+1})\right\| ^2\leqslant n\sum _{i=1}^N\Vert x_i^k-x_i^{k+1}\Vert _{A_i^\top A_i}^2,\\&\Vert y^k-y^{k+1}\Vert _{B^\top B}^2=\left\| \sum _{j\in J_k}B_j(y_j^k-y_j^{k+1})\right\| ^2\leqslant m\sum _{j=1}^M\Vert y_j^k-y_j^{k+1}\Vert _{B_j^\top B_j}. \end{aligned}$$

Hence, if $\hat{P}$ and $\hat{Q}$ satisfy (3.18b), then (B.19) also holds.

Combining (B.6), (B.16), and (B.19) yields

$$\begin{aligned}&E\left[ \varPhi (x^{t+1},y^{t+1})-\varPhi (x,y)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \nonumber \\&\qquad +\theta \sum _{k=0}^{t-1}E\left[ \varPhi (x^{k+1},y^{k+1})-\varPhi (x,y)+(w^{k+1}-w)^\top H(w^{k+1})\right] \nonumber \\\leqslant & {} (1-\theta )\left[ \varPhi (x^0,y^0)-\varPhi (x,y)\right] \nonumber \\&\qquad +(1-\theta )\left[ (x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}+(y^{0}-y)^\top (-B^\top \lambda ^0)\right. \nonumber \\&\qquad \left. + \rho _y(y^{0}-y)^\top B^\top r^{0}\right] \nonumber \\&\qquad + \frac{1}{2}\left( \Vert x^0-x\Vert _{\hat{P}-\theta \rho _xA^\top A}^2+\Vert y^0-y\Vert _{\hat{Q}}^2\right) +\frac{\theta }{2\rho }{E}\Vert \lambda ^0-\lambda \Vert ^2. \end{aligned}$$

(B.20)

Applying the convexity of $\varPhi $ and the properties (2.23) of H, we have

$$\begin{aligned}&~~\qquad (1+\theta t)E\left[ \varPhi (\hat{x}^t,\hat{y}^t)-\varPhi (x,y)+(\hat{w}^{t+1}-w)^\top H(w)\right] \nonumber \\&\overset{(2.11)}{=} (1+\theta t)E\left[ \varPhi (\hat{x}^t,\hat{y}^t)-\varPhi (x,y)+(\hat{w}^{t+1}-w)^\top H(\hat{w}^{t+1})\right] \nonumber \\&\overset{(2.14)}{\leqslant }E\left[ \varPhi (x^{t+1},y^{t+1})-\varPhi (x,y)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \nonumber \\&\qquad +\,\theta \sum _{k=0}^{t-1}E\left[ \varPhi (x^{k+1},y^{k+1})-\varPhi (x,y)+(w^{k+1}-w)^\top H(w^{k+1})\right] .\qquad \end{aligned}$$

(B.21)

Now combining (B.21) and (B.20), we have

$$\begin{aligned}&(1+\theta t)E\left[ \varPhi (\hat{x}^t,\hat{y}^t)-\varPhi (x,y)+(\hat{w}^{t+1}-w)^\top H(w)\right] \nonumber \\\leqslant & {} (1-\theta )\left[ \varPhi (x^0,y^0)-\varPhi (x,y)\right] \nonumber \\&\quad +(1-\theta )\left[ (x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}+(y^{0}-y)^\top (-B^\top \lambda ^0)\right. \nonumber \\&\quad \left. + \rho _y(y^{0}-y)^\top B^\top r^{0}\right] \nonumber \\&\quad + \frac{1}{2}\left( \Vert x^0-x\Vert _{\hat{P}-\theta \rho _xA^\top A}^2+\Vert y^0-y\Vert _{\hat{Q}}^2\right) +\frac{\theta }{2\rho }E\Vert \lambda ^0-\lambda \Vert ^2. \end{aligned}$$

(B.22)

By Lemmas 3.3 and 3.5 with $\gamma =\max \{0.5+\Vert \lambda ^*\Vert , 3\Vert \lambda ^*\Vert \}$, we have the desired result.

1.4 Proof of Theorem 4.2

From the nonincreasing monotonicity of $\alpha _k$, one can easily show the following result.

Lemma B.1

Assume $\lambda ^{-1}=\lambda ^0$. It holds that

$$\begin{aligned}&\sum _{k=0}^t\frac{(1-\theta )\beta _k}{2}\left[ \Vert \lambda ^{k}-\lambda \Vert ^2-\Vert \lambda ^{k-1}-\lambda \Vert ^2+\Vert \lambda ^{k}-\lambda ^{k-1}\Vert ^2\right] \nonumber \\&\qquad -\sum _{k=0}^{t-1}\frac{\alpha _k\beta _{k+1}}{2\alpha _{k+1}}\left[ \Vert \lambda ^{k+1}-\lambda \Vert ^2-\Vert \lambda ^{k}-\lambda \Vert ^2+\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\right] \nonumber \\&\leqslant -\sum _{k=0}^{t-1}\frac{\beta _{k+1}}{2}\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2 +\sum _{k=1}^t\frac{(1-\theta )\beta _k}{2}\Vert \lambda ^{k}-\lambda ^{k-1}\Vert ^2 \nonumber \\&-\sum _{k=0}^{t-1}\frac{\alpha _k\beta _{k+1}}{2\alpha _{k+1}}\Vert \lambda ^{k+1}-\lambda \Vert ^2 -\sum _{k=1}^t\frac{(1-\theta )\beta _k}{2}\Vert \lambda ^{k-1}-\lambda \Vert ^2\nonumber \\&+\frac{\alpha _0\beta _1}{2\alpha _1}\Vert \lambda ^0-\lambda \Vert ^2+\sum _{k=1}^{t-1}\frac{\alpha _k\beta _{k+1}}{2\alpha _{k+1}}\Vert \lambda ^{k}-\lambda \Vert ^2 +\sum _{k=1}^t\frac{(1-\theta )\beta _k}{2}\Vert \lambda ^{k}-\lambda \Vert ^2\nonumber \\&=-\sum _{k=0}^{t-1}\frac{\theta \beta _{k+1}}{2}\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2 +\left( \frac{\alpha _0\beta _1}{2\alpha _1}-\frac{(1-\theta )\beta _1}{2}\right) \Vert \lambda ^0-\lambda \Vert ^2\nonumber \\&-\left( \frac{\alpha _{t-1}\beta _t}{2\alpha _t}-\frac{(1-\theta )\beta _t}{2}\right) \Vert \lambda ^{t}-\lambda \Vert ^2\nonumber \\&-\sum _{k=1}^{t-1}\left( \frac{\alpha _{k-1}\beta _k}{2\alpha _k} +\frac{(1-\theta )\beta _{k+1}}{2}-\frac{\alpha _k\beta _{k+1}}{2\alpha _{k+1}}-\frac{(1-\theta )\beta _k}{2}\right) \Vert \lambda ^{k}-\lambda \Vert ^2. \end{aligned}$$

(B.23)

By the update formula of $\lambda $ in (4.5), we have from (4.8) that

$$\begin{aligned}&\quad E\left[ F(x^{k+1})-F(x)+(x^{k+1}-x)^\top (-A^\top \lambda ^{k+1})+(\lambda ^{k+1}-\lambda )^\top r^{k+1}\right] \nonumber \\&\quad + E\left[ \frac{(\lambda ^{k+1}-\lambda )^\top (\lambda ^{k+1}-\lambda ^{k})}{\left( 1-\frac{(1-\theta )\alpha _{k+1}}{\alpha _k }\right) \rho } +\frac{(1-\theta )\alpha _{k+1}}{\alpha _k}\rho (x^{k+1}-x)^\top A^\top r^{k+1}\right] \nonumber \\&\quad + E(x^{k+1}-x)^\top \left( \tilde{P}+\frac{I}{\alpha _k}\right) (x^{k+1}-x^k) -\frac{L_f}{2} E\Vert x^k-x^{k+1}\Vert ^2+ E(x^{k+1}-x^k)^\top \delta ^k\nonumber \\&\leqslant (1-\theta ) E\left[ F(x^k)-F(x)+(x^{k}-x)^\top (-A^\top \lambda ^k) \right. \left. +(\lambda ^{k}-\lambda )^\top r^{k}+\frac{(\lambda ^{k}-\lambda )^\top (\lambda ^k-\lambda ^{k-1})}{\left( 1-\frac{(1-\theta )\alpha _{k}}{\alpha _{k-1}}\right) \rho } \right] \nonumber \\&\quad +(1-\theta )\rho E(x^{k}-x)^\top A^\top r^{k}, \end{aligned}$$

(B.24)

where similar to (B.1), we have defined $\tilde{P}=\hat{P}-\rho A^\top A$.

Multiplying $\alpha _k$ to both sides of (B.24) and using (4.2) and (2.21), we have

$$\begin{aligned}&\quad \alpha _k E\left[ F(x^{k+1})-F(x)+(w^{k+1}-w)^\top H(w^{k+1})\right] \nonumber \\&\qquad +\frac{\alpha _k\beta _{k+1}}{2\alpha _{k+1}} E\left[ \Vert \lambda ^{k+1}-\lambda \Vert ^2-\Vert \lambda ^{k}-\lambda \Vert ^2+\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\right] \nonumber \\&\qquad + E\left[ (1-\theta )\alpha _{k+1}\rho (x^{k+1}-x)^\top A^\top r^{k+1}\right] \nonumber \\&\qquad +\frac{\alpha _k}{2} E\big [\Vert x^{k+1}-x\Vert _{\tilde{P}}^2-\Vert x^{k}-x\Vert _{\tilde{P}}^2+\Vert x^{k+1}-x^k\Vert _{\tilde{P}}^2\big ]\nonumber \\&\qquad +\frac{1}{2} E\left[ \Vert x^{k+1}-x\Vert ^2-\Vert x^{k}-x\Vert ^2+\Vert x^{k+1}-x^k\Vert ^2\right] \nonumber \\&\qquad -\frac{\alpha _kL_f}{2} E\Vert x^k-x^{k+1}\Vert ^2+\alpha _k{E}(x^{k+1}-x^k)^\top \delta ^k\nonumber \\&\leqslant (1-\theta )\alpha _k E\left[ F(x^k)-F(x)+(w^{k}-w)^\top H(w^k)\right] \nonumber \\&\qquad +\frac{(1-\theta )\beta _k}{2} E\left[ \Vert \lambda ^{k}-\lambda \Vert ^2-\Vert \lambda ^{k-1}-\lambda \Vert ^2+\Vert \lambda ^{k}-\lambda ^{k-1}\Vert ^2\right] \nonumber \\&\qquad +\alpha _k(1-\theta )\rho E(x^{k}-x)^\top A^\top r^{k}. \end{aligned}$$

(B.25)

Denote $\tilde{\lambda }^{t+1}=\lambda ^t-\rho r^{t+1}.$ Then, for $k=t$, it is easy to see that (B.25) becomes

$$\begin{aligned}&\quad \alpha _tE\left[ F(x^{t+1})-F(x)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \nonumber \\&\qquad +\frac{\alpha _t}{2\rho }E\left[ \Vert \tilde{\lambda }^{t+1}-\lambda \Vert ^2-\Vert \lambda ^{t}-\lambda \Vert ^2+\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2\right] \nonumber \\&\qquad +\frac{\alpha _t}{2}E\left[ \Vert x^{t+1}-x\Vert _{\tilde{P}}^2-\Vert x^{t}-x\Vert _{\tilde{P}}^2+\Vert x^{t+1}-x^t\Vert _{\tilde{P}}^2\right] \nonumber \\&\qquad +\frac{1}{2}E\left[ \Vert x^{t+1}-x\Vert ^2-\Vert x^{t}-x\Vert ^2+\Vert x^{t+1}-x^t\Vert ^2\right] \nonumber \\&\qquad -\frac{\alpha _tL_f}{2}E\Vert x^t-x^{t+1}\Vert ^2+\alpha _t{E}(x^{t+1}-x^t)^\top \delta ^t\nonumber \\&\leqslant (1-\theta )\alpha _t E\left[ F(x^t)-F(x)+(w^{t}-w)^\top H(w^t)\right] \nonumber \\&\qquad +\frac{(1-\theta )\beta _t}{2}E\left[ \Vert \lambda ^{t}-\lambda \Vert ^2-\Vert \lambda ^{t-1}-\lambda \Vert ^2+\Vert \lambda ^{t}-\lambda ^{t-1}\Vert ^2\right] \nonumber \\&\qquad +\alpha _t(1-\theta ) E\rho (x^{t}-x)^\top A^\top r^{t}. \end{aligned}$$

(B.26)

By the nonincreasing monotonicity of $\alpha _k$, summing (B.25) from $k=0$ through $t-1$ and (B.26) and plugging (B.23) give

$$\begin{aligned}&\alpha _t E\left[ F(x^{t+1})-F(x)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \nonumber \\&\qquad +\theta \alpha _{k+1}\sum _{k=0}^{t-1} E\left[ F(x^{k+1})-F(x)+(w^{k+1}-w)^\top H(w^{k+1})\right] \nonumber \\&\qquad +\frac{\alpha _t}{2\rho } E\left[ \Vert \tilde{\lambda }^{t+1}-\lambda \Vert ^2-\Vert \lambda ^{t}-\lambda \Vert ^2+\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2\right] \nonumber \\&\qquad +\frac{\alpha _{t+1}}{2} E\Vert x^{t+1}-x\Vert _{\tilde{P}}^2 +\sum _{k=0}^t\frac{\alpha _k}{2} E\Vert x^{k+1}-x^k\Vert _{\tilde{P}}^2\nonumber \\&\qquad +\frac{1}{2} E\big [\Vert x^{t+1}-x\Vert ^2-\Vert x^{0}-x\Vert ^2+\sum _{k=0}^t\Vert x^{k+1}-x^k\Vert ^2\big ]\nonumber \\&\qquad -\sum _{k=0}^t\frac{\alpha _kL_f}{2}E\Vert x^k-x^{k+1}\Vert ^2+\sum _{k=0}^t\alpha _k E(x^{k+1}-x^k)^\top \delta ^k\nonumber \\&\leqslant (1-\theta )\alpha _0 E\left[ F(x^0)-F(x)+(w^{0}-w)^\top H(w^0)\right] \nonumber \\&\qquad +\alpha _0(1-\theta )\rho (x^{0}-x)^\top A^\top r^{0}+\frac{\alpha _0}{2}\Vert x^{0}-x\Vert _{\tilde{P}}^2\nonumber \\&\qquad -\sum _{k=0}^{t-1}\frac{\theta \beta _{k+1}}{2} E\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2 +\left( \frac{\alpha _0\beta _1}{2\alpha _1}-\frac{(1-\theta )\beta _1}{2}\right) E\Vert \lambda ^0-\lambda \Vert ^2\nonumber \\&\qquad -\left( \frac{\alpha _{t-1}\beta _t}{2\alpha _t}-\frac{(1-\theta )\beta _t}{2}\right) E\Vert \lambda ^{t}-\lambda \Vert ^2\nonumber \\&\qquad -\sum _{k=1}^{t-1}\left( \frac{\alpha _{k-1}\beta _k}{2\alpha _k}+\frac{(1-\theta )\beta _{k+1}}{2} -\frac{\alpha _k\beta _{k+1}}{2\alpha _{k+1}}-\frac{(1-\theta )\beta _k}{2}\right) E\Vert \lambda ^{k}-\lambda \Vert ^2 .\qquad \quad \quad \end{aligned}$$

(B.27)

From (4.9d), we have

$$\begin{aligned}&\frac{\alpha _t}{2\rho }\left[ \Vert \tilde{\lambda }^{t+1}-\lambda \Vert ^2-\Vert \lambda ^{t}-\lambda \Vert ^2 +\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2\right] \nonumber \\\geqslant & {} -\left( \frac{\alpha _{t-1}\beta _t}{2\alpha _t}-\frac{(1-\theta )\beta _t}{2}\right) \Vert \lambda ^{t}-\lambda \Vert ^2. \end{aligned}$$

In addition, from Young’s inequality, it holds that

$$\begin{aligned} \frac{1}{2}\Vert x^{k+1}-x^k\Vert ^2+\alpha _k{E}(x^{k+1}-x^k)^\top \delta ^k \geqslant \frac{\alpha ^2_k}{2}\Vert \delta \Vert ^2. \end{aligned}$$

Hence, dropping negative terms on the right-hand side of (B.27), from the convexity of $\varPhi $ and (2.23), we have

$$\begin{aligned}&\left( \alpha _{t+1}+\theta \sum \limits _{k=1}^t\alpha _k\right) E\left[ F(\hat{x}^{t})-F(x)+(\hat{w}^{t}-w)^\top H(\hat{w}^{t})\right] \nonumber \\&\qquad \alpha _t E\left[ F(x^{t+1})-F(x)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \nonumber \\&\qquad +\theta \alpha _{k+1}\sum _{k=0}^{t-1}E\left[ F(x^{k+1})-F(x)+(w^{k+1}-w)^\top H(w^{k+1})\right] \nonumber \\&\leqslant (1-\theta )\alpha _0\left[ F(x^0)-F(x)+(w^{0}-w)^\top H(w^0)\right] \nonumber \\&\qquad +(1-\theta )\alpha _0\rho (x^{0}-x)^\top A^\top r^{0}+\frac{\alpha _0}{2}\Vert x^{0}-x\Vert _{\tilde{P}}^2+\frac{1}{2}\Vert x^0-x\Vert ^2\nonumber \\&\qquad +\left( \frac{\alpha _0\beta _1}{2\alpha _1}-\frac{(1-\theta )\beta _1}{2}\right) E\Vert \lambda ^0-\lambda \Vert ^2+\sum _{k=0}^t\frac{\alpha _k^2}{2}E\Vert \delta ^k\Vert ^2. \end{aligned}$$

(B.28)

Using Lemma 3.3 and the properties of H, we derive the desired result.

1.5 Proof of Proposition 6.1

Let $(I+\partial \phi )^{-1}(x):={{\,\mathrm{arg\,min}\,}}_z \phi (z)+\frac{1}{2}\Vert z-x\Vert _2^2$ denote the proximal mapping of $\phi $ at x. Then, the update in (1.9b) can be written to

$$\begin{aligned} z^{k+1}=\left( I+\partial \left( \frac{g^*}{\eta }\right) \right) ^{-1}\left( z^k-\frac{1}{\eta }Ax^{k+1}\right) . \end{aligned}$$

Define $y^{k+1}$ as that in (6.7b). Then,

$$\begin{aligned} \frac{1}{\eta }y^{k+1}= & {} \frac{1}{\eta }\left\{ {\mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{y}}\, g(y)-\langle y, z^k\rangle +\frac{1}{2\eta }\Vert y+ A x^{k+1}\Vert ^2\right\} \\= & {} \frac{1}{\eta }\left\{ {\mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{y}}\, g(y)+\frac{\eta }{2}\Vert \frac{1}{\eta } y-(z^k-\frac{1}{\eta } A x^{k+1})\Vert ^2\right\} \\= & {} {\mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{y}}\, g(\eta y)+\frac{\eta }{2}\Vert y-\left( z^k-\frac{1}{\eta } A x^{k+1}\right) \Vert ^2\\= & {} \left( I+\partial \left( \frac{1}{\eta } g(\eta \cdot )\right) \right) ^{-1}\left( z^k-\frac{1}{\eta } A x^{k+1}\right) . \end{aligned}$$

Hence, using the fact that the conjugate of $\frac{1}{\eta }g^*$ is $\frac{1}{\eta } g(\eta \cdot )$ and the Moreau’s identity $(I+\partial \phi )^{-1}+(I+\partial \phi ^*)^{-1}=I$ for any convex function $\phi $, we have

$$\begin{aligned}&\left( I+\partial \left( \frac{g^*}{\eta }\right) \right) ^{-1}\left( z^k-\frac{1}{\eta }Ax^{k+1}\right) \\&\quad +\left( I+\partial \left( \frac{1}{\eta } g(\eta \cdot )\right) \right) ^{-1}\left( z^k-\frac{1}{\eta } A x^{k+1}\right) =z^k-\frac{1}{\eta } A x^{k+1}. \end{aligned}$$

Therefore, (6.7c) holds, and thus from (1.9c) it follows

$$\begin{aligned} \bar{z}^{k+1}=z^{k+1}-\frac{q}{\eta }(Ax^{k+1}+y^{k+1}). \end{aligned}$$

Substituting the formula of $\bar{z}^k$ into (1.9a), we have for $i=i_k$,

$$\begin{aligned} x_i^{k+1}&={\mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{x_i\in {\mathcal {X}}_i}}\langle -\bar{z}^k, A_ix_i\rangle +u_i(x_i)+\frac{\tau }{2}\Vert x_i-x_i^k\Vert ^2\\&= {\mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{x_i\in {\mathcal {X}}_i}}\langle -z^k, A_i x_i\rangle +\frac{q}{\eta }\langle A x^k+y^{k}, A_i x_i\rangle +u_i(x_i)+\frac{\tau }{2}\Vert x_i-x_i^k\Vert ^2\\&= {\mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{x_i\in {\mathcal {X}}_i}}\langle -z^k, A_i x_i\rangle +u_i(x_i)+\frac{q}{2\eta }\Vert A_i x_i +A_{\ne i} x_{\ne i}^k+y^k\Vert ^2 \\&\quad +\frac{1}{2}\Vert x_i-x_i^k\Vert _{\tau I-\frac{q}{\eta }A_i^\top A_i}, \end{aligned}$$

which is exactly (6.7a). Hence, we complete the proof.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Gao, X., Xu, YY. & Zhang, SZ. Randomized Primal–Dual Proximal Block Coordinate Updates. J. Oper. Res. Soc. China 7, 205–250 (2019). https://doi.org/10.1007/s40305-018-0232-4

Download citation

Received: 06 February 2018
Revised: 18 July 2018
Accepted: 08 November 2018
Published: 19 December 2018
Issue Date: 05 June 2019
DOI: https://doi.org/10.1007/s40305-018-0232-4

Keywords

Mathematics Subject Classification

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Randomized Primal–Dual Proximal Block Coordinate Updates

Abstract

Access this article

Similar content being viewed by others

Accelerated primal–dual proximal block coordinate updating methods for constrained convex optimization

Extended ADMM and BCD for nonseparable convex minimization models with quadratic coupling terms: convergence analysis and insights

Inexact Variable Metric Stochastic Block-Coordinate Descent for Regularized Optimization

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Appendices

Proofs of Lemmas

1.1 Proof of Lemma 3.1

1.2 Proof of Lemma 3.3

1.3 Proof of Lemma 3.5

1.4 Proof of Inequalities (4.9c) and (4.9d) with \(\alpha _k=\frac{\alpha _0}{\sqrt{k}}\)

Proofs of Theorems

1.1 Proof of Theorem 3.6

1.2 Proof of Theorem 3.7

1.3 Proof of Theorem 3.8

1.4 Proof of Theorem 4.2

Lemma B.1

1.5 Proof of Proposition 6.1

Rights and permissions

About this article

Cite this article

Keywords

Mathematics Subject Classification

Navigation

Randomized Primal–Dual Proximal Block Coordinate Updates

Abstract

Access this article

Similar content being viewed by others

Accelerated primal–dual proximal block coordinate updating methods for constrained convex optimization

Extended ADMM and BCD for nonseparable convex minimization models with quadratic coupling terms: convergence analysis and insights

Inexact Variable Metric Stochastic Block-Coordinate Descent for Regularized Optimization

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Appendices

Proofs of Lemmas

1.1 Proof of Lemma 3.1

1.2 Proof of Lemma 3.3

1.3 Proof of Lemma 3.5

1.4 Proof of Inequalities (4.9c) and (4.9d) with \(\alpha _k=\frac{\alpha _0}{\sqrt{k}}\)

Proofs of Theorems

1.1 Proof of Theorem 3.6

1.2 Proof of Theorem 3.7

1.3 Proof of Theorem 3.8

1.4 Proof of Theorem 4.2

Lemma B.1

1.5 Proof of Proposition 6.1

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation