
Randomized Primal–Dual Proximal Block Coordinate Updates


Abstract

In this paper, we propose a randomized primal–dual proximal block coordinate updating framework for a general multi-block convex optimization model with coupled objective function and linear constraints. Assuming mere convexity, we establish its O(1 / t) convergence rate in terms of the objective value and feasibility measure. The framework includes several existing algorithms as special cases such as a primal–dual method for bilinear saddle-point problems (PD-S), the proximal Jacobian alternating direction method of multipliers (Prox-JADMM) and a randomized variant of the ADMM for multi-block convex optimization. Our analysis recovers and/or strengthens the convergence properties of several existing algorithms. For example, for PD-S our result leads to the same order of convergence rate without the previously assumed boundedness condition on the constraint sets, and for Prox-JADMM the new result provides convergence rate in terms of the objective value and the feasibility violation. It is well known that the original ADMM may fail to converge when the number of blocks exceeds two. Our result shows that if an appropriate randomization procedure is invoked to select the updating blocks, then a sublinear rate of convergence in expectation can be guaranteed for multi-block ADMM, without assuming any strong convexity. The new approach is also extended to solve problems where only a stochastic approximation of the subgradient of the objective is available, and we establish an \(O(1/\sqrt{t})\) convergence rate of the extended approach for solving stochastic programming.
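
To make the setting concrete, the following is a minimal schematic sketch (in Python/NumPy) of how a randomized primal–dual block-coordinate loop of this kind may be organized: a subset of the primal blocks is sampled uniformly at random, each selected block takes a proximal (linearized) step against the augmented Lagrangian, and the multiplier is then updated from the residual. This is an illustration only, not the exact scheme analyzed in the paper; the ℓ1 block functions, the scalar proximal weight tau, and all problem data are placeholder assumptions.

```python
import numpy as np

def prox_l1(v, t):
    # proximal operator of t*||.||_1 (soft-thresholding); placeholder block function
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def randomized_block_update_sketch(A_blocks, b, rho=1.0, n=1, iters=300, seed=0):
    """Schematic randomized primal-dual block-coordinate loop (illustration only).

    Toy instance of min sum_i u_i(x_i) s.t. sum_i A_i x_i = b with u_i = ||.||_1:
    sample n of the N blocks uniformly at random, take a proximal step on each
    selected block against the linearized augmented Lagrangian, then update the
    multiplier as lambda <- lambda - rho*residual (the sign convention of the appendix).
    The scalar weight tau is a crude stand-in for the proximal matrices of the paper.
    """
    rng = np.random.default_rng(seed)
    N = len(A_blocks)
    x = [np.zeros(Ai.shape[1]) for Ai in A_blocks]
    lam = np.zeros(b.shape[0])
    for _ in range(iters):
        r = sum(Ai @ xi for Ai, xi in zip(A_blocks, x)) - b        # residual Ax - b
        for i in rng.choice(N, size=n, replace=False):             # uniformly sampled blocks
            tau = rho * np.linalg.norm(A_blocks[i], 2) ** 2 + 1.0  # simple proximal weight
            grad_i = A_blocks[i].T @ (-lam + rho * r)              # block gradient of the penalty/dual part
            x[i] = prox_l1(x[i] - grad_i / tau, 1.0 / tau)         # proximal block update
        r = sum(Ai @ xi for Ai, xi in zip(A_blocks, x)) - b
        lam = lam - rho * r                                        # multiplier update
    return x, lam
```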


Notes

  1. Actually, [36] presented its algorithm in a more general form, with parameters that adapt to the iteration. However, its convergence result for the convex case assumes constant values of these parameters.

References

  1. James, G.M., Paulson, C., Rusmevichientong, P.: The constrained lasso. Technical report. University of Southern California (2013)

  2. Rockafellar, R.T.: Large-scale extended linear-quadratic programming and multistage optimization. Advances in Numerical Partial Differential Equations and Optimization: In: Proceedings of the Fifth Mexico-United States Workshop, vol. 2, pp. 247–261 (1991)

  3. Hong, M., Chang, T.-H., Wang, X., Razaviyayn, M., Ma, S., Luo, Z.-Q.: A block successive upper bound minimization method of multipliers for linearly constrained convex optimization (2014). arXiv:1401.7079

  4. Cui, Y., Li, X., Sun, D., Toh, K.-C.: On the convergence properties of a majorized ADMM for linearly constrained convex optimization problems with coupled objective functions. J. Optim. Theory Appl. 169(3), 1013–1041 (2016)

  5. Chen, C., Li, M., Liu, X., Ye, Y.: On the convergence of multi-block alternating direction method of multipliers and block coordinate descent method (2015). arXiv:1508.00193

  6. Gao, X., Zhang, S.: First-order algorithms for convex optimization with nonseparate objective and coupled constraints. J. Oper. Res. Soc. China 5(2), 131–159 (2017)

  7. Glowinski, R., Marrocco, A.: Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de Dirichlet non linéaires. ESAIM Math. Model. Numer. Anal. 9(R2), 41–76 (1975)

  8. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)

  9. Glowinski, R., Le Tallec, P.: Augmented Lagrangian and Operator-Splitting Methods in Nonlinear Mechanics, vol. 9. SIAM, Philadelphia (1989)

  10. Eckstein, J., Bertsekas, D.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55(1–3), 293–318 (1992)

  11. He, B., Yuan, X.: On the \(O(1/n)\) convergence rate of the Douglas-Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)

  12. Monteiro, R.D., Svaiter, B.F.: Iteration-complexity of block-decomposition algorithms and the alternating direction method of multipliers. SIAM J. Optim. 23(1), 475–507 (2013)

  13. Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2015)

  14. Lin, T., Ma, S., Zhang, S.: On the sublinear convergence rate of multi-block ADMM. J. Oper. Res. Soc. China 3(3), 251–274 (2015)

  15. Hong, M., Luo, Z.: On the linear convergence of the alternating direction method of multipliers. Math. Program. 162, 165–199 (2017)

  16. Boley, D.: Local linear convergence of the alternating direction method of multipliers on quadratic or linear programs. SIAM J. Optim. 23(4), 2183–2207 (2013)

  17. Peng, Y., Ganesh, A., Wright, J., Xu, W., Ma, Y.: RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2233–2246 (2012)

  18. Tao, M., Yuan, X.: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 21(1), 57–81 (2011)

  19. Chen, C., He, B., Ye, Y., Yuan, X.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math. Program. 155(1–2), 57–79 (2016)

  20. Deng, W., Lai, M.-J., Peng, Z., Yin, W.: Parallel multi-block ADMM with \(o(1/k)\) convergence. J. Sci. Comput. 71, 1–25 (2016)

  21. He, B., Tao, M., Yuan, X.: Convergence rate and iteration complexity on the alternating direction method of multipliers with a substitution procedure for separable convex programming. Math. Oper. Res. 42(3), 662–691 (2017)

  22. He, B., Hou, L., Yuan, X.: On full Jacobian decomposition of the augmented Lagrangian method for separable convex programming. SIAM J. Optim. 25(4), 2274–2312 (2015)

  23. He, B., Tao, M., Yuan, X.: Alternating direction method with Gaussian back substitution for separable convex programming. SIAM J. Optim. 22(2), 313–340 (2012)

  24. Xu, Y.: Hybrid Jacobian and Gauss–Seidel proximal block coordinate update methods for linearly constrained convex programming. SIAM J. Optim. 28(1), 646–670 (2018)

  25. Chen, C., Shen, Y., You, Y.: On the convergence analysis of the alternating direction method of multipliers with three blocks. In: Abstract and Applied Analysis, vol. 2013. Hindawi Publishing Corporation (2013)

  26. Cai, X., Han, D., Yuan, X.: On the convergence of the direct extension of ADMM for three-block separable convex minimization models with one strongly convex function. Comput. Optim. Appl. 66(1), 39–73 (2017)

  27. Lin, T., Ma, S., Zhang, S.: On the global linear convergence of the ADMM with multi-block variables. SIAM J. Optim. 25(3), 1478–1497 (2015)

  28. Li, M., Sun, D., Toh, K.-C.: A convergent 3-block semi-proximal ADMM for convex minimization problems with one strongly convex block. Asia-Pac. J. Oper. Res. 32(04), 1550024 (2015)

  29. Han, D., Yuan, X.: A note on the alternating direction method of multipliers. J. Optim. Theory Appl. 155(1), 227–238 (2012)

  30. Wang, Y., Yin, W., Zeng, J.: Global convergence of ADMM in nonconvex nonsmooth optimization (2015). arXiv:1511.06324

  31. Lin, T., Ma, S., Zhang, S.: Iteration complexity analysis of multi-block ADMM for a family of convex minimization without strong convexity. J. Sci. Comput. 69, 1–30 (2016)

  32. Chen, L., Sun, D., Toh, K.-C.: An efficient inexact symmetric Gauss–Seidel based majorized ADMM for high-dimensional convex composite conic programming. Math. Program. 161, 237–270 (2017)

  33. Li, X., Sun, D., Toh, K.-C.: A Schur complement based semi-proximal ADMM for convex quadratic conic programming and extensions. Math. Program. 155(1–2), 333–373 (2016)

  34. Sun, D., Toh, K.-C., Yang, L.: A convergent 3-block semiproximal alternating direction method of multipliers for conic programming with 4-type constraints. SIAM J. Optim. 25(2), 882–915 (2015)

  35. Sun, R., Luo, Z.-Q., Ye, Y.: On the expected convergence of randomly permuted ADMM (2015). arXiv:1503.06387

  36. Dang, C., Lan, G.: Randomized first-order methods for saddle point optimization (2014). arXiv:1409.8625

  37. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)

  38. Peng, Z., Wu, T., Xu, Y., Yan, M., Yin, W.: Coordinate friendly structures, algorithms and applications. Ann. Math. Sci. Appl. 1(1), 57–119 (2016)

  39. Gao, X., Jiang, B., Zhang, S.: On the information-adaptive variants of the ADMM: an iteration complexity perspective. J. Sci. Comput. 76(1), 327–363 (2018)

  40. Xu, Y., Zhang, S.: Accelerated primal-dual proximal block coordinate updating methods for constrained convex optimization. Comput. Optim. Appl. 70(1), 91–128 (2018)

  41. Dang, C.D., Lan, G.: Stochastic block mirror descent methods for nonsmooth and stochastic optimization. SIAM J. Optim. 25(2), 856–881 (2015)

  42. Xu, Y., Yin, W.: Block stochastic gradient iteration for convex and nonconvex optimization. SIAM J. Optim. 25(3), 1686–1716 (2015)

  43. Grant, M., Boyd, S., Ye, Y.: CVX: Matlab software for disciplined convex programming, version 2.0 beta. http://cvxr.com/cvx (2013)

  44. Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)

  45. Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1–2), 1–38 (2014)

  46. Lu, Z., Xiao, L.: On the complexity analysis of randomized block-coordinate descent methods. Math. Program. 152(1–2), 615–642 (2015)

  47. Richtárik, P., Takáč, M.: Parallel coordinate descent methods for big data optimization. Math. Program. 156, 1–52 (2015)


Corresponding author

Correspondence to Yang-Yang Xu.

Additional information

This work is partly supported by the National Science Foundation (Nos. DMS-1719549 and CMMI-1462408).

Appendices

Proofs of Lemmas

We give proofs of several lemmas that are used to show our main results.

1.1 Proof of Lemma 3.1

We prove (3.2), and (3.3) can be shown by the same arguments. By the optimality of \(x_{I_k}^{k+1}\) from (2.1) or equivalently (2.16), we have for any \(x_{I_k}\in {\mathcal {X}}_{I_k}\),

$$\begin{aligned}&(x_{I_k}-x_{I_k}^{k+1})^\top \left( \nabla _{I_k}f(x^k)-A_{I_k}^\top \lambda ^k +\rho _xA_{I_k}^\top r^{k} +\tilde{\nabla } u_{I_k}(x_{I_k}^{k+1})+\hat{P}_{I_k}(x_{I_k}^{k+1}-x_{I_k}^k)\right) \nonumber \\&\geqslant 0, \end{aligned}$$
(A.1)

where \(\tilde{\nabla } u_{I_k}(x_{I_k}^{k+1})\) is a subgradient of \(u_{I_k}\) at \(x_{I_k}^{k+1}\), and we have used the formula of \(r^{k+\frac{1}{2}}\) given in (2.2). We compute the expectation of each term in (A.1) as follows. First, we have

$$\begin{aligned}&{E}_{I_k}(x_{I_k}-x_{I_k}^{k+1})^\top \nabla _{I_k}f(x^k)\nonumber \\= & {} {E}_{I_k}\left( x_{I_k}-x_{I_k}^k\right) ^\top \nabla _{I_k}f(x^k) + {E}_{I_k}(x_{I_k}^k-x_{I_k}^{k+1})^\top \nabla _{I_k}f(x^k)\nonumber \\= & {} \frac{n}{N}\left( x-x^k\right) ^\top \nabla f(x^k) + {E}_{I_k}(x^k-x^{k+1})^\top \nabla f(x^k) \end{aligned}$$
(A.2)
$$\begin{aligned}\leqslant & {} \frac{n}{N}\left( f(x)-f(x^k)\right) + {E}_{I_k}\left[ f(x^k)-f(x^{k+1})+\frac{L_f}{2}\Vert x^k-x^{k+1}\Vert ^2\right] \nonumber \\= & {} \frac{n-N}{N}\left( f(x)-f(x^k)\right) \nonumber \\&+\,{E}_{I_k}\left[ f(x)-f(x^{k+1})+\frac{L_f}{2}\Vert x^k-x^{k+1}\Vert ^2\right] , \end{aligned}$$
(A.3)

where the last inequality is from the convexity of f and the Lipschitz continuity of \(\nabla _{I_k} f(x)\). Secondly,

$$\begin{aligned}&{E}_{I_k}(x_{I_k}-x_{I_k}^{k+1})^\top (-A_{I_k}^\top \lambda ^k)\nonumber \\= & {} {E}_{I_k}(x_{I_k}-x_{I_k}^{k})^\top (-A_{I_k}^\top \lambda ^k)+{E}_{I_k}(x_{I_k}^k-x_{I_k}^{k+1})^\top (-A_{I_k}^\top \lambda ^k)\nonumber \\= & {} \frac{n}{N}\left( x-x^k\right) ^\top (-A^\top \lambda ^k)+{E}_{I_k}(x^k-x^{k+1})^\top (-A^\top \lambda ^k)\nonumber \\= & {} \frac{n-N}{N}\left( x-x^k\right) ^\top (-A^\top \lambda ^k)+{E}_{I_k}(x-x^{k+1})^\top (-A^\top \lambda ^k). \end{aligned}$$
(A.4)

Similarly,

$$\begin{aligned} \rho _x {E}_{I_k}(x_{I_k}-x_{I_k}^{k+1})^\top A_{I_k}^\top r^{k}= & {} \frac{n-N}{N}\rho _x(x-x^k)^\top A^\top r^k\nonumber \\&+\,\rho _x{E}_{I_k}(x-x^{k+1})^\top A^\top r^{k}. \end{aligned}$$
(A.5)

For the fourth term of (A.1), we have

$$\begin{aligned}&{E}_{I_k}(x_{I_k}-x_{I_k}^{k+1})^\top \tilde{\nabla } u_{I_k}(x_{I_k}^{k+1})\nonumber \\\leqslant & {} {E}_{I_k}\left[ u_{I_k}(x_{I_k})-u_{I_k}(x_{I_k}^{k+1})\right] \nonumber \\= & {} \frac{n}{N}\, u(x)-{E}_{I_k}[u(x^{k+1})-u(x^k)+u_{I_k}(x_{I_k}^k)]\nonumber \\= & {} \frac{n}{N}\left[ u(x)-u(x^k)\right] +{E}_{I_k}[u(x^k)-u(x^{k+1})]\nonumber \\= & {} \frac{n-N}{N}\left[ u(x)-u(x^k)\right] +{E}_{I_k}[u(x)-u(x^{k+1})], \end{aligned}$$
(A.6)

where the inequality is from the convexity of \(u_{I_k}\). Finally, we have

$$\begin{aligned} {E}_{I_k}(x_{I_k}-x_{I_k}^{k+1})^\top \hat{P}_{I_k}(x_{I_k}^{k+1}-x_{I_k}^k)={E}_{I_k}(x-x^{k+1})^\top \hat{P}(x^{k+1}-x^k). \end{aligned}$$
(A.7)
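
The block-sampling identities used in (A.2), (A.4) and (A.5) all rest on the elementary fact that, for an index set \(I_k\) of n blocks chosen uniformly at random out of N, \({E}_{I_k}\sum _{i\in I_k} v_i=\frac{n}{N}\sum _{i=1}^N v_i\). A short numerical check of this fact by exact enumeration (the values of N, n and \(v_i\) below are arbitrary):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
N, n = 6, 2                                # N blocks, n of them updated per iteration
v = rng.standard_normal(N)                 # v_i stands for, e.g., (x_i - x_i^k)^T grad_i f(x^k)

# exact expectation over all size-n index sets, each equally likely
subsets = list(combinations(range(N), n))
expectation = np.mean([sum(v[i] for i in S) for S in subsets])

assert np.isclose(expectation, (n / N) * v.sum())   # E_{I_k} sum_{i in I_k} v_i = (n/N) sum_i v_i
```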

Plugging (A.3) through (A.7) into (A.1) and recalling \(F(x)=f(x)+u(x)\), by rearranging terms we have

$$\begin{aligned}&{E}_{I_k}\left[ F(x^{k+1})-F(x)+(x^{k+1}-x)^\top (-A^\top \lambda ^k)+ \rho _x(x^{k+1}-x)^\top A^\top r^{k}\right] \nonumber \\&\quad +\,{E}_{I_k}\left( x^{k+1}-x\right) ^\top \hat{P}(x^{k+1}-x^k)-\frac{L_f}{2}{E}_{I_k}\Vert x^k-x^{k+1}\Vert ^2\nonumber \\&\leqslant \frac{N-n}{N}\left[ F(x^k)-F(x)+(x^{k}-x)^\top (-A^\top \lambda ^k)+ \rho _x(x^{k}-x)^\top A^\top r^{k}\right] . \end{aligned}$$
(A.8)

Note

$$\begin{aligned}&\qquad (x^{k+1}-x)^\top (-A^\top \lambda ^k)+ \rho _x(x^{k+1}-x)^\top A^\top r^{k}\\&\,\,\, =(x^{k+1}-x)^\top (-A^\top \lambda ^k)+\rho _x(x^{k+1}-x)^\top A^\top r^{k+1}\\&\qquad -\,\rho _x(x^{k+1}-x)^\top A^\top A(x^{k+1}-x^k) -\rho _x(x^{k+1}-x)^\top A^\top B(y^{k+1}-y^k)\\&{\overset{(2.5)}{=}}(x^{k+1}-x)^\top (-A^\top \lambda ^{k+1})+(\rho _x-\rho )(x^{k+1}-x)^\top A^\top r^{k+1}\\&\qquad -\,\rho _x(x^{k+1}-x)^\top A^\top A(x^{k+1}-x^k)-\rho _x(x^{k+1}-x)^\top A^\top B(y^{k+1}-y^k). \end{aligned}$$

Hence, we can rewrite (A.8) equivalently into (3.2). Through the same arguments, one can show (3.3), thus completing the proof.

1.2 Proof of Lemma 3.3

Letting \(x=x^*,y=y^*\) in (3.8), we have for any \(\lambda \) that

$$\begin{aligned}&\qquad E[h(x^*,y^*,\lambda )]\nonumber \\&\quad \,\,\geqslant E\left[ \varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)+(\hat{x}-x^*)^{\top }(-A^{\top }\hat{\lambda }) +(\hat{y}-y^*)^{\top }(-B^{\top }\hat{\lambda })\right. \nonumber \\&\qquad \quad \left. +(\hat{\lambda }-\lambda )^{\top }(A\hat{x}+B\hat{y}-b)\right] \nonumber \\&\quad = E\left[ \varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)+\langle \hat{\lambda },Ax^*+By^*-b\rangle -\langle \lambda ,A\hat{x}+B\hat{y}-b\rangle \right] \nonumber \\&\quad =E\left[ \varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)-\langle \lambda ,A\hat{x}+B\hat{y}-b\rangle \right] , \end{aligned}$$
(A.9)

where the last equality follows from the feasibility of \((x^*,y^*)\). For any \(\gamma >0\), restricting \(\lambda \) in \(\mathcal {B}_\gamma \), we have

$$\begin{aligned} E[h(x^*,y^*,\lambda )]\leqslant \sup \limits _{\lambda \in {\mathcal {B}}_\gamma }h(x^*,y^*,\lambda ). \end{aligned}$$

Hence, letting \(\lambda =-\frac{\gamma (A\hat{x}+B\hat{y}-b)}{\Vert A\hat{x}+B\hat{y}-b\Vert }\in \mathcal {B}_\gamma \) in (A.9) gives the desired result.
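
The particular choice of \(\lambda \) works because it attains the supremum of \(-\langle \lambda , A\hat{x}+B\hat{y}-b\rangle \) over the ball \(\mathcal {B}_\gamma \), which equals \(\gamma \Vert A\hat{x}+B\hat{y}-b\Vert \). A small numerical check of this elementary fact (the radius and the surrogate residual below are arbitrary, and the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, r = 2.5, rng.standard_normal(4)            # ball radius and a surrogate residual vector
lam_star = -gamma * r / np.linalg.norm(r)         # the choice used in the proof

# sample many points on the boundary of B_gamma and compare
U = rng.standard_normal((20000, 4))
candidates = gamma * U / np.linalg.norm(U, axis=1, keepdims=True)
best_sampled = np.max(-(candidates @ r))

assert -(lam_star @ r) >= best_sampled - 1e-9     # no sampled lambda in the ball does better
assert np.isclose(-(lam_star @ r), gamma * np.linalg.norm(r))
```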

1.3 Proof of Lemma 3.5

In view of (3.9), we have

$$\begin{aligned}&E\big [(\gamma - \Vert \lambda ^*\Vert )\Vert A\hat{x}+B\hat{y}-b\Vert \big ]\\\leqslant & {} {E}\big [\varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)+\gamma \Vert A\hat{x}+B\hat{y}-b\Vert \big ] \leqslant \varepsilon , \end{aligned}$$

which implies

$$\begin{aligned} E\Vert A\hat{x}+B\hat{y}-b\Vert \leqslant \frac{\varepsilon }{\gamma -\Vert \lambda ^*\Vert }, \text{ and } E\big [\varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)\big ] \leqslant \varepsilon . \end{aligned}$$
(A.10)

Denote \(a^-=\max (0,-a)\) for a real number a. Then, from (3.9) and (A.10), it follows that

$$\begin{aligned} E\big (\varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)\big )^- \leqslant \Vert \lambda ^*\Vert \cdot E\Vert A\hat{x}+B\hat{y}-b\Vert \leqslant \frac{\Vert \lambda ^*\Vert \varepsilon }{\gamma -\Vert \lambda ^*\Vert }. \end{aligned}$$

Noting \(|a| = a + 2a ^ -\) for any real number a, we have

$$\begin{aligned}&\quad E\big |\varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)\big | \nonumber \\&= E\big [\varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)\big ] + 2E\big (\varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)\big )^- \nonumber \\&\leqslant \left( \frac{2\Vert \lambda ^*\Vert }{\gamma -\Vert \lambda ^*\Vert } + 1\right) \varepsilon . \end{aligned}$$
(A.11)
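
For example, with \(\gamma =3\Vert \lambda ^*\Vert \) (the larger of the two candidates whenever \(\Vert \lambda ^*\Vert \geqslant \frac{1}{4}\), and hence the relevant value in Theorems 3.6–3.8), the factor in (A.11) equals \(\frac{2\Vert \lambda ^*\Vert }{\gamma -\Vert \lambda ^*\Vert }+1=2\), so \(E\big |\varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)\big |\leqslant 2\varepsilon \), while the feasibility bound in (A.10) becomes \(E\Vert A\hat{x}+B\hat{y}-b\Vert \leqslant \frac{\varepsilon }{2\Vert \lambda ^*\Vert }\).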

1.4 Proof of Inequalities (4.9c) and (4.9d) with \(\alpha _k=\frac{\alpha _0}{\sqrt{k}}\)

We have \(\beta _k=\frac{\alpha _0}{\rho \big (\sqrt{k}-(1-\theta )\sqrt{k-1}\big )},\) and

$$\begin{aligned}&\qquad \frac{\alpha _{k-1}}{\alpha _{k}}\beta _{k}+(1-\theta )\beta _{k+1}-\frac{\alpha _k}{\alpha _{k+1}}\beta _{k+1}-(1-\theta )\beta _{k}\\&=\frac{\alpha _0}{\rho }\left[ \left( \frac{\sqrt{k}}{\sqrt{k-1}}-(1-\theta )\right) \frac{1}{\big (\sqrt{k}-(1-\theta )\sqrt{k-1}\big )} \right. \\&\qquad \left. -\left( \frac{\sqrt{k+1}}{\sqrt{k}}-(1-\theta )\right) \frac{1}{\big (\sqrt{k+1}-(1-\theta )\sqrt{k}\big )}\right] \\&=: \frac{\alpha _0}{\rho }[\psi (k)-\psi (k+1)]. \end{aligned}$$

By elementary calculus, we have

$$\begin{aligned} \psi '(k)= & {} \frac{\frac{\sqrt{k-1}}{\sqrt{k}}-\frac{\sqrt{k}}{\sqrt{k-1}}}{2(k-1)}\frac{1}{\big (\sqrt{k}-(1-\theta )\sqrt{k-1}\big )}\\&\qquad +\left( \frac{\sqrt{k}}{\sqrt{k-1}}-(1-\theta )\right) \frac{-1}{2\big (\sqrt{k}-(1-\theta )\sqrt{k-1}\big )^2}\left( \frac{1}{\sqrt{k}}-\frac{1-\theta }{\sqrt{k-1}}\right) \\= & {} \frac{1}{2(k-1)(\sqrt{k}-(1-\theta )\sqrt{k-1})} \\&\qquad \times \left[ \frac{\sqrt{k-1}}{\sqrt{k}}-\frac{\sqrt{k}}{\sqrt{k-1}}-\sqrt{k-1}\left( \frac{1}{\sqrt{k}}-\frac{1-\theta }{\sqrt{k-1}}\right) \right] \\= & {} \frac{1}{2(k-1)(\sqrt{k}-(1-\theta )\sqrt{k-1})}\left( (1-\theta )-\frac{\sqrt{k}}{\sqrt{k-1}}\right) <0. \end{aligned}$$

Hence, \(\psi (k)\) is decreasing with respect to k, and thus (4.9c) holds.

When \(\alpha _k=\frac{\alpha _0}{\sqrt{k}}\), (4.9d) becomes

$$\begin{aligned} \frac{\alpha _0}{2\rho \sqrt{t}}\geqslant \left| \left( \frac{\sqrt{t}}{\sqrt{t-1}}-(1-\theta )\right) \frac{\alpha _0}{\rho (\sqrt{t}-(1-\theta )\sqrt{t-1})}-\frac{\alpha _0}{\rho \sqrt{t}}\right| , \end{aligned}$$

which is equivalent to

$$\begin{aligned} \frac{1}{2}\geqslant \frac{\sqrt{t}}{\sqrt{t-1}}-1 \Longleftrightarrow t\geqslant \frac{9}{5}. \end{aligned}$$

This completes the proof.
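
As a quick numerical sanity check of the two facts just established (not part of the proof), one can tabulate \(\psi (k)\) for a fixed \(\theta \) and test the scalar equivalence behind (4.9d); the value of \(\theta \) and the ranges below are arbitrary:

```python
import numpy as np

theta = 0.3                                      # any theta in (0, 1]

def psi(k):
    # psi(k) as defined above; valid for k >= 2
    return (np.sqrt(k / (k - 1)) - (1 - theta)) / (np.sqrt(k) - (1 - theta) * np.sqrt(k - 1))

ks = np.arange(2, 10000)
assert np.all(np.diff(psi(ks)) < 0)              # psi is decreasing in k, as claimed

# with alpha_k = alpha_0/sqrt(k), (4.9d) reduces to 1/2 >= sqrt(t/(t-1)) - 1, i.e., t >= 9/5
for t in (1.5, 1.7, 1.9, 2.0, 10.0):
    assert (0.5 >= np.sqrt(t / (t - 1)) - 1) == (t >= 9 / 5)
```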

Proofs of Theorems

In this section, we give the technical details for showing all theorems. For simplicity of notation, throughout the proofs of this section, we define \(\tilde{P}\) and \(\tilde{Q}\) as follows:

$$\begin{aligned} \tilde{P}=\hat{P}-\rho _x A^\top A,\qquad \tilde{Q}=\hat{Q}-\rho _y B^\top B. \end{aligned}$$
(B.1)

1.1 Proof of Theorem 3.6

Taking expectation over both sides of (3.2) and summing it over \(k=0\) through t, we have

$$\begin{aligned}&E\big [F(x^{t+1})-F(x)+(x^{t+1}-x)^\top (-A^\top \lambda ^{t+1})\big ]+(1-\theta )\rho _x E(x^{t+1}-x)^\top A^\top r^{t+1}\nonumber \\&\qquad +\theta \sum _{k=0}^{t-1}E\big [F(x^{k+1})-F(x)+(x^{k+1}-x)^\top (-A^\top \lambda ^{k+1})\big ]\nonumber \\&\qquad -\sum _{k=0}^t\rho _x E(x^{k+1}-x)^\top A^\top B(y^{k+1}-y^k)\nonumber \\&\qquad +\sum _{k=0}^t E(x^{k+1}-x)^\top \tilde{P}(x^{k+1}-x^k)-\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2\nonumber \\&\leqslant (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] , \end{aligned}$$
(B.2)

where we have used \(\frac{n}{N}=\theta \), the condition in (3.17) and the definition of \(\tilde{P}\) in (B.1). Similarly, taking expectation over both sides of (3.3), summing it over \(k=0\) through t, we have

$$\begin{aligned}&E\big [G(y^{t+1})-G(y)+(y^{t+1}-y)^\top (-B^\top \lambda ^{t+1})\big ]\nonumber \\&\qquad + (1-\theta )\rho _y E(y^{t+1}-y)^\top B^\top r^{t+1}\nonumber \\&\qquad +\theta \sum _{k=0}^{t-1} E\big [G(y^{k+1})-G(y)+(y^{k+1}-y)^\top (-B^\top \lambda ^{k+1})\big ]\nonumber \\&\qquad +\sum _{k=0}^t E(y^{k+1}-y)^\top \tilde{Q}(y^{k+1}-y^k)-\frac{L_g}{2}\sum _{k=0}^t E\Vert y^k-y^{k+1}\Vert ^2\nonumber \\&\leqslant (1-\theta )\left[ G(y^0)-G(y)+(y^{0}-y)^\top (-B^\top \lambda ^0)+ \rho _y(y^{0}-y)^\top B^\top r^{0}\right] \nonumber \\&\qquad +(1-\theta )\sum _{k=0}^t E\rho _y(y^{k}-y)^\top B^\top A(x^{k+1}-x^k). \end{aligned}$$
(B.3)

Recall \(\lambda ^{k+1}=\lambda ^k-\rho r^{k+1}\), thus

$$\begin{aligned} (\lambda ^{k+1}-\lambda )^\top r^{k+1} =-\frac{1}{\rho }(\lambda ^{k+1}-\lambda )^\top (\lambda ^{k+1}-\lambda ^k), \end{aligned}$$
(B.4)

where \(\lambda \) is an arbitrary vector and possibly random. Denote \(\tilde{\lambda }^{t+1}=\lambda ^t-\rho _xr^{t+1}.\) Then, similar to (B.4), we have

$$\begin{aligned} (\tilde{\lambda }^{t+1}-\lambda )^\top r^{t+1} =-\frac{1}{\rho _x}(\tilde{\lambda }^{t+1}-\lambda )^\top (\tilde{\lambda }^{t+1}-\lambda ^t). \end{aligned}$$
(B.5)

Summing (B.2) and (B.3) together and using (B.4) and (B.5), we have

$$\begin{aligned}&E\left[ \varPhi (x^{t+1},y^{t+1})-\varPhi (x,y)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1}) \right. \left. +\frac{1}{\rho _x}(\tilde{\lambda }^{t+1}-\lambda )^\top (\tilde{\lambda }^{t+1}-\lambda ^t)\right] \nonumber \\&\qquad +\theta \sum _{k=0}^{t-1} E\left[ \varPhi (x^{k+1},y^{k+1})-\varPhi (x,y)+(w^{k+1}-w)^\top H(w^{k+1})\right. \nonumber \\&\qquad \left. +\frac{1}{\rho }(\lambda ^{k+1}-\lambda )^\top (\lambda ^{k+1}-\lambda ^k)\right] \nonumber \\\leqslant & {} (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \nonumber \\&\qquad +(1-\theta )\left[ G(y^0)-G(y)+(y^{0}-y)^\top (-B^\top \lambda ^0)+ \rho _y(y^{0}-y)^\top B^\top r^{0}\right] \nonumber \\&\qquad +\sum _{k=0}^t\rho _x E(x^{k+1}-x)^\top A^\top B(y^{k+1}-y^k) +(1-\theta )\sum _{k=0}^t\rho _y E(y^{k}-y)^\top B^\top A(x^{k+1}-x^k)\nonumber \\&\qquad -\sum _{k=0}^t E(x^{k+1}-x)^\top \tilde{P}(x^{k+1}-x^k)+\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2\nonumber \\&\qquad -\sum _{k=0}^t E(y^{k+1}-y)^\top \tilde{Q}(y^{k+1}-y^k)+\frac{L_g}{2}\sum _{k=0}^t E\Vert y^k-y^{k+1}\Vert ^2, \end{aligned}$$
(B.6)

where we have used \(\varPhi (x,y)=F(x)+G(y)\) and the definition of H given in (2.18).

When \(B=0\) and \(y^k\equiv y^0\), (B.6) reduces to

$$\begin{aligned}&E\left[ F(x^{t+1})-F(x)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})+\frac{1}{\rho _x}(\tilde{\lambda }^{t+1}-\lambda )^\top (\tilde{\lambda }^{t+1}-\lambda ^t)\right] \\&\qquad +\theta \sum _{k=0}^{t-1} E\left[ F(x^{k+1})-F(x)+(w^{k+1}-w)^\top H(w^{k+1})\right. \\&\qquad \left. +\frac{1}{\rho }(\lambda ^{k+1}-\lambda )^\top (\lambda ^{k+1}-\lambda ^k)\right] \\&\leqslant (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \\&\qquad -\sum _{k=0}^t E(x^{k+1}-x)^\top \tilde{P}(x^{k+1}-x^k)+\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2. \end{aligned}$$

Using (2.21) and noting \(\theta =\frac{\rho }{\rho _x}\), we obtain from the above inequality, after canceling terms, that

$$\begin{aligned}&E\left[ F(x^{t+1})-F(x)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \nonumber \\&+\theta \sum _{k=0}^{t-1} E\left[ F(x^{k+1})-F(x)+(w^{k+1}-w)^\top H(w^{k+1})\right] \nonumber \\&+\frac{1}{2\rho _x} E\left[ \Vert \tilde{\lambda }^{t+1}-\lambda \Vert ^2-\Vert \lambda ^0-\lambda \Vert ^2+\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2+\sum _{k=0}^{t-1}\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\right] \nonumber \\&\leqslant (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \nonumber \\&-\frac{1}{2}E\left[ \Vert x^{t+1}-x\Vert _{\tilde{P}}^2-\Vert x^0-x\Vert _{\tilde{P}}^2+\sum _{k=0}^t\Vert x^{k+1}-x^k\Vert _{\tilde{P}}^2\right] \nonumber \\&+\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2. \end{aligned}$$
(B.7)

For any feasible x, we note \(\tilde{\lambda }^{t+1}-\lambda ^t=-\rho _xA(x^{t+1}-x)\) and thus

$$\begin{aligned} \frac{1}{\rho _x}\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2=\rho _x\Vert x^{t+1}-x\Vert _{A^\top A}^2. \end{aligned}$$
(B.8)

In addition, since \(x^{k+1}\) and \(x^k\) differ only on the index set \(I_k\), we have by recalling \(\tilde{P}=\hat{P}-\rho _x A^\top A\) that

$$\begin{aligned} \Vert x^{k+1}-x^k\Vert _{\tilde{P}}^2-L_f\Vert x^{k+1}-x^k\Vert ^2= & {} \Vert x_{I_k}^{k+1}-x_{I_k}^k\Vert _{\hat{P}_{I_k}}^2-\Vert x_{I_k}^{k+1}-x_{I_k}^k\Vert _{\rho _x A_{I_k}^\top A_{I_k}}^2\nonumber \\&-L_f\Vert x_{I_k}^{k+1}-x_{I_k}^k\Vert ^2. \end{aligned}$$
(B.9)

Plugging (B.8) and (B.9) into (B.7) and using (3.10) lead to

$$\begin{aligned}&E\left[ F(x^{t+1})-F(x)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \\&+\,\theta \sum _{k=0}^{t-1} E\left[ F(x^{k+1})-F(x)+(w^{k+1}-w)^\top H(w^{k+1})\right] \\\leqslant & {} (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \\&+\frac{1}{2\rho _x} E\Vert \lambda ^0-\lambda \Vert ^2+\frac{1}{2}\Vert x^0-x\Vert _{\tilde{P}}^2. \end{aligned}$$

The desired result follows from \(\lambda ^0=0\), and Lemmas 3.3 and 3.5 with \(\gamma =\max \{0.5+\Vert \lambda ^*\Vert , 3\Vert \lambda ^*\Vert \}\).

1.2 Proof of Theorem 3.7

It follows from (3.3) with \(\rho _y=\rho \) and \(m=M\) (recall the definition of \(\tilde{Q}\) in (B.1)) that for any \(y\in {\mathcal {Y}}\),

$$\begin{aligned}&G(y^{k+1})-G(y)-\frac{L_g}{2}\Vert y^k-y^{k+1}\Vert ^2+(y^{k+1}-y)^\top (-B^\top \lambda ^{k+1})\nonumber \\&\quad +(y^{k+1}-y)^\top \tilde{Q}(y^{k+1}-y^k)\leqslant 0. \end{aligned}$$
(B.10)

Similar to (B.10), and recalling the definition of \(\tilde{y}^{t+1}\), we have for any \(y\in {\mathcal {Y}}\),

$$\begin{aligned}&G(\tilde{y}^{t+1})-G(y)-\frac{L_g}{2}\Vert \tilde{y}^{t+1}-y^t\Vert ^2+(\tilde{y}^{t+1}-y)^\top (-B^\top \tilde{\lambda }^{t+1})\nonumber \\&\quad +\theta (\tilde{y}^{t+1}-y)^\top \tilde{Q}(y^{t+1}-y^t)\leqslant 0, \end{aligned}$$
(B.11)

where

$$\begin{aligned} \tilde{\lambda }^{t+1}=\lambda ^t-\rho _x(Ax^{t+1}+B\tilde{y}^{t+1}-b). \end{aligned}$$
(B.12)

Adding (B.10) and (B.11) to (B.2) and using the formula of \(\lambda ^k\) give

$$\begin{aligned}&E\left[ F(x^{t+1})-F(x)+(x^{t+1}-x)^\top (-A^\top \tilde{\lambda }^{t+1})\right] \nonumber \\&\qquad +E\left( \tilde{\lambda }^{t+1}-\lambda \right) ^\top \left( Ax^{t+1}+B\tilde{y}^{t+1}-b+\frac{1}{\rho _x}(\tilde{\lambda }^{t+1}-\lambda ^t)\right) \nonumber \\&\qquad +E\left[ G(\tilde{y}^{t+1})-G(y)+(\tilde{y}^{t+1}-y)^\top (-B^\top \tilde{\lambda }^{t+1}) \right. \nonumber \\&\left. \qquad +\theta (\tilde{y}^{t+1}-y)^\top \tilde{Q}(\tilde{y}^{t+1}-y^k)\right] -\frac{L_g}{2}E\Vert \tilde{y}^{t+1}-y^t\Vert ^2\nonumber \\&\qquad +\theta \sum _{k=0}^{t-1}E\left[ F(x^{k+1})-F(x)+(x^{k+1}-x)^\top (-A^\top \lambda ^{k+1})\right] \nonumber \\&\qquad -\sum _{k=0}^{t-1}\rho _x E(x^{k+1}-x)^\top A^\top B(y^{k+1}-y^k) -\rho _x{E}(x^{t+1}-x)^\top A^\top B(\tilde{y}^{t+1}-y^t)\nonumber \\&\qquad +\theta \sum _{k=0}^{t-1} E\left[ G(y^{k+1})-G(y)-\frac{L_g}{2}\Vert y^k-y^{k+1}\Vert ^2+(y^{k+1}-y)^\top (-B^\top \lambda ^{k+1})\right. \nonumber \\&\qquad \left. +(y^{k+1}-y)^\top \tilde{Q}(y^{k+1}-y^k)\right] \nonumber \\&\qquad +\theta \sum _{k=0}^{t-1}E(\lambda ^{k+1}-\lambda )^\top \left( r^{k+1}+\frac{1}{\rho }(\lambda ^{k+1}-\lambda ^k)\right) \nonumber \\&\leqslant (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \nonumber \\&\qquad -\sum _{k=0}^tE(x^{k+1}-x)^\top \tilde{P}(x^{k+1}-x^k)+\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2. \end{aligned}$$
(B.13)

Using the notation in (2.18) and (3.6), (B.13) can be rewritten as

$$\begin{aligned}&E\left[ \varPhi (x^{t+1},\tilde{y}^{t+1})-\varPhi (x,y)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \\&\qquad +\theta \sum _{k=0}^{t-1}E\left[ \varPhi (x^{k+1},y^{k+1})-\varPhi (x,y)+(w^{k+1}-w)^\top H(w^{k+1})\right] \\&\qquad +\theta \sum _{k=0}^{t-1}E(y^{k+1}-y)^\top \tilde{Q}(y^{k+1}-y^k)+\theta {E}(\tilde{y}^{t+1}-y)^\top \tilde{Q}(\tilde{y}^{t+1}-y^t)\\&\qquad -\sum _{k=0}^{t-1}\rho _xE\left( \frac{1}{\rho }(\lambda ^k-\lambda ^{k+1})^\top B(y^{k+1}-y^k) -(y^{k+1}-y)^\top B^\top B(y^{k+1}-y^k)\right) \\&\qquad -\rho _x E\left( \frac{1}{\rho _x}(\lambda ^t-\tilde{\lambda }^{t+1})^\top B(\tilde{y}^{t+1}-y^t) -(\tilde{y}^{t+1}-y)^\top B^\top B(\tilde{y}^{t+1}-y^t)\right) \\&\qquad +\frac{\theta }{\rho } E(\tilde{\lambda }^{t+1}-\lambda )^\top \left( \tilde{\lambda }^{t+1}-\lambda ^t\right) +\frac{\theta }{\rho }\sum _{k=0}^{t-1}E(\lambda ^{k+1}-\lambda )^\top \left( \lambda ^{k+1}-\lambda ^k\right) \\\leqslant & {} (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \\&\qquad -\sum _{k=0}^t E(x^{k+1}-x)^\top \tilde{P}(x^{k+1}-x^k)+\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2\\&\qquad +\frac{\theta L_g}{2}\sum _{k=0}^{t-1}E\Vert y^k-y^{k+1}\Vert ^2+\frac{L_g}{2} E\Vert \tilde{y}^{t+1}-y^t\Vert ^2. \end{aligned}$$

Now use (2.21) to derive from the above inequality that

$$\begin{aligned}&E\left[ \varPhi (x^{t+1},\tilde{y}^{t+1})-\varPhi (x,y)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \nonumber \\&\qquad +\theta \sum _{k=0}^{t-1}E\left[ \varPhi (x^{k+1},y^{k+1})-\varPhi (x,y)+(w^{k+1}-w)^\top H(w^{k+1})\right] \nonumber \\&\qquad +\frac{\theta }{2}\left( E\Vert \tilde{y}^{t+1}-y\Vert _{\tilde{Q}}^2-\Vert y^0-y\Vert _{\tilde{Q}}^2\right) \nonumber \\&\qquad +\frac{\theta }{2}\sum _{k=0}^{t-1}E\Vert y^{k+1}-y^k\Vert _{\tilde{Q}}^2+\frac{\theta }{2}E\Vert \tilde{y}^{t+1}-y^t\Vert _{\tilde{Q}}^2\nonumber \\&\qquad +\frac{\rho _x}{2}\left( E\Vert \tilde{y}^{t+1}-y\Vert _{B^\top B}^2-\Vert y^0-y\Vert _{B^\top B}^2\right) \nonumber \\&\qquad +\frac{\rho _x}{2}\sum _{k=0}^{t-1}E\Vert y^{k+1}-y^k\Vert _{B^\top B}^2+\frac{\rho _x}{2}E\Vert \tilde{y}^{t+1}-y^t\Vert _{B^\top B}^2\nonumber \\&\qquad -\sum _{k=0}^{t-1}E\frac{\rho _x}{\rho }\left( \lambda ^k-\lambda ^{k+1}\right) ^\top B(y^{k+1}-y^k) -E(\lambda ^t-\tilde{\lambda }^{t+1})^\top B(\tilde{y}^{t+1}-y^t)\nonumber \\&\qquad +\frac{\theta }{2\rho }\left( E\Vert \tilde{\lambda }^{t+1}-\lambda \Vert ^2-\Vert \lambda ^0-\lambda \Vert ^2\right) \nonumber \\&\qquad +\frac{\theta }{2\rho }\sum _{k=0}^{t-1}E\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2+\frac{\theta }{2\rho }E\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2\nonumber \\&\quad \leqslant (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \nonumber \\&\qquad -\frac{1}{2}\left[ E\Vert x^{t+1}-x\Vert ^2_{\tilde{P}}-\Vert x^0-x\Vert ^2_{\tilde{P}}\right. \nonumber \\&\qquad \left. +\sum _{k=0}^tE\Vert x^k-x^{k+1}\Vert ^2_{\tilde{P}}\right] +\frac{L_f}{2}\sum _{k=0}^tE\Vert x^k-x^{k+1}\Vert ^2\nonumber \\&\qquad +\frac{\theta L_g}{2}\sum _{k=0}^{t-1}E\Vert y^k-y^{k+1}\Vert ^2+\frac{L_g}{2}E\Vert \tilde{y}^{t+1}-y^t\Vert ^2. \end{aligned}$$
(B.14)

Note that for \(k\leqslant t-1\),

$$\begin{aligned} -\frac{\rho _x}{\rho }(\lambda ^k-\lambda ^{k+1})^\top B(y^{k+1}-y^k)+\frac{\theta }{2\rho }\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\geqslant -\frac{\rho }{2\theta ^3}\Vert y^{k+1}-y^k\Vert _{B^\top B}^2 \end{aligned}$$

and

$$\begin{aligned} -(\lambda ^t-\tilde{\lambda }^{t+1})^\top B(\tilde{y}^{t+1}-y^t)+\frac{\theta }{2\rho }\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2\geqslant -\frac{\rho }{2\theta }\Vert \tilde{y}^{t+1}-y^t\Vert _{B^\top B}^2. \end{aligned}$$

Because \(\tilde{P}, {\tilde{Q}}\) and \(\rho \) satisfy (3.13), we have from (B.14) that

$$\begin{aligned}&E\left[ \varPhi (x^{t+1},\tilde{y}^{t+1})-\varPhi (x,y)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \\&+\theta \sum _{k=0}^{t-1}E\left[ \varPhi (x^{k+1},y^{k+1})-\varPhi (x,y)+(w^{k+1}-w)^\top H(w^{k+1})\right] \\&\leqslant (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \\&+\frac{1}{2}\Vert x^0-x\Vert ^2_{\tilde{P}}+\frac{\theta }{2}\Vert y^0-y\Vert _{\tilde{Q}}^2+\frac{\rho }{2\theta }\Vert y^0-y\Vert _{B^\top B}^2+\frac{\theta }{2\rho }E\Vert \lambda ^0-\lambda \Vert ^2. \end{aligned}$$

Similar to Theorem 3.8, from the convexity of \(\varPhi \) and (2.23), we have

$$\begin{aligned}&(1+\theta t)E\big [\varPhi (\hat{x}^t,\hat{y}^t)-\varPhi (x,y)+(\hat{w}^{t+1}-w)^\top H({w})\big ]\nonumber \\&\leqslant (1-\theta )\big [F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\big ]\nonumber \\&+\frac{1}{2}\Vert x^0-x\Vert ^2_{\tilde{P}}+\frac{\theta }{2}\Vert y^0-y\Vert _{\tilde{Q}}^2+\frac{\rho }{2\theta }\Vert y^0-y\Vert _{B^\top B}^2+\frac{\theta }{2\rho }E\Vert \lambda ^0-\lambda \Vert ^2. \end{aligned}$$
(B.15)

Noting \(\lambda ^0=0\) and \((x^{0}-x)^\top A^\top r^{0}\leqslant \frac{1}{2}\big [\Vert x^0-x\Vert _{A^\top A}^2+\Vert r^0\Vert ^2\big ]\), and using Lemmas 3.3 and 3.5 with \(\gamma =\max \{0.5+\Vert \lambda ^*\Vert , 3\Vert \lambda ^*\Vert \}\), we obtain the result (3.16).

1.3 Proof of Theorem 3.8

Using (3.6) and (3.7), applying (2.21) to the cross terms, and also noting the definition of \(\tilde{P}\) and \(\tilde{Q}\) in (B.1), we have

$$\begin{aligned}&-\frac{\theta }{\rho }E\left[ \sum _{k=0}^{t-1}(\lambda ^{k+1}-\lambda )^\top (\lambda ^{k+1}-\lambda ^k) +(\tilde{\lambda }^{t+1}-\lambda )^\top (\tilde{\lambda }^{t+1}-\lambda ^t)\right] \nonumber \\&\qquad +\sum _{k=0}^t\rho _xE(x^{k+1}-x)^\top A^\top B(y^{k+1}-y^k) +(1-\theta )\sum _{k=0}^t\rho _y E(y^{k}-y)^\top B^\top A(x^{k+1}-x^k)\nonumber \\&\qquad -\sum _{k=0}^tE(x^{k+1}-x)^\top \tilde{P}(x^{k+1}-x^k)+\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2\nonumber \\&\qquad -\sum _{k=0}^tE(y^{k+1}-y)^\top \tilde{Q}(y^{k+1}-y^k)+\frac{L_g}{2}\sum _{k=0}^t E\Vert y^k-y^{k+1}\Vert ^2\nonumber \\&=-\frac{\theta }{2\rho }E\left[ \Vert \tilde{\lambda }^{t+1}-\lambda \Vert ^2-\Vert \lambda ^0-\lambda \Vert ^2+\sum _{k=0}^{t-1}\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2+\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2\right] \nonumber \\&\qquad +\frac{\rho _x}{\rho }\sum _{k=0}^t E(\lambda ^k-\lambda ^{k+1})^\top B(y^{k+1}-y^k) +\frac{(1-\theta )\rho _y}{\rho }\sum _{k=0}^t E(\lambda ^{k-1}-\lambda ^{k})^\top A(x^{k+1}-x^k)\nonumber \\&\qquad -\frac{\theta \rho _y}{2}E\left( \Vert x^0-x\Vert _{A^\top A}^2-\Vert x^{t+1}-x\Vert _{A^\top A}^2\right) +\frac{(2-\theta )\rho _y}{2}\sum _{k=0}^t E\Vert x^{k+1}-x^k\Vert _{A^\top A}^2\nonumber \\&\qquad -\frac{1}{2}E\left( \Vert x^{t+1}-x\Vert _{\hat{P}}^2-\Vert x^0-x\Vert _{\hat{P}}^2+\sum _{k=0}^t\Vert x^{k+1}-x^k\Vert _{\hat{P}}^2\right) \nonumber \\&\qquad +\frac{L_f}{2}\sum _{k=0}^tE\Vert x^k-x^{k+1}\Vert ^2\nonumber \\&\qquad -\frac{1}{2}E\left( \Vert y^{t+1}-y\Vert _{\hat{Q}}^2-\Vert y^0-y\Vert _{\hat{Q}}^2+\sum _{k=0}^t\Vert y^{k+1}-y^k\Vert _{\hat{Q}}^2\right) \nonumber \\&\qquad +\frac{L_g}{2}\sum _{k=0}^t E\Vert y^k-y^{k+1}\Vert ^2, \end{aligned}$$
(B.16)

where we have used the conditions in (3.17). By Young’s inequality, we have that for \(0\leqslant k\leqslant t\),

$$\begin{aligned}&\quad \frac{\rho _x}{\rho }(\lambda ^k-\lambda ^{k+1})^\top B(y^{k+1}-y^k)-\frac{\theta }{2\rho }\frac{1}{2-\theta }\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\nonumber \\&\leqslant \frac{\rho }{\theta }\frac{2-\theta }{2}\frac{\rho _x^2}{\rho ^2}\Vert B(y^{k+1}-y^k)\Vert ^2 \overset{(3.17)}{=} \frac{(2-\theta )\rho _y}{2\theta ^2}\Vert y^{k+1}-y^k\Vert _{B^\top B}^2, \end{aligned}$$
(B.17)

and for \(1\leqslant k\leqslant t\),

$$\begin{aligned}&\qquad \frac{(1-\theta )\rho _y}{\rho }(\lambda ^{k-1}-\lambda ^{k})^\top A(x^{k+1}-x^k)-\frac{\theta }{2\rho }\frac{1-\theta }{2-\theta }\Vert \lambda ^{k-1}-\lambda ^k\Vert ^2\nonumber \\&\,\,\,\leqslant (1-\theta )\frac{\rho }{\theta }\frac{(2-\theta )\rho _y^2}{2\rho ^2}\Vert A(x^{k+1}-x^k)\Vert ^2\nonumber \\&\overset{(3.17)}{=}\frac{(1-\theta )(2-\theta )}{2\theta ^2}\rho _x\Vert x^{k+1}-x^k\Vert _{A^\top A}^2. \end{aligned}$$
(B.18)

Plugging (B.17) and (B.18) into (B.16) and also noting \(\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2\geqslant \Vert \lambda ^{t+1}-\lambda ^t\Vert ^2\), we can upper bound the right-hand side of (B.16) by

$$\begin{aligned}&\quad -\frac{\theta }{2\rho }E\left[ \Vert \tilde{\lambda }^{t+1}-\lambda \Vert ^2-\Vert \lambda ^0-\lambda \Vert ^2\right] -\frac{\theta \rho _y}{2}E\left( \Vert x^0-x\Vert _{A^\top A}^2-\Vert x^{t+1}-x\Vert _{A^\top A}^2\right) \nonumber \\&\qquad +\left( \frac{(1-\theta )(2-\theta )}{2\theta ^2}\rho _x+\frac{(2-\theta )\rho _y}{2}\right) \sum _{k=0}^t E\Vert x^{k+1}-x^k\Vert _{A^\top A}^2\nonumber \\&\qquad +\frac{(2-\theta )\rho _y}{2\theta ^2}\sum _{k=0}^t E\Vert y^{k+1}-y^k\Vert _{B^\top B}^2\nonumber \\&\qquad -\frac{1}{2}E\left( \Vert x^{t+1}-x\Vert _{\hat{P}}^2-\Vert x^0-x\Vert _{\hat{P}}^2\right. \nonumber \\&\qquad \left. +\sum _{k=0}^t\Vert x^{k+1}-x^k\Vert _{\hat{P}}^2\right) +\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2\nonumber \\&\qquad -\frac{1}{2}E\left( \Vert y^{t+1}-y\Vert _{\hat{Q}}^2-\Vert y^0-y\Vert _{\hat{Q}}^2+\sum _{k=0}^t\Vert y^{k+1}-y^k\Vert _{\hat{Q}}^2\right) \nonumber \\&\qquad +\frac{L_g}{2}\sum _{k=0}^t E\Vert y^k-y^{k+1}\Vert ^2\nonumber \\&\overset{(3.18{\mathrm{a}})}{\leqslant }\frac{1}{2}\left( \Vert x^0-x\Vert _{\hat{P}-\theta \rho _xA^\top A}^2+\Vert y^0-y\Vert _{\hat{Q}}^2\right) +\frac{\theta }{2\rho }E\Vert \lambda ^0-\lambda \Vert ^2. \end{aligned}$$
(B.19)

In addition, note that

$$\begin{aligned}&\theta \Vert x^{t+1}-x\Vert _{A^\top A}^2=\frac{n}{N}\left\| \sum _{i=1}^N A_i(x_i^{t+1}-x_i)\right\| ^2\leqslant n\sum _{i=1}^N\Vert x_i^{t+1}-x_i\Vert _{A_i^\top A_i}^2,\\&\Vert x^k-x^{k+1}\Vert _{A^\top A}^2=\left\| \sum _{i\in I_k}A_i(x_i^k-x_i^{k+1})\right\| ^2\leqslant n\sum _{i=1}^N\Vert x_i^k-x_i^{k+1}\Vert _{A_i^\top A_i}^2,\\&\Vert y^k-y^{k+1}\Vert _{B^\top B}^2=\left\| \sum _{j\in J_k}B_j(y_j^k-y_j^{k+1})\right\| ^2\leqslant m\sum _{j=1}^M\Vert y_j^k-y_j^{k+1}\Vert _{B_j^\top B_j}^2. \end{aligned}$$

Hence, if \(\hat{P}\) and \(\hat{Q}\) satisfy (3.18b), then (B.19) also holds.

Combining (B.6), (B.16), and (B.19) yields

$$\begin{aligned}&E\left[ \varPhi (x^{t+1},y^{t+1})-\varPhi (x,y)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \nonumber \\&\qquad +\theta \sum _{k=0}^{t-1}E\left[ \varPhi (x^{k+1},y^{k+1})-\varPhi (x,y)+(w^{k+1}-w)^\top H(w^{k+1})\right] \nonumber \\\leqslant & {} (1-\theta )\left[ \varPhi (x^0,y^0)-\varPhi (x,y)\right] \nonumber \\&\qquad +(1-\theta )\left[ (x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}+(y^{0}-y)^\top (-B^\top \lambda ^0)\right. \nonumber \\&\qquad \left. + \rho _y(y^{0}-y)^\top B^\top r^{0}\right] \nonumber \\&\qquad + \frac{1}{2}\left( \Vert x^0-x\Vert _{\hat{P}-\theta \rho _xA^\top A}^2+\Vert y^0-y\Vert _{\hat{Q}}^2\right) +\frac{\theta }{2\rho }{E}\Vert \lambda ^0-\lambda \Vert ^2. \end{aligned}$$
(B.20)

Applying the convexity of \(\varPhi \) and the properties (2.23) of H, we have

$$\begin{aligned}&~~\qquad (1+\theta t)E\left[ \varPhi (\hat{x}^t,\hat{y}^t)-\varPhi (x,y)+(\hat{w}^{t+1}-w)^\top H(w)\right] \nonumber \\&\overset{(2.11)}{=} (1+\theta t)E\left[ \varPhi (\hat{x}^t,\hat{y}^t)-\varPhi (x,y)+(\hat{w}^{t+1}-w)^\top H(\hat{w}^{t+1})\right] \nonumber \\&\overset{(2.14)}{\leqslant }E\left[ \varPhi (x^{t+1},y^{t+1})-\varPhi (x,y)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \nonumber \\&\qquad +\,\theta \sum _{k=0}^{t-1}E\left[ \varPhi (x^{k+1},y^{k+1})-\varPhi (x,y)+(w^{k+1}-w)^\top H(w^{k+1})\right] .\qquad \end{aligned}$$
(B.21)

Now combining (B.21) and (B.20), we have

$$\begin{aligned}&(1+\theta t)E\left[ \varPhi (\hat{x}^t,\hat{y}^t)-\varPhi (x,y)+(\hat{w}^{t+1}-w)^\top H(w)\right] \nonumber \\\leqslant & {} (1-\theta )\left[ \varPhi (x^0,y^0)-\varPhi (x,y)\right] \nonumber \\&\quad +(1-\theta )\left[ (x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}+(y^{0}-y)^\top (-B^\top \lambda ^0)\right. \nonumber \\&\quad \left. + \rho _y(y^{0}-y)^\top B^\top r^{0}\right] \nonumber \\&\quad + \frac{1}{2}\left( \Vert x^0-x\Vert _{\hat{P}-\theta \rho _xA^\top A}^2+\Vert y^0-y\Vert _{\hat{Q}}^2\right) +\frac{\theta }{2\rho }E\Vert \lambda ^0-\lambda \Vert ^2. \end{aligned}$$
(B.22)

By Lemmas 3.3 and 3.5 with \(\gamma =\max \{0.5+\Vert \lambda ^*\Vert , 3\Vert \lambda ^*\Vert \}\), we have the desired result.

1.4 Proof of Theorem 4.2

From the nonincreasing monotonicity of \(\alpha _k\), one can easily show the following result.

Lemma B.1

Assume \(\lambda ^{-1}=\lambda ^0\). It holds that

$$\begin{aligned}&\sum _{k=0}^t\frac{(1-\theta )\beta _k}{2}\left[ \Vert \lambda ^{k}-\lambda \Vert ^2-\Vert \lambda ^{k-1}-\lambda \Vert ^2+\Vert \lambda ^{k}-\lambda ^{k-1}\Vert ^2\right] \nonumber \\&\qquad -\sum _{k=0}^{t-1}\frac{\alpha _k\beta _{k+1}}{2\alpha _{k+1}}\left[ \Vert \lambda ^{k+1}-\lambda \Vert ^2-\Vert \lambda ^{k}-\lambda \Vert ^2+\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\right] \nonumber \\&\leqslant -\sum _{k=0}^{t-1}\frac{\beta _{k+1}}{2}\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2 +\sum _{k=1}^t\frac{(1-\theta )\beta _k}{2}\Vert \lambda ^{k}-\lambda ^{k-1}\Vert ^2 \nonumber \\&-\sum _{k=0}^{t-1}\frac{\alpha _k\beta _{k+1}}{2\alpha _{k+1}}\Vert \lambda ^{k+1}-\lambda \Vert ^2 -\sum _{k=1}^t\frac{(1-\theta )\beta _k}{2}\Vert \lambda ^{k-1}-\lambda \Vert ^2\nonumber \\&+\frac{\alpha _0\beta _1}{2\alpha _1}\Vert \lambda ^0-\lambda \Vert ^2+\sum _{k=1}^{t-1}\frac{\alpha _k\beta _{k+1}}{2\alpha _{k+1}}\Vert \lambda ^{k}-\lambda \Vert ^2 +\sum _{k=1}^t\frac{(1-\theta )\beta _k}{2}\Vert \lambda ^{k}-\lambda \Vert ^2\nonumber \\&=-\sum _{k=0}^{t-1}\frac{\theta \beta _{k+1}}{2}\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2 +\left( \frac{\alpha _0\beta _1}{2\alpha _1}-\frac{(1-\theta )\beta _1}{2}\right) \Vert \lambda ^0-\lambda \Vert ^2\nonumber \\&-\left( \frac{\alpha _{t-1}\beta _t}{2\alpha _t}-\frac{(1-\theta )\beta _t}{2}\right) \Vert \lambda ^{t}-\lambda \Vert ^2\nonumber \\&-\sum _{k=1}^{t-1}\left( \frac{\alpha _{k-1}\beta _k}{2\alpha _k} +\frac{(1-\theta )\beta _{k+1}}{2}-\frac{\alpha _k\beta _{k+1}}{2\alpha _{k+1}}-\frac{(1-\theta )\beta _k}{2}\right) \Vert \lambda ^{k}-\lambda \Vert ^2. \end{aligned}$$
(B.23)

By the update formula of \(\lambda \) in (4.5), we have from (4.8) that

$$\begin{aligned}&\quad E\left[ F(x^{k+1})-F(x)+(x^{k+1}-x)^\top (-A^\top \lambda ^{k+1})+(\lambda ^{k+1}-\lambda )^\top r^{k+1}\right] \nonumber \\&\quad + E\left[ \frac{(\lambda ^{k+1}-\lambda )^\top (\lambda ^{k+1}-\lambda ^{k})}{\left( 1-\frac{(1-\theta )\alpha _{k+1}}{\alpha _k }\right) \rho } +\frac{(1-\theta )\alpha _{k+1}}{\alpha _k}\rho (x^{k+1}-x)^\top A^\top r^{k+1}\right] \nonumber \\&\quad + E(x^{k+1}-x)^\top \left( \tilde{P}+\frac{I}{\alpha _k}\right) (x^{k+1}-x^k) -\frac{L_f}{2} E\Vert x^k-x^{k+1}\Vert ^2+ E(x^{k+1}-x^k)^\top \delta ^k\nonumber \\&\leqslant (1-\theta ) E\left[ F(x^k)-F(x)+(x^{k}-x)^\top (-A^\top \lambda ^k) \right. \left. +(\lambda ^{k}-\lambda )^\top r^{k}+\frac{(\lambda ^{k}-\lambda )^\top (\lambda ^k-\lambda ^{k-1})}{\left( 1-\frac{(1-\theta )\alpha _{k}}{\alpha _{k-1}}\right) \rho } \right] \nonumber \\&\quad +(1-\theta )\rho E(x^{k}-x)^\top A^\top r^{k}, \end{aligned}$$
(B.24)

where similar to (B.1), we have defined \(\tilde{P}=\hat{P}-\rho A^\top A\).

Multiplying both sides of (B.24) by \(\alpha _k\) and using (4.2) and (2.21), we have

$$\begin{aligned}&\quad \alpha _k E\left[ F(x^{k+1})-F(x)+(w^{k+1}-w)^\top H(w^{k+1})\right] \nonumber \\&\qquad +\frac{\alpha _k\beta _{k+1}}{2\alpha _{k+1}} E\left[ \Vert \lambda ^{k+1}-\lambda \Vert ^2-\Vert \lambda ^{k}-\lambda \Vert ^2+\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\right] \nonumber \\&\qquad + E\left[ (1-\theta )\alpha _{k+1}\rho (x^{k+1}-x)^\top A^\top r^{k+1}\right] \nonumber \\&\qquad +\frac{\alpha _k}{2} E\big [\Vert x^{k+1}-x\Vert _{\tilde{P}}^2-\Vert x^{k}-x\Vert _{\tilde{P}}^2+\Vert x^{k+1}-x^k\Vert _{\tilde{P}}^2\big ]\nonumber \\&\qquad +\frac{1}{2} E\left[ \Vert x^{k+1}-x\Vert ^2-\Vert x^{k}-x\Vert ^2+\Vert x^{k+1}-x^k\Vert ^2\right] \nonumber \\&\qquad -\frac{\alpha _kL_f}{2} E\Vert x^k-x^{k+1}\Vert ^2+\alpha _k{E}(x^{k+1}-x^k)^\top \delta ^k\nonumber \\&\leqslant (1-\theta )\alpha _k E\left[ F(x^k)-F(x)+(w^{k}-w)^\top H(w^k)\right] \nonumber \\&\qquad +\frac{(1-\theta )\beta _k}{2} E\left[ \Vert \lambda ^{k}-\lambda \Vert ^2-\Vert \lambda ^{k-1}-\lambda \Vert ^2+\Vert \lambda ^{k}-\lambda ^{k-1}\Vert ^2\right] \nonumber \\&\qquad +\alpha _k(1-\theta )\rho E(x^{k}-x)^\top A^\top r^{k}. \end{aligned}$$
(B.25)

Denote \(\tilde{\lambda }^{t+1}=\lambda ^t-\rho r^{t+1}.\) Then, for \(k=t\), it is easy to see that (B.25) becomes

$$\begin{aligned}&\quad \alpha _tE\left[ F(x^{t+1})-F(x)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \nonumber \\&\qquad +\frac{\alpha _t}{2\rho }E\left[ \Vert \tilde{\lambda }^{t+1}-\lambda \Vert ^2-\Vert \lambda ^{t}-\lambda \Vert ^2+\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2\right] \nonumber \\&\qquad +\frac{\alpha _t}{2}E\left[ \Vert x^{t+1}-x\Vert _{\tilde{P}}^2-\Vert x^{t}-x\Vert _{\tilde{P}}^2+\Vert x^{t+1}-x^t\Vert _{\tilde{P}}^2\right] \nonumber \\&\qquad +\frac{1}{2}E\left[ \Vert x^{t+1}-x\Vert ^2-\Vert x^{t}-x\Vert ^2+\Vert x^{t+1}-x^t\Vert ^2\right] \nonumber \\&\qquad -\frac{\alpha _tL_f}{2}E\Vert x^t-x^{t+1}\Vert ^2+\alpha _t{E}(x^{t+1}-x^t)^\top \delta ^t\nonumber \\&\leqslant (1-\theta )\alpha _t E\left[ F(x^t)-F(x)+(w^{t}-w)^\top H(w^t)\right] \nonumber \\&\qquad +\frac{(1-\theta )\beta _t}{2}E\left[ \Vert \lambda ^{t}-\lambda \Vert ^2-\Vert \lambda ^{t-1}-\lambda \Vert ^2+\Vert \lambda ^{t}-\lambda ^{t-1}\Vert ^2\right] \nonumber \\&\qquad +\alpha _t(1-\theta ) E\rho (x^{t}-x)^\top A^\top r^{t}. \end{aligned}$$
(B.26)

By the nonincreasing monotonicity of \(\alpha _k\), summing (B.25) over \(k=0\) through \(t-1\), adding (B.26), and plugging in (B.23) give

$$\begin{aligned}&\alpha _t E\left[ F(x^{t+1})-F(x)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \nonumber \\&\qquad +\theta \alpha _{k+1}\sum _{k=0}^{t-1} E\left[ F(x^{k+1})-F(x)+(w^{k+1}-w)^\top H(w^{k+1})\right] \nonumber \\&\qquad +\frac{\alpha _t}{2\rho } E\left[ \Vert \tilde{\lambda }^{t+1}-\lambda \Vert ^2-\Vert \lambda ^{t}-\lambda \Vert ^2+\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2\right] \nonumber \\&\qquad +\frac{\alpha _{t+1}}{2} E\Vert x^{t+1}-x\Vert _{\tilde{P}}^2 +\sum _{k=0}^t\frac{\alpha _k}{2} E\Vert x^{k+1}-x^k\Vert _{\tilde{P}}^2\nonumber \\&\qquad +\frac{1}{2} E\big [\Vert x^{t+1}-x\Vert ^2-\Vert x^{0}-x\Vert ^2+\sum _{k=0}^t\Vert x^{k+1}-x^k\Vert ^2\big ]\nonumber \\&\qquad -\sum _{k=0}^t\frac{\alpha _kL_f}{2}E\Vert x^k-x^{k+1}\Vert ^2+\sum _{k=0}^t\alpha _k E(x^{k+1}-x^k)^\top \delta ^k\nonumber \\&\leqslant (1-\theta )\alpha _0 E\left[ F(x^0)-F(x)+(w^{0}-w)^\top H(w^0)\right] \nonumber \\&\qquad +\alpha _0(1-\theta )\rho (x^{0}-x)^\top A^\top r^{0}+\frac{\alpha _0}{2}\Vert x^{0}-x\Vert _{\tilde{P}}^2\nonumber \\&\qquad -\sum _{k=0}^{t-1}\frac{\theta \beta _{k+1}}{2} E\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2 +\left( \frac{\alpha _0\beta _1}{2\alpha _1}-\frac{(1-\theta )\beta _1}{2}\right) E\Vert \lambda ^0-\lambda \Vert ^2\nonumber \\&\qquad -\left( \frac{\alpha _{t-1}\beta _t}{2\alpha _t}-\frac{(1-\theta )\beta _t}{2}\right) E\Vert \lambda ^{t}-\lambda \Vert ^2\nonumber \\&\qquad -\sum _{k=1}^{t-1}\left( \frac{\alpha _{k-1}\beta _k}{2\alpha _k}+\frac{(1-\theta )\beta _{k+1}}{2} -\frac{\alpha _k\beta _{k+1}}{2\alpha _{k+1}}-\frac{(1-\theta )\beta _k}{2}\right) E\Vert \lambda ^{k}-\lambda \Vert ^2 .\qquad \quad \quad \end{aligned}$$
(B.27)

From (4.9d), we have

$$\begin{aligned}&\frac{\alpha _t}{2\rho }\left[ \Vert \tilde{\lambda }^{t+1}-\lambda \Vert ^2-\Vert \lambda ^{t}-\lambda \Vert ^2 +\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2\right] \nonumber \\\geqslant & {} -\left( \frac{\alpha _{t-1}\beta _t}{2\alpha _t}-\frac{(1-\theta )\beta _t}{2}\right) \Vert \lambda ^{t}-\lambda \Vert ^2. \end{aligned}$$

In addition, from Young’s inequality, it holds that

$$\begin{aligned} \frac{1}{2}E\Vert x^{k+1}-x^k\Vert ^2+\alpha _k{E}(x^{k+1}-x^k)^\top \delta ^k \geqslant -\frac{\alpha ^2_k}{2}E\Vert \delta ^k\Vert ^2. \end{aligned}$$

Hence, dropping negative terms on the right-hand side of (B.27), from the convexity of \(\varPhi \) and (2.23), we have

$$\begin{aligned}&\left( \alpha _{t+1}+\theta \sum \limits _{k=1}^t\alpha _k\right) E\left[ F(\hat{x}^{t})-F(x)+(\hat{w}^{t}-w)^\top H(\hat{w}^{t})\right] \nonumber \\&\quad \leqslant \alpha _t E\left[ F(x^{t+1})-F(x)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \nonumber \\&\qquad +\theta \sum _{k=0}^{t-1}\alpha _{k+1}E\left[ F(x^{k+1})-F(x)+(w^{k+1}-w)^\top H(w^{k+1})\right] \nonumber \\&\leqslant (1-\theta )\alpha _0\left[ F(x^0)-F(x)+(w^{0}-w)^\top H(w^0)\right] \nonumber \\&\qquad +(1-\theta )\alpha _0\rho (x^{0}-x)^\top A^\top r^{0}+\frac{\alpha _0}{2}\Vert x^{0}-x\Vert _{\tilde{P}}^2+\frac{1}{2}\Vert x^0-x\Vert ^2\nonumber \\&\qquad +\left( \frac{\alpha _0\beta _1}{2\alpha _1}-\frac{(1-\theta )\beta _1}{2}\right) E\Vert \lambda ^0-\lambda \Vert ^2+\sum _{k=0}^t\frac{\alpha _k^2}{2}E\Vert \delta ^k\Vert ^2. \end{aligned}$$
(B.28)

Using Lemma 3.3 and the properties of H, we derive the desired result.

1.5 Proof of Proposition 6.1

Let \((I+\partial \phi )^{-1}(x):={{\,\mathrm{arg\,min}\,}}_z \phi (z)+\frac{1}{2}\Vert z-x\Vert _2^2\) denote the proximal mapping of \(\phi \) at x. Then, the update in (1.9b) can be written as

$$\begin{aligned} z^{k+1}=\left( I+\partial \left( \frac{g^*}{\eta }\right) \right) ^{-1}\left( z^k-\frac{1}{\eta }Ax^{k+1}\right) . \end{aligned}$$

Define \(y^{k+1}\) as in (6.7b). Then,

$$\begin{aligned} \frac{1}{\eta }y^{k+1}= & {} \frac{1}{\eta }\left\{ {\mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{y}}\, g(y)-\langle y, z^k\rangle +\frac{1}{2\eta }\Vert y+ A x^{k+1}\Vert ^2\right\} \\= & {} \frac{1}{\eta }\left\{ {\mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{y}}\, g(y)+\frac{\eta }{2}\Vert \frac{1}{\eta } y-(z^k-\frac{1}{\eta } A x^{k+1})\Vert ^2\right\} \\= & {} {\mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{y}}\, g(\eta y)+\frac{\eta }{2}\Vert y-\left( z^k-\frac{1}{\eta } A x^{k+1}\right) \Vert ^2\\= & {} \left( I+\partial \left( \frac{1}{\eta } g(\eta \cdot )\right) \right) ^{-1}\left( z^k-\frac{1}{\eta } A x^{k+1}\right) . \end{aligned}$$

Hence, using the fact that the conjugate of \(\frac{1}{\eta }g^*\) is \(\frac{1}{\eta } g(\eta \cdot )\) and Moreau's identity \((I+\partial \phi )^{-1}+(I+\partial \phi ^*)^{-1}=I\) for any convex function \(\phi \), we have

$$\begin{aligned}&\left( I+\partial \left( \frac{g^*}{\eta }\right) \right) ^{-1}\left( z^k-\frac{1}{\eta }Ax^{k+1}\right) \\&\quad +\left( I+\partial \left( \frac{1}{\eta } g(\eta \cdot )\right) \right) ^{-1}\left( z^k-\frac{1}{\eta } A x^{k+1}\right) =z^k-\frac{1}{\eta } A x^{k+1}. \end{aligned}$$

Therefore, (6.7c) holds, and thus from (1.9c) it follows that

$$\begin{aligned} \bar{z}^{k+1}=z^{k+1}-\frac{q}{\eta }(Ax^{k+1}+y^{k+1}). \end{aligned}$$

Substituting the formula of \(\bar{z}^k\) into (1.9a), we have for \(i=i_k\),

$$\begin{aligned} x_i^{k+1}&={\mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{x_i\in {\mathcal {X}}_i}}\langle -\bar{z}^k, A_ix_i\rangle +u_i(x_i)+\frac{\tau }{2}\Vert x_i-x_i^k\Vert ^2\\&= {\mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{x_i\in {\mathcal {X}}_i}}\langle -z^k, A_i x_i\rangle +\frac{q}{\eta }\langle A x^k+y^{k}, A_i x_i\rangle +u_i(x_i)+\frac{\tau }{2}\Vert x_i-x_i^k\Vert ^2\\&= {\mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{x_i\in {\mathcal {X}}_i}}\langle -z^k, A_i x_i\rangle +u_i(x_i)+\frac{q}{2\eta }\Vert A_i x_i +A_{\ne i} x_{\ne i}^k+y^k\Vert ^2 \\&\quad +\frac{1}{2}\Vert x_i-x_i^k\Vert _{\tau I-\frac{q}{\eta }A_i^\top A_i}^2, \end{aligned}$$

which is exactly (6.7a). Hence, we complete the proof.
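
The Moreau identity invoked above can be checked numerically on a concrete conjugate pair. As a purely illustrative example (not tied to the g of the paper; the helper names are ours), take \(\phi (z)=\mu \Vert z\Vert _1\): its proximal map is soft-thresholding, its conjugate \(\phi ^*\) is the indicator of the \(\ell _\infty \)-ball of radius \(\mu \), and the proximal map of that indicator is the projection onto the ball.

```python
import numpy as np

def prox_l1(x, mu):
    # (I + partial(mu*||.||_1))^{-1}(x): soft-thresholding with threshold mu
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)

def prox_conjugate(x, mu):
    # (I + partial(phi*))^{-1}(x): projection onto {z : ||z||_inf <= mu}
    return np.clip(x, -mu, mu)

rng = np.random.default_rng(0)
x, mu = rng.standard_normal(7), 0.8
# Moreau's identity: (I + partial(phi))^{-1}(x) + (I + partial(phi*))^{-1}(x) = x
assert np.allclose(prox_l1(x, mu) + prox_conjugate(x, mu), x)
```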


Cite this article

Gao, X., Xu, YY. & Zhang, SZ. Randomized Primal–Dual Proximal Block Coordinate Updates. J. Oper. Res. Soc. China 7, 205–250 (2019). https://doi.org/10.1007/s40305-018-0232-4
