Proofs of Lemmas
We give proofs of several lemmas that are used to show our main results.
1.1 Proof of Lemma 3.1
We prove (3.2), and (3.3) can be shown by the same arguments. By the optimality of \(x_{I_k}^{k+1}\) from (2.1) or equivalently (2.16), we have for any \(x_{I_k}\in {\mathcal {X}}_{I_k}\),
$$\begin{aligned}&(x_{I_k}-x_{I_k}^{k+1})^\top \left( \nabla _{I_k}f(x^k)-A_{I_k}^\top \lambda ^k +\rho _xA_{I_k}^\top r^{k} +\tilde{\nabla } u_{I_k}(x_{I_k}^{k+1})+\hat{P}_{I_k}(x_{I_k}^{k+1}-x_{I_k}^k)\right) \nonumber \\&\geqslant 0, \end{aligned}$$
(A.1)
where \(\tilde{\nabla } u_{I_k}(x_{I_k}^{k+1})\) is a subgradient of \(u_{I_k}\) at \(x_{I_k}^{k+1}\), and we have used the formula of \(r^{k+\frac{1}{2}}\) given in (2.2). We compute the expectation of each term in (A.1) as follows. First, we have
$$\begin{aligned}&{E}_{I_k}(x_{I_k}-x_{I_k}^{k+1})^\top \nabla _{I_k}f(x^k)\nonumber \\= & {} {E}_{I_k}\left( x_{I_k}-x_{I_k}^k\right) ^\top \nabla _{I_k}f(x^k) + {E}_{I_k}(x_{I_k}^k-x_{I_k}^{k+1})^\top \nabla _{I_k}f(x^k)\nonumber \\= & {} \frac{n}{N}\left( x-x^k\right) ^\top \nabla f(x^k) + {E}_{I_k}(x^k-x^{k+1})^\top \nabla f(x^k) \end{aligned}$$
(A.2)
$$\begin{aligned}\leqslant & {} \frac{n}{N}\left( f(x)-f(x^k)\right) + {E}_{I_k}\left[ f(x^k)-f(x^{k+1})+\frac{L_f}{2}\Vert x^k-x^{k+1}\Vert ^2\right] \nonumber \\= & {} \frac{n-N}{N}\left( f(x)-f(x^k)\right) \nonumber \\&+\,{E}_{I_k}\left[ f(x)-f(x^{k+1})+\frac{L_f}{2}\Vert x^k-x^{k+1}\Vert ^2\right] , \end{aligned}$$
(A.3)
where the last inequality is from the convexity of f and the Lipschitz continuity of \(\nabla _{I_k} f(x)\). Secondly,
$$\begin{aligned}&{E}_{I_k}(x_{I_k}-x_{I_k}^{k+1})^\top (-A_{I_k}^\top \lambda ^k)\nonumber \\= & {} {E}_{I_k}(x_{I_k}-x_{I_k}^{k})^\top (-A_{I_k}^\top \lambda ^k)+{E}_{I_k}(x_{I_k}^k-x_{I_k}^{k+1})^\top (-A_{I_k}^\top \lambda ^k)\nonumber \\= & {} \frac{n}{N}\left( x-x^k\right) ^\top (-A^\top \lambda ^k)+{E}_{I_k}(x^k-x^{k+1})^\top (-A^\top \lambda ^k)\nonumber \\= & {} \frac{n-N}{N}\left( x-x^k\right) ^\top (-A^\top \lambda ^k)+{E}_{I_k}(x-x^{k+1})^\top (-A^\top \lambda ^k). \end{aligned}$$
(A.4)
Similarly,
$$\begin{aligned} \rho _x {E}_{I_k}(x_{I_k}-x_{I_k}^{k+1})^\top A_{I_k}^\top r^{k}= & {} \frac{n-N}{N}\rho _x(x-x^k)^\top A^\top r^k\nonumber \\&+\,\rho _x{E}_{I_k}(x-x^{k+1})^\top A^\top r^{k}. \end{aligned}$$
(A.5)
For the fourth term of (A.1), we have
$$\begin{aligned}&{E}_{I_k}(x_{I_k}-x_{I_k}^{k+1})^\top \tilde{\nabla } u_{I_k}(x_{I_k}^{k+1})\nonumber \\\leqslant & {} {E}_{I_k}\left[ u_{I_k}(x_{I_k})-u_{I_k}(x_{I_k}^{k+1})\right] \nonumber \\= & {} \frac{n}{N}\, u(x)-{E}_{I_k}[u(x^{k+1})-u(x^k)+u_{I_k}(x_{I_k}^k)]\nonumber \\= & {} \frac{n}{N}\left[ u(x)-u(x^k)\right] +{E}_{I_k}[u(x^k)-u(x^{k+1})]\nonumber \\= & {} \frac{n-N}{N}\left[ u(x)-u(x^k)\right] +{E}_{I_k}[u(x)-u(x^{k+1})], \end{aligned}$$
(A.6)
where the inequality is from the convexity of \(u_{I_k}\). Finally, we have
$$\begin{aligned} {E}_{I_k}(x_{I_k}-x_{I_k}^{k+1})^\top \hat{P}_{I_k}(x_{I_k}^{k+1}-x_{I_k}^k)={E}_{I_k}(x-x^{k+1})^\top \hat{P}(x^{k+1}-x^k). \end{aligned}$$
(A.7)
Plugging (A.3) through (A.7) into (A.1) and recalling \(F(x)=f(x)+u(x)\), by rearranging terms we have
$$\begin{aligned}&{E}_{I_k}\left[ F(x^{k+1})-F(x)+(x^{k+1}-x)^\top (-A^\top \lambda ^k)+ \rho _x(x^{k+1}-x)^\top A^\top r^{k}\right] \nonumber \\&\quad +\,{E}_{I_k}\left( x^{k+1}-x\right) ^\top \hat{P}(x^{k+1}-x^k)-\frac{L_f}{2}{E}_{I_k}\Vert x^k-x^{k+1}\Vert ^2\nonumber \\&\leqslant \frac{N-n}{N}\left[ F(x^k)-F(x)+(x^{k}-x)^\top (-A^\top \lambda ^k)+ \rho _x(x^{k}-x)^\top A^\top r^{k}\right] . \end{aligned}$$
(A.8)
Note
$$\begin{aligned}&\qquad (x^{k+1}-x)^\top (-A^\top \lambda ^k)+ \rho _x(x^{k+1}-x)^\top A^\top r^{k}\\&\,\,\, =(x^{k+1}-x)^\top (-A^\top \lambda ^k)+\rho _x(x^{k+1}-x)^\top A^\top r^{k+1}\\&\qquad -\,\rho _x(x^{k+1}-x)^\top A^\top A(x^{k+1}-x^k) -\rho _x(x^{k+1}-x)^\top A^\top B(y^{k+1}-y^k)\\&{\overset{(2.5)}{=}}(x^{k+1}-x)^\top (-A^\top \lambda ^{k+1})+(\rho _x-\rho )(x^{k+1}-x)^\top A^\top r^{k+1}\\&\qquad -\,\rho _x(x^{k+1}-x)^\top A^\top A(x^{k+1}-x^k)-\rho _x(x^{k+1}-x)^\top A^\top B(y^{k+1}-y^k). \end{aligned}$$
Hence, we can rewrite (A.8) equivalently into (3.2). Through the same arguments, one can show (3.3), thus completing the proof.
1.2 Proof of Lemma 3.3
Letting \(x=x^*,y=y^*\) in (3.8), we have for any \(\lambda \) that
$$\begin{aligned}&\qquad E[h(x^*,y^*,\lambda )]\nonumber \\&\quad \,\,\geqslant E\left[ \varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)+(\hat{x}-x^*)^{\top }(-A^{\top }\hat{\lambda }) +(\hat{y}-y^*)^{\top }(-B^{\top }\hat{\lambda })\right. \nonumber \\&\qquad \quad \left. +(\hat{\lambda }-\lambda )^{\top }(A\hat{x}+B\hat{y}-b)\right] \nonumber \\&\quad = E\left[ \varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)+\langle \hat{\lambda },Ax^*+By^*-b\rangle -\langle \lambda ,A\hat{x}+B\hat{y}-b\rangle \right] \nonumber \\&\quad =E\left[ \varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)-\langle \lambda ,A\hat{x}+B\hat{y}-b\rangle \right] , \end{aligned}$$
(A.9)
where the last equality follows from the feasibility of \((x^*,y^*)\). For any \(\gamma >0\), restricting \(\lambda \) in \(\mathcal {B}_\gamma \), we have
$$\begin{aligned} E[h(x^*,y^*,\lambda )]\leqslant \sup \limits _{\lambda \in {\mathcal {B}}_\gamma }h(x^*,y^*,\lambda ). \end{aligned}$$
Hence, letting \(\lambda =-\frac{\gamma (A\hat{x}+B\hat{y}-b)}{\Vert A\hat{x}+B\hat{y}-b\Vert }\in \mathcal {B}_\gamma \) in (A.9) gives the desired result.
1.3 Proof of Lemma 3.5
In view of (3.9), we have
$$\begin{aligned}&E\big [(\gamma - \Vert \lambda ^*\Vert )\Vert A\hat{x}+B\hat{y}-b\Vert \big ]\\\leqslant & {} {E}\big [\varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)+\gamma \Vert A\hat{x}+B\hat{y}-b\Vert \big ] \leqslant \varepsilon , \end{aligned}$$
which implies
$$\begin{aligned} E\Vert A\hat{x}+B\hat{y}-b\Vert \leqslant \frac{\varepsilon }{\gamma -\Vert \lambda ^*\Vert }, \text{ and } E\big [\varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)\big ] \leqslant \varepsilon . \end{aligned}$$
(A.10)
Denote \(a^-=\max (0,-a)\) for a real number a. Then, from (3.9) and (A.10), it follows that
$$\begin{aligned} E\big (\varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)\big )^- \leqslant \Vert \lambda ^*\Vert \cdot E\Vert A\hat{x}+B\hat{y}-b\Vert \leqslant \frac{\Vert \lambda ^*\Vert \varepsilon }{\gamma -\Vert \lambda ^*\Vert }. \end{aligned}$$
Noting \(|a| = a + 2a ^ -\) for any real number a, we have
$$\begin{aligned}&\quad E\big |\varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)\big | \nonumber \\&= E\big [\varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)\big ] + 2E\big (\varPhi (\hat{x},\hat{y})-\varPhi (x^*,y^*)\big )^- \nonumber \\&\leqslant \left( \frac{2\Vert \lambda ^*\Vert }{\gamma -\Vert \lambda ^*\Vert } + 1\right) \varepsilon . \end{aligned}$$
(A.11)
1.4 Proof of Inequalities (4.9c) and (4.9d) with \(\alpha _k=\frac{\alpha _0}{\sqrt{k}}\)
We have \(\beta _k=\frac{\alpha _0}{\rho \big (\sqrt{k}-(1-\theta )\sqrt{k-1}\big )},\) and
$$\begin{aligned}&\qquad \frac{\alpha _{k-1}}{\alpha _{k}}\beta _{k}+(1-\theta )\beta _{k+1}-\frac{\alpha _k}{\alpha _{k+1}}\beta _{k+1}-(1-\theta )\beta _{k}\\&=\frac{\alpha _0}{\rho }\left[ \left( \frac{\sqrt{k}}{\sqrt{k-1}}-(1-\theta )\right) \frac{1}{\big (\sqrt{k}-(1-\theta )\sqrt{k-1}\big )} \right. \\&\qquad \left. -\left( \frac{\sqrt{k+1}}{\sqrt{k}}-(1-\theta )\right) \frac{1}{\big (\sqrt{k+1}-(1-\theta )\sqrt{k}\big )}\right] \\&=: \frac{\alpha _0}{\rho }[\psi (k)-\psi (k+1)]. \end{aligned}$$
By elementary calculus, we have
$$\begin{aligned} \psi '(k)= & {} \frac{\frac{\sqrt{k-1}}{\sqrt{k}}-\frac{\sqrt{k}}{\sqrt{k-1}}}{2(k-1)}\frac{1}{\big (\sqrt{k}-(1-\theta )\sqrt{k-1}\big )}\\&\qquad +\left( \frac{\sqrt{k}}{\sqrt{k-1}}-(1-\theta )\right) \frac{-1}{2\big (\sqrt{k}-(1-\theta )\sqrt{k-1}\big )^2}\left( \frac{1}{\sqrt{k}}-\frac{1-\theta }{\sqrt{k-1}}\right) \\= & {} \frac{1}{2(k-1)(\sqrt{k}-(1-\theta )\sqrt{k-1})} \\&\qquad \times \left[ \frac{\sqrt{k-1}}{\sqrt{k}}-\frac{\sqrt{k}}{\sqrt{k-1}}-\sqrt{k-1}\left( \frac{1}{\sqrt{k}}-\frac{1-\theta }{\sqrt{k-1}}\right) \right] \\= & {} \frac{1}{2(k-1)(\sqrt{k}-(1-\theta )\sqrt{k-1})}\left( (1-\theta )-\frac{\sqrt{k}}{\sqrt{k-1}}\right) <0. \end{aligned}$$
Hence, \(\psi (k)\) is decreasing with respect to k, and thus (4.9c) holds.
When \(\alpha _k=\frac{\alpha _0}{\sqrt{k}}\), (4.9d) becomes
$$\begin{aligned} \frac{\alpha _0}{2\rho \sqrt{t}}\geqslant \left| \left( \frac{\sqrt{t}}{\sqrt{t-1}}-(1-\theta )\right) \frac{\alpha _0}{\rho (\sqrt{t}-(1-\theta )\sqrt{t-1})}-\frac{\alpha _0}{\rho \sqrt{t}}\right| , \end{aligned}$$
which is equivalent to
$$\begin{aligned} \frac{1}{2}\geqslant \frac{\sqrt{t}}{\sqrt{t-1}}-1 \Longleftrightarrow t\geqslant \frac{9}{5}. \end{aligned}$$
This completes the proof.
Proofs of Theorems
In this section, we give the technical details for showing all theorems. For simplicity of notation, throughout the proofs of this section, we define \(\tilde{P}\) and \(\tilde{Q}\) as follows:
$$\begin{aligned} \tilde{P}=\hat{P}-\rho _x A^\top A,\qquad \tilde{Q}=\hat{Q}-\rho _y B^\top B. \end{aligned}$$
(B.1)
1.1 Proof of Theorem 3.6
Taking expectation over both sides of (3.2) and summing it over \(k=0\) through t, we have
$$\begin{aligned}&E\big [F(x^{t+1})-F(x)+(x^{t+1}-x)^\top (-A^\top \lambda ^{t+1})\big ]+(1-\theta )\rho _x E(x^{t+1}-x)^\top A^\top r^{t+1}\nonumber \\&\qquad +\theta \sum _{k=0}^{t-1}E\big [F(x^{k+1})-F(x)+(x^{k+1}-x)^\top (-A^\top \lambda ^{k+1})\big ]\nonumber \\&\qquad -\sum _{k=0}^t\rho _x E(x^{k+1}-x)^\top A^\top B(y^{k+1}-y^k)\nonumber \\&\qquad +\sum _{k=0}^t E(x^{k+1}-x)^\top \tilde{P}(x^{k+1}-x^k)-\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2\nonumber \\&\leqslant (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] , \end{aligned}$$
(B.2)
where we have used \(\frac{n}{N}=\theta \), the condition in (3.17) and the definition of \(\tilde{P}\) in (B.1). Similarly, taking expectation over both sides of (3.3), summing it over \(k=0\) through t, we have
$$\begin{aligned}&E\big [G(y^{t+1})-G(y)+(y^{t+1}-y)^\top (-B^\top \lambda ^{t+1})\big ]\nonumber \\&\qquad + (1-\theta )\rho _y E(y^{t+1}-y)^\top B^\top r^{t+1}\nonumber \\&\qquad +\theta \sum _{k=0}^{t-1} E\big [G(y^{k+1})-G(y)+(y^{k+1}-y)^\top (-B^\top \lambda ^{k+1})\big ]\nonumber \\&\qquad +\sum _{k=0}^t E(y^{k+1}-y)^\top \tilde{Q}(y^{k+1}-y^k)-\frac{L_g}{2}\sum _{k=0}^t E\Vert y^k-y^{k+1}\Vert ^2\nonumber \\&\leqslant (1-\theta )\left[ G(y^0)-G(y)+(y^{0}-y)^\top (-B^\top \lambda ^0)+ \rho _y(y^{0}-y)^\top B^\top r^{0}\right] \nonumber \\&\qquad +(1-\theta )\sum _{k=0}^t E\rho _y(y^{k}-y)^\top B^\top A(x^{k+1}-x^k). \end{aligned}$$
(B.3)
Recall \(\lambda ^{k+1}=\lambda ^k-\rho r^{k+1}\), thus
$$\begin{aligned} (\lambda ^{k+1}-\lambda )^\top r^{k+1} =-\frac{1}{\rho }(\lambda ^{k+1}-\lambda )^\top (\lambda ^{k+1}-\lambda ^k), \end{aligned}$$
(B.4)
where \(\lambda \) is an arbitrary vector and possibly random. Denote \(\tilde{\lambda }^{t+1}=\lambda ^t-\rho _xr^{t+1}.\) Then, similar to (B.4), we have
$$\begin{aligned} (\tilde{\lambda }^{t+1}-\lambda )^\top r^{t+1} =-\frac{1}{\rho _x}(\tilde{\lambda }^{t+1}-\lambda )^\top (\tilde{\lambda }^{t+1}-\lambda ^t). \end{aligned}$$
(B.5)
Summing (B.2) and (B.3) together and using (B.4) and (B.5), we have
$$\begin{aligned}&E\left[ \varPhi (x^{t+1},y^{t+1})-\varPhi (x,y)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1}) \right. \left. +\frac{1}{\rho _x}(\tilde{\lambda }^{t+1}-\lambda )^\top (\tilde{\lambda }^{t+1}-\lambda ^t)\right] \nonumber \\&\qquad +\theta \sum _{k=0}^{t-1} E\left[ \varPhi (x^{k+1},y^{k+1})-\varPhi (x,y)+(w^{k+1}-w)^\top H(w^{k+1})\right. \nonumber \\&\qquad \left. +\frac{1}{\rho }(\lambda ^{k+1}-\lambda )^\top (\lambda ^{k+1}-\lambda ^k)\right] \nonumber \\\leqslant & {} (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \nonumber \\&\qquad +(1-\theta )\left[ G(y^0)-G(y)+(y^{0}-y)^\top (-B^\top \lambda ^0)+ \rho _y(y^{0}-y)^\top B^\top r^{0}\right] \nonumber \\&\qquad +\sum _{k=0}^t\rho _x E(x^{k+1}-x)^\top A^\top B(y^{k+1}-y^k) +(1-\theta )\sum _{k=0}^t\rho _y E(y^{k}-y)^\top B^\top A(x^{k+1}-x^k)\nonumber \\&\qquad -\sum _{k=0}^t E(x^{k+1}-x)^\top \tilde{P}(x^{k+1}-x^k)+\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2\nonumber \\&\qquad -\sum _{k=0}^t E(y^{k+1}-y)^\top \tilde{Q}(y^{k+1}-y^k)+\frac{L_g}{2}\sum _{k=0}^t E\Vert y^k-y^{k+1}\Vert ^2, \end{aligned}$$
(B.6)
where we have used \(\varPhi (x,y)=F(x)+G(y)\) and the definition of H given in (2.18).
When \(B=0\) and \(y^k\equiv y^0\), (B.6) reduces to
$$\begin{aligned}&E\left[ F(x^{t+1})-F(x)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})+\frac{1}{\rho _x}(\tilde{\lambda }^{t+1}-\lambda )^\top (\tilde{\lambda }^{t+1}-\lambda ^t)\right] \\&\qquad +\theta \sum _{k=0}^{t-1} E\left[ F(x^{k+1})-F(x)+(w^{k+1}-w)^\top H(w^{k+1})\right. \\&\qquad \left. +\frac{1}{\rho }(\lambda ^{k+1}-\lambda )^\top (\lambda ^{k+1}-\lambda ^k)\right] \\&\leqslant (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \\&\qquad -\sum _{k=0}^t E(x^{k+1}-x)^\top \tilde{P}(x^{k+1}-x^k)+\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2. \end{aligned}$$
Using (2.21) and noting \(\theta =\frac{\rho }{\rho _x}\), from the above inequality after canceling terms we have
$$\begin{aligned}&E\left[ F(x^{t+1})-F(x)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \nonumber \\&+\theta \sum _{k=0}^{t-1} E\left[ F(x^{k+1})-F(x)+(w^{k+1}-w)^\top H(w^{k+1})\right] \nonumber \\&+\frac{1}{2\rho _x} E\left[ \Vert \tilde{\lambda }^{t+1}-\lambda \Vert ^2-\Vert \lambda ^0-\lambda \Vert ^2+\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2+\sum _{k=0}^{t-1}\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\right] \nonumber \\&\leqslant (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \nonumber \\&-\frac{1}{2}E\left[ \Vert x^{t+1}-x\Vert _{\tilde{P}}^2-\Vert x^0-x\Vert _{\tilde{P}}^2+\sum _{k=0}^t\Vert x^{k+1}-x^k\Vert _{\tilde{P}}^2\right] \nonumber \\&+\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2. \end{aligned}$$
(B.7)
For any feasible x, we note \(\tilde{\lambda }^{t+1}-\lambda ^t=\rho _xA(x^{t+1}-x)\) and thus
$$\begin{aligned} \frac{1}{\rho _x}\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2=\rho _x\Vert x^{t+1}-x\Vert _{A^\top A}^2. \end{aligned}$$
(B.8)
In addition, since \(x^{k+1}\) and \(x^k\) differ only on the index set \(I_k\), we have by recalling \(\tilde{P}=\hat{P}-\rho _x A^\top A\) that
$$\begin{aligned} \Vert x^{k+1}-x^k\Vert _{\tilde{P}}^2-L_f\Vert x^{k+1}-x^k\Vert ^2= & {} \Vert x_{I_k}^{k+1}-x_{I_k}^k\Vert _{\hat{P}_{I_k}}^2-\Vert x_{I_k}^{k+1}-x_{I_k}^k\Vert _{\rho _x A_{I_k}^\top A_{I_k}}^2\nonumber \\&-L_f\Vert x_{I_k}^{k+1}-x_{I_k}^k\Vert ^2. \end{aligned}$$
(B.9)
Plugging (B.8) and (B.9) into (B.7) and using (3.10) lead to
$$\begin{aligned}&E\left[ F(x^{t+1})-F(x)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \\&+\,\theta \sum _{k=0}^{t-1} E\left[ F(x^{k+1})-F(x)+(w^{k+1}-w)^\top H(w^{k+1})\right] \\\leqslant & {} (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \\&+\frac{1}{2\rho _x} E\Vert \lambda ^0-\lambda \Vert ^2+\frac{1}{2}\Vert x^0-x\Vert _{\tilde{P}}^2. \end{aligned}$$
The desired result follows from \(\lambda ^0=0\), and Lemmas 3.3 and 3.5 with \(\gamma =\max \{0.5+\Vert \lambda ^*\Vert , 3\Vert \lambda ^*\Vert \}\).
1.2 Proof of Theorem 3.7
It follows from (3.3) with \(\rho _y=\rho \) and \(m=M\) that (recall the definition of \(\tilde{Q}\) in (B.1)) for any \(y\in {\mathcal {Y}}\),
$$\begin{aligned}&G(y^{k+1})-G(y)-\frac{L_g}{2}\Vert y^k-y^{k+1}\Vert ^2+(y^{k+1}-y)^\top (-B^\top \lambda ^{k+1})\nonumber \\&\quad +(y^{k+1}-y)^\top \tilde{Q}(y^{k+1}-y^k)\leqslant 0. \end{aligned}$$
(B.10)
Similar to (B.10), and recalling the definition of \(\tilde{y}^{t+1}\), we have for any \(y\in {\mathcal {Y}}\),
$$\begin{aligned}&G(\tilde{y}^{t+1})-G(y)-\frac{L_g}{2}\Vert \tilde{y}^{t+1}-y^t\Vert ^2+(\tilde{y}^{t+1}-y)^\top (-B^\top \tilde{\lambda }^{t+1})\nonumber \\&\quad +\theta (\tilde{y}^{t+1}-y)^\top \tilde{Q}(y^{t+1}-y^t)\leqslant 0, \end{aligned}$$
(B.11)
where
$$\begin{aligned} \tilde{\lambda }^{t+1}=\lambda ^t-\rho _x(Ax^{t+1}+B\tilde{y}^{t+1}-b). \end{aligned}$$
(B.12)
Adding (B.10) and (B.11) to (B.2) and using the formula of \(\lambda ^k\) give
$$\begin{aligned}&E\left[ F(x^{t+1})-F(x)+(x^{t+1}-x)^\top (-A^\top \tilde{\lambda }^{t+1})\right] \nonumber \\&\qquad +E\left( \tilde{\lambda }^{t+1}-\lambda \right) ^\top \left( Ax^{t+1}+B\tilde{y}^{t+1}-b+\frac{1}{\rho _x}(\tilde{\lambda }^{t+1}-\lambda ^t)\right) \nonumber \\&\qquad +E\left[ G(\tilde{y}^{t+1})-G(y)+(\tilde{y}^{t+1}-y)^\top (-B^\top \tilde{\lambda }^{t+1}) \right. \nonumber \\&\left. \qquad +\theta (\tilde{y}^{t+1}-y)^\top \tilde{Q}(\tilde{y}^{t+1}-y^k)\right] -\frac{L_g}{2}E\Vert \tilde{y}^{t+1}-y^t\Vert ^2\nonumber \\&\qquad +\theta \sum _{k=0}^{t-1}E\left[ F(x^{k+1})-F(x)+(x^{k+1}-x)^\top (-A^\top \lambda ^{k+1})\right] \nonumber \\&\qquad -\sum _{k=0}^{t-1}\rho _x E(x^{k+1}-x)^\top A^\top B(y^{k+1}-y^k) -\rho _x{E}(x^{t+1}-x)^\top A^\top B(\tilde{y}^{t+1}-y^t)\nonumber \\&\qquad +\theta \sum _{k=0}^{t-1} E\left[ G(y^{k+1})-G(y)-\frac{L_g}{2}\Vert y^k-y^{k+1}\Vert ^2+(y^{k+1}-y)^\top (-B^\top \lambda ^{k+1})\right. \nonumber \\&\qquad \left. +(y^{k+1}-y)^\top \tilde{Q}(y^{k+1}-y^k)\right] \nonumber \\&\qquad +\theta \sum _{k=0}^{t-1}E(\lambda ^{k+1}-\lambda )^\top \left( r^{k+1}+\frac{1}{\rho }(\lambda ^{k+1}-\lambda ^k)\right) \nonumber \\&\leqslant (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \nonumber \\&\qquad -\sum _{k=0}^tE(x^{k+1}-x)^\top \tilde{P}(x^{k+1}-x^k)+\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2. \end{aligned}$$
(B.13)
By the notation in (2.18) and using (3.6), (B.13) can be written into
$$\begin{aligned}&E\left[ \varPhi (x^{t+1},\tilde{y}^{t+1})-\varPhi (x,y)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \\&\qquad +\theta \sum _{k=0}^{t-1}E\left[ \varPhi (x^{k+1},y^{k+1})-\varPhi (x,y)+(w^{k+1}-w)^\top H(w^{k+1})\right] \\&\qquad +\theta \sum _{k=0}^{t-1}E(y^{k+1}-y)^\top \tilde{Q}(y^{k+1}-y^k)+\theta {E}(\tilde{y}^{t+1}-y)^\top \tilde{Q}(\tilde{y}^{t+1}-y^t)\\&\qquad -\sum _{k=0}^{t-1}\rho _xE\left( \frac{1}{\rho }(\lambda ^k-\lambda ^{k+1})^\top B(y^{k+1}-y^k) -(y^{k+1}-y)^\top B^\top B(y^{k+1}-y^k)\right) \\&\qquad -\rho _x E\left( \frac{1}{\rho _x}(\lambda ^t-\tilde{\lambda }^{t+1})^\top B(\tilde{y}^{t+1}-y^t) -(\tilde{y}^{t+1}-y)^\top B^\top B(\tilde{y}^{t+1}-y^t)\right) \\&\qquad +\frac{\theta }{\rho } E(\tilde{\lambda }^{t+1}-\lambda )^\top \left( \tilde{\lambda }^{t+1}-\lambda ^t\right) +\frac{\theta }{\rho }\sum _{k=0}^{t-1}E(\lambda ^{k+1}-\lambda )^\top \left( \lambda ^{k+1}-\lambda ^k\right) \\\leqslant & {} (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \\&\qquad -\sum _{k=0}^t E(x^{k+1}-x)^\top \tilde{P}(x^{k+1}-x^k)+\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2\\&\qquad +\frac{\theta L_g}{2}\sum _{k=0}^{t-1}E\Vert y^k-y^{k+1}\Vert ^2+\frac{L_g}{2} E\Vert \tilde{y}^{t+1}-y^t\Vert ^2. \end{aligned}$$
Now use (2.21) to derive from the above inequality that
$$\begin{aligned}&E\left[ \varPhi (x^{t+1},\tilde{y}^{t+1})-\varPhi (x,y)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \nonumber \\&\qquad +\theta \sum _{k=0}^{t-1}E\left[ \varPhi (x^{k+1},y^{k+1})-\varPhi (x,y)+(w^{k+1}-w)^\top H(w^{k+1})\right] \nonumber \\&\qquad +\frac{\theta }{2}\left( E\Vert \tilde{y}^{t+1}-y\Vert _{\tilde{Q}}^2-\Vert y^0-y\Vert _{\tilde{Q}}^2\right) \nonumber \\&\qquad +\frac{\theta }{2}\sum _{k=0}^{t-1}E\Vert y^{k+1}-y^k\Vert _{\tilde{Q}}^2+\frac{\theta }{2}E\Vert \tilde{y}^{t+1}-y^t\Vert _{\tilde{Q}}^2\nonumber \\&\qquad +\frac{\rho _x}{2}\left( E\Vert \tilde{y}^{t+1}-y\Vert _{B^\top B}^2-\Vert y^0-y\Vert _{B^\top B}^2\right) \nonumber \\&\qquad +\frac{\rho _x}{2}\sum _{k=0}^{t-1}E\Vert y^{k+1}-y^k\Vert _{B^\top B}^2+\frac{\rho _x}{2}E\Vert \tilde{y}^{t+1}-y^t\Vert _{B^\top B}^2\nonumber \\&\qquad -\sum _{k=0}^{t-1}E\frac{\rho _x}{\rho }\left( \lambda ^k-\lambda ^{k+1}\right) ^\top B(y^{k+1}-y^k) -E(\lambda ^t-\tilde{\lambda }^{t+1})^\top B(\tilde{y}^{t+1}-y^t)\nonumber \\&\qquad +\frac{\theta }{2\rho }\left( E\Vert \tilde{\lambda }^{t+1}-\lambda \Vert ^2-\Vert \lambda ^0-\lambda \Vert ^2\right) \nonumber \\&\qquad +\frac{\theta }{2\rho }\sum _{k=0}^{t-1}E\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2+\frac{\theta }{2\rho }E\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2\nonumber \\&\quad \leqslant (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \nonumber \\&\qquad -\frac{1}{2}\left[ E\Vert x^{t+1}-x\Vert ^2_{\tilde{P}}-\Vert x^0-x\Vert ^2_{\tilde{P}}\right. \nonumber \\&\qquad \left. +\sum _{k=0}^tE\Vert x^k-x^{k+1}\Vert ^2_{\tilde{P}}\right] +\frac{L_f}{2}\sum _{k=0}^tE\Vert x^k-x^{k+1}\Vert ^2\nonumber \\&\qquad +\frac{\theta L_g}{2}\sum _{k=0}^{t-1}E\Vert y^k-y^{k+1}\Vert ^2+\frac{L_g}{2}E\Vert \tilde{y}^{t+1}-y^t\Vert ^2. \end{aligned}$$
(B.14)
Note that for \(k\leqslant t-1\),
$$\begin{aligned} -\frac{\rho _x}{\rho }(\lambda ^k-\lambda ^{k+1})^\top B(y^{k+1}-y^k)+\frac{\theta }{2\rho }\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\geqslant -\frac{\rho }{2\theta ^3}\Vert y^{k+1}-y^k\Vert _{B^\top B}^2 \end{aligned}$$
and
$$\begin{aligned} -(\lambda ^t-\tilde{\lambda }^{t+1})^\top B(\tilde{y}^{t+1}-y^t)+\frac{\theta }{2\rho }\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2\geqslant -\frac{\rho }{2\theta }\Vert \tilde{y}^{t+1}-y^t\Vert _{B^\top B}^2. \end{aligned}$$
Because \(\tilde{P}, {\tilde{Q}}\) and \(\rho \) satisfy (3.13), we have from (B.14) that
$$\begin{aligned}&E\left[ \varPhi (x^{t+1},\tilde{y}^{t+1})-\varPhi (x,y)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \\&+\theta \sum _{k=0}^{t-1}E\left[ \varPhi (x^{k+1},y^{k+1})-\varPhi (x,y)+(w^{k+1}-w)^\top H(w^{k+1})\right] \\&\leqslant (1-\theta )\left[ F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\right] \\&+\frac{1}{2}\Vert x^0-x\Vert ^2_{\tilde{P}}+\frac{\theta }{2}\Vert y^0-y\Vert _{\tilde{Q}}^2+\frac{\rho }{2\theta }\Vert y^0-y\Vert _{B^\top B}^2+\frac{\theta }{2\rho }E\Vert \lambda ^0-\lambda \Vert ^2. \end{aligned}$$
Similar to Theorem 3.8, from the convexity of \(\varPhi \) and (2.23), we have
$$\begin{aligned}&(1+\theta t)E\big [\varPhi (\hat{x}^t,\hat{y}^t)-\varPhi (x,y)+(\hat{w}^{t+1}-w)^\top H({w})\big ]\nonumber \\&\leqslant (1-\theta )\big [F(x^0)-F(x)+(x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}\big ]\nonumber \\&+\frac{1}{2}\Vert x^0-x\Vert ^2_{\tilde{P}}+\frac{\theta }{2}\Vert y^0-y\Vert _{\tilde{Q}}^2+\frac{\rho }{2\theta }\Vert y^0-y\Vert _{B^\top B}^2+\frac{\theta }{2\rho }E\Vert \lambda ^0-\lambda \Vert ^2. \end{aligned}$$
(B.15)
Noting \(\lambda ^0=0\) and \((x^{0}-x)^\top A^\top r^{0}\leqslant \frac{1}{2}\big [\Vert x^0-x\Vert _{A^\top A}+\Vert r^0\Vert ^2\big ]\), and using Lemmas 3.3 and 3.5 with \(\gamma =\max \{0.5+\Vert \lambda ^*\Vert , 3\Vert \lambda ^*\Vert \}\), we obtain the result (3.16).
1.3 Proof of Theorem 3.8
Using (3.6) and (3.7), applying (2.21) to the cross terms, and also noting the definition of \(\tilde{P}\) and \(\tilde{Q}\) in (B.1), we have
$$\begin{aligned}&-\frac{\theta }{\rho }E\left[ \sum _{k=0}^{t-1}(\lambda ^{k+1}-\lambda )^\top (\lambda ^{k+1}-\lambda ^k) +(\tilde{\lambda }^{t+1}-\lambda )^\top (\tilde{\lambda }^{t+1}-\lambda ^t)\right] \nonumber \\&\qquad +\sum _{k=0}^t\rho _xE(x^{k+1}-x)^\top A^\top B(y^{k+1}-y^k) +(1-\theta )\sum _{k=0}^t\rho _y E(y^{k}-y)^\top B^\top A(x^{k+1}-x^k)\nonumber \\&\qquad -\sum _{k=0}^tE(x^{k+1}-x)^\top \tilde{P}(x^{k+1}-x^k)+\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2\nonumber \\&\qquad -\sum _{k=0}^tE(y^{k+1}-y)^\top \tilde{Q}(y^{k+1}-y^k)+\frac{L_g}{2}\sum _{k=0}^t E\Vert y^k-y^{k+1}\Vert ^2\nonumber \\&=-\frac{\theta }{2\rho }E\left[ \Vert \tilde{\lambda }^{t+1}-\lambda \Vert ^2-\Vert \lambda ^0-\lambda \Vert ^2+\sum _{k=0}^{t-1}\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2+\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2\right] \nonumber \\&\qquad +\frac{\rho _x}{\rho }\sum _{k=0}^t E(\lambda ^k-\lambda ^{k+1})^\top B(y^{k+1}-y^k) +\frac{(1-\theta )\rho _y}{\rho }\sum _{k=0}^t E(\lambda ^{k-1}-\lambda ^{k})^\top A(x^{k+1}-x^k)\nonumber \\&\qquad -\frac{\theta \rho _y}{2}E\left( \Vert x^0-x\Vert _{A^\top A}^2-\Vert x^{t+1}-x\Vert _{A^\top A}^2\right) +\frac{(2-\theta )\rho _y}{2}\sum _{k=0}^t E\Vert x^{k+1}-x^k\Vert _{A^\top A}^2\nonumber \\&\qquad -\frac{1}{2}E\left( \Vert x^{t+1}-x\Vert _{\hat{P}}^2-\Vert x^0-x\Vert _{\hat{P}}^2+\sum _{k=0}^t\Vert x^{k+1}-x^k\Vert _{\hat{P}}^2\right) \nonumber \\&\qquad +\frac{L_f}{2}\sum _{k=0}^tE\Vert x^k-x^{k+1}\Vert ^2\nonumber \\&\qquad -\frac{1}{2}E\left( \Vert y^{t+1}-y\Vert _{\hat{Q}}^2-\Vert y^0-y\Vert _{\hat{Q}}^2+\sum _{k=0}^t\Vert y^{k+1}-y^k\Vert _{\hat{Q}}^2\right) \nonumber \\&\qquad +\frac{L_g}{2}\sum _{k=0}^t E\Vert y^k-y^{k+1}\Vert ^2, \end{aligned}$$
(B.16)
where we have used the conditions in (3.17). By Young’s inequality, we have that for \(0\leqslant k\leqslant t\),
$$\begin{aligned}&\quad \frac{\rho _x}{\rho }(\lambda ^k-\lambda ^{k+1})^\top B(y^{k+1}-y^k)-\frac{\theta }{2\rho }\frac{1}{2-\theta }\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\nonumber \\&\leqslant \frac{\rho }{\theta }\frac{2-\theta }{2}\frac{\rho _x^2}{\rho ^2}\Vert B(y^{k+1}-y^k)\Vert ^2 \overset{(3.17)}{=} \frac{(2-\theta )\rho _y}{2\theta ^2}\Vert y^{k+1}-y^k\Vert _{B^\top B}^2, \end{aligned}$$
(B.17)
and for \(1\leqslant k\leqslant t\),
$$\begin{aligned}&\qquad \frac{(1-\theta )\rho _y}{\rho }(\lambda ^{k-1}-\lambda ^{k})^\top A(x^{k+1}-x^k)-\frac{\theta }{2\rho }\frac{1-\theta }{2-\theta }\Vert \lambda ^{k-1}-\lambda ^k\Vert ^2\nonumber \\&\,\,\,\leqslant (1-\theta )\frac{\rho }{\theta }\frac{(2-\theta )\rho _y^2}{2\rho ^2}\Vert A(x^{k+1}-x^k)\Vert ^2\nonumber \\&\overset{(3.17)}{=}\frac{(1-\theta )(2-\theta )}{2\theta ^2}\rho _x\Vert x^{k+1}-x^k\Vert _{A^\top A}^2. \end{aligned}$$
(B.18)
Plugging (B.17) and (B.18) and also noting \(\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2\geqslant \Vert \lambda ^{t+1}-\lambda ^t\Vert ^2\), we can upper bound the right-hand side of (B.16) by
$$\begin{aligned}&\quad -\frac{\theta }{2\rho }E\left[ \Vert \tilde{\lambda }^{t+1}-\lambda \Vert ^2-\Vert \lambda ^0-\lambda \Vert ^2\right] -\frac{\theta \rho _y}{2}E\left( \Vert x^0-x\Vert _{A^\top A}^2-\Vert x^{t+1}-x\Vert _{A^\top A}^2\right) \nonumber \\&\qquad +\left( \frac{(1-\theta )(2-\theta )}{2\theta ^2}\rho _x+\frac{(2-\theta )\rho _y}{2}\right) \sum _{k=0}^t E\Vert x^{k+1}-x^k\Vert _{A^\top A}^2\nonumber \\&\qquad +\frac{(2-\theta )\rho _y}{2\theta ^2}\sum _{k=0}^t E\Vert y^{k+1}-y^k\Vert _{B^\top B}^2\nonumber \\&\qquad -\frac{1}{2}E\left( \Vert x^{t+1}-x\Vert _{\hat{P}}^2-\Vert x^0-x\Vert _{\hat{P}}^2\right. \nonumber \\&\qquad \left. +\sum _{k=0}^t\Vert x^{k+1}-x^k\Vert _{\hat{P}}^2\right) +\frac{L_f}{2}\sum _{k=0}^t E\Vert x^k-x^{k+1}\Vert ^2\nonumber \\&\qquad -\frac{1}{2}E\left( \Vert y^{t+1}-y\Vert _{\hat{Q}}^2-\Vert y^0-y\Vert _{\hat{Q}}^2+\sum _{k=0}^t\Vert y^{k+1}-y^k\Vert _{\hat{Q}}^2\right) \nonumber \\&\qquad +\frac{L_g}{2}\sum _{k=0}^t E\Vert y^k-y^{k+1}\Vert ^2\nonumber \\&\overset{(3.18{\mathrm{a}})}{\leqslant }\frac{1}{2}\left( \Vert x^0-x\Vert _{\hat{P}-\theta \rho _xA^\top A}^2+\Vert y^0-y\Vert _{\hat{Q}}^2\right) +\frac{\theta }{2\rho }E\Vert \lambda ^0-\lambda \Vert ^2. \end{aligned}$$
(B.19)
In addition, note that
$$\begin{aligned}&\theta \Vert x^{t+1}-x\Vert _{A^\top A}^2=\frac{n}{N}\left\| \sum _{i=1}^N A_i(x_i^{t+1}-x_i)\right\| ^2\leqslant n\sum _{i=1}^N\Vert x_i^{t+1}-x_i\Vert _{A_i^\top A_i}^2,\\&\Vert x^k-x^{k+1}\Vert _{A^\top A}^2=\left\| \sum _{i\in I_k}A_i(x_i^k-x_i^{k+1})\right\| ^2\leqslant n\sum _{i=1}^N\Vert x_i^k-x_i^{k+1}\Vert _{A_i^\top A_i}^2,\\&\Vert y^k-y^{k+1}\Vert _{B^\top B}^2=\left\| \sum _{j\in J_k}B_j(y_j^k-y_j^{k+1})\right\| ^2\leqslant m\sum _{j=1}^M\Vert y_j^k-y_j^{k+1}\Vert _{B_j^\top B_j}. \end{aligned}$$
Hence, if \(\hat{P}\) and \(\hat{Q}\) satisfy (3.18b), then (B.19) also holds.
Combining (B.6), (B.16), and (B.19) yields
$$\begin{aligned}&E\left[ \varPhi (x^{t+1},y^{t+1})-\varPhi (x,y)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \nonumber \\&\qquad +\theta \sum _{k=0}^{t-1}E\left[ \varPhi (x^{k+1},y^{k+1})-\varPhi (x,y)+(w^{k+1}-w)^\top H(w^{k+1})\right] \nonumber \\\leqslant & {} (1-\theta )\left[ \varPhi (x^0,y^0)-\varPhi (x,y)\right] \nonumber \\&\qquad +(1-\theta )\left[ (x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}+(y^{0}-y)^\top (-B^\top \lambda ^0)\right. \nonumber \\&\qquad \left. + \rho _y(y^{0}-y)^\top B^\top r^{0}\right] \nonumber \\&\qquad + \frac{1}{2}\left( \Vert x^0-x\Vert _{\hat{P}-\theta \rho _xA^\top A}^2+\Vert y^0-y\Vert _{\hat{Q}}^2\right) +\frac{\theta }{2\rho }{E}\Vert \lambda ^0-\lambda \Vert ^2. \end{aligned}$$
(B.20)
Applying the convexity of \(\varPhi \) and the properties (2.23) of H, we have
$$\begin{aligned}&~~\qquad (1+\theta t)E\left[ \varPhi (\hat{x}^t,\hat{y}^t)-\varPhi (x,y)+(\hat{w}^{t+1}-w)^\top H(w)\right] \nonumber \\&\overset{(2.11)}{=} (1+\theta t)E\left[ \varPhi (\hat{x}^t,\hat{y}^t)-\varPhi (x,y)+(\hat{w}^{t+1}-w)^\top H(\hat{w}^{t+1})\right] \nonumber \\&\overset{(2.14)}{\leqslant }E\left[ \varPhi (x^{t+1},y^{t+1})-\varPhi (x,y)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \nonumber \\&\qquad +\,\theta \sum _{k=0}^{t-1}E\left[ \varPhi (x^{k+1},y^{k+1})-\varPhi (x,y)+(w^{k+1}-w)^\top H(w^{k+1})\right] .\qquad \end{aligned}$$
(B.21)
Now combining (B.21) and (B.20), we have
$$\begin{aligned}&(1+\theta t)E\left[ \varPhi (\hat{x}^t,\hat{y}^t)-\varPhi (x,y)+(\hat{w}^{t+1}-w)^\top H(w)\right] \nonumber \\\leqslant & {} (1-\theta )\left[ \varPhi (x^0,y^0)-\varPhi (x,y)\right] \nonumber \\&\quad +(1-\theta )\left[ (x^{0}-x)^\top (-A^\top \lambda ^0)+ \rho _x(x^{0}-x)^\top A^\top r^{0}+(y^{0}-y)^\top (-B^\top \lambda ^0)\right. \nonumber \\&\quad \left. + \rho _y(y^{0}-y)^\top B^\top r^{0}\right] \nonumber \\&\quad + \frac{1}{2}\left( \Vert x^0-x\Vert _{\hat{P}-\theta \rho _xA^\top A}^2+\Vert y^0-y\Vert _{\hat{Q}}^2\right) +\frac{\theta }{2\rho }E\Vert \lambda ^0-\lambda \Vert ^2. \end{aligned}$$
(B.22)
By Lemmas 3.3 and 3.5 with \(\gamma =\max \{0.5+\Vert \lambda ^*\Vert , 3\Vert \lambda ^*\Vert \}\), we have the desired result.
1.4 Proof of Theorem 4.2
From the nonincreasing monotonicity of \(\alpha _k\), one can easily show the following result.
Lemma B.1
Assume \(\lambda ^{-1}=\lambda ^0\). It holds that
$$\begin{aligned}&\sum _{k=0}^t\frac{(1-\theta )\beta _k}{2}\left[ \Vert \lambda ^{k}-\lambda \Vert ^2-\Vert \lambda ^{k-1}-\lambda \Vert ^2+\Vert \lambda ^{k}-\lambda ^{k-1}\Vert ^2\right] \nonumber \\&\qquad -\sum _{k=0}^{t-1}\frac{\alpha _k\beta _{k+1}}{2\alpha _{k+1}}\left[ \Vert \lambda ^{k+1}-\lambda \Vert ^2-\Vert \lambda ^{k}-\lambda \Vert ^2+\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\right] \nonumber \\&\leqslant -\sum _{k=0}^{t-1}\frac{\beta _{k+1}}{2}\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2 +\sum _{k=1}^t\frac{(1-\theta )\beta _k}{2}\Vert \lambda ^{k}-\lambda ^{k-1}\Vert ^2 \nonumber \\&-\sum _{k=0}^{t-1}\frac{\alpha _k\beta _{k+1}}{2\alpha _{k+1}}\Vert \lambda ^{k+1}-\lambda \Vert ^2 -\sum _{k=1}^t\frac{(1-\theta )\beta _k}{2}\Vert \lambda ^{k-1}-\lambda \Vert ^2\nonumber \\&+\frac{\alpha _0\beta _1}{2\alpha _1}\Vert \lambda ^0-\lambda \Vert ^2+\sum _{k=1}^{t-1}\frac{\alpha _k\beta _{k+1}}{2\alpha _{k+1}}\Vert \lambda ^{k}-\lambda \Vert ^2 +\sum _{k=1}^t\frac{(1-\theta )\beta _k}{2}\Vert \lambda ^{k}-\lambda \Vert ^2\nonumber \\&=-\sum _{k=0}^{t-1}\frac{\theta \beta _{k+1}}{2}\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2 +\left( \frac{\alpha _0\beta _1}{2\alpha _1}-\frac{(1-\theta )\beta _1}{2}\right) \Vert \lambda ^0-\lambda \Vert ^2\nonumber \\&-\left( \frac{\alpha _{t-1}\beta _t}{2\alpha _t}-\frac{(1-\theta )\beta _t}{2}\right) \Vert \lambda ^{t}-\lambda \Vert ^2\nonumber \\&-\sum _{k=1}^{t-1}\left( \frac{\alpha _{k-1}\beta _k}{2\alpha _k} +\frac{(1-\theta )\beta _{k+1}}{2}-\frac{\alpha _k\beta _{k+1}}{2\alpha _{k+1}}-\frac{(1-\theta )\beta _k}{2}\right) \Vert \lambda ^{k}-\lambda \Vert ^2. \end{aligned}$$
(B.23)
By the update formula of \(\lambda \) in (4.5), we have from (4.8) that
$$\begin{aligned}&\quad E\left[ F(x^{k+1})-F(x)+(x^{k+1}-x)^\top (-A^\top \lambda ^{k+1})+(\lambda ^{k+1}-\lambda )^\top r^{k+1}\right] \nonumber \\&\quad + E\left[ \frac{(\lambda ^{k+1}-\lambda )^\top (\lambda ^{k+1}-\lambda ^{k})}{\left( 1-\frac{(1-\theta )\alpha _{k+1}}{\alpha _k }\right) \rho } +\frac{(1-\theta )\alpha _{k+1}}{\alpha _k}\rho (x^{k+1}-x)^\top A^\top r^{k+1}\right] \nonumber \\&\quad + E(x^{k+1}-x)^\top \left( \tilde{P}+\frac{I}{\alpha _k}\right) (x^{k+1}-x^k) -\frac{L_f}{2} E\Vert x^k-x^{k+1}\Vert ^2+ E(x^{k+1}-x^k)^\top \delta ^k\nonumber \\&\leqslant (1-\theta ) E\left[ F(x^k)-F(x)+(x^{k}-x)^\top (-A^\top \lambda ^k) \right. \left. +(\lambda ^{k}-\lambda )^\top r^{k}+\frac{(\lambda ^{k}-\lambda )^\top (\lambda ^k-\lambda ^{k-1})}{\left( 1-\frac{(1-\theta )\alpha _{k}}{\alpha _{k-1}}\right) \rho } \right] \nonumber \\&\quad +(1-\theta )\rho E(x^{k}-x)^\top A^\top r^{k}, \end{aligned}$$
(B.24)
where similar to (B.1), we have defined \(\tilde{P}=\hat{P}-\rho A^\top A\).
Multiplying \(\alpha _k\) to both sides of (B.24) and using (4.2) and (2.21), we have
$$\begin{aligned}&\quad \alpha _k E\left[ F(x^{k+1})-F(x)+(w^{k+1}-w)^\top H(w^{k+1})\right] \nonumber \\&\qquad +\frac{\alpha _k\beta _{k+1}}{2\alpha _{k+1}} E\left[ \Vert \lambda ^{k+1}-\lambda \Vert ^2-\Vert \lambda ^{k}-\lambda \Vert ^2+\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2\right] \nonumber \\&\qquad + E\left[ (1-\theta )\alpha _{k+1}\rho (x^{k+1}-x)^\top A^\top r^{k+1}\right] \nonumber \\&\qquad +\frac{\alpha _k}{2} E\big [\Vert x^{k+1}-x\Vert _{\tilde{P}}^2-\Vert x^{k}-x\Vert _{\tilde{P}}^2+\Vert x^{k+1}-x^k\Vert _{\tilde{P}}^2\big ]\nonumber \\&\qquad +\frac{1}{2} E\left[ \Vert x^{k+1}-x\Vert ^2-\Vert x^{k}-x\Vert ^2+\Vert x^{k+1}-x^k\Vert ^2\right] \nonumber \\&\qquad -\frac{\alpha _kL_f}{2} E\Vert x^k-x^{k+1}\Vert ^2+\alpha _k{E}(x^{k+1}-x^k)^\top \delta ^k\nonumber \\&\leqslant (1-\theta )\alpha _k E\left[ F(x^k)-F(x)+(w^{k}-w)^\top H(w^k)\right] \nonumber \\&\qquad +\frac{(1-\theta )\beta _k}{2} E\left[ \Vert \lambda ^{k}-\lambda \Vert ^2-\Vert \lambda ^{k-1}-\lambda \Vert ^2+\Vert \lambda ^{k}-\lambda ^{k-1}\Vert ^2\right] \nonumber \\&\qquad +\alpha _k(1-\theta )\rho E(x^{k}-x)^\top A^\top r^{k}. \end{aligned}$$
(B.25)
Denote \(\tilde{\lambda }^{t+1}=\lambda ^t-\rho r^{t+1}.\) Then, for \(k=t\), it is easy to see that (B.25) becomes
$$\begin{aligned}&\quad \alpha _tE\left[ F(x^{t+1})-F(x)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \nonumber \\&\qquad +\frac{\alpha _t}{2\rho }E\left[ \Vert \tilde{\lambda }^{t+1}-\lambda \Vert ^2-\Vert \lambda ^{t}-\lambda \Vert ^2+\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2\right] \nonumber \\&\qquad +\frac{\alpha _t}{2}E\left[ \Vert x^{t+1}-x\Vert _{\tilde{P}}^2-\Vert x^{t}-x\Vert _{\tilde{P}}^2+\Vert x^{t+1}-x^t\Vert _{\tilde{P}}^2\right] \nonumber \\&\qquad +\frac{1}{2}E\left[ \Vert x^{t+1}-x\Vert ^2-\Vert x^{t}-x\Vert ^2+\Vert x^{t+1}-x^t\Vert ^2\right] \nonumber \\&\qquad -\frac{\alpha _tL_f}{2}E\Vert x^t-x^{t+1}\Vert ^2+\alpha _t{E}(x^{t+1}-x^t)^\top \delta ^t\nonumber \\&\leqslant (1-\theta )\alpha _t E\left[ F(x^t)-F(x)+(w^{t}-w)^\top H(w^t)\right] \nonumber \\&\qquad +\frac{(1-\theta )\beta _t}{2}E\left[ \Vert \lambda ^{t}-\lambda \Vert ^2-\Vert \lambda ^{t-1}-\lambda \Vert ^2+\Vert \lambda ^{t}-\lambda ^{t-1}\Vert ^2\right] \nonumber \\&\qquad +\alpha _t(1-\theta ) E\rho (x^{t}-x)^\top A^\top r^{t}. \end{aligned}$$
(B.26)
By the nonincreasing monotonicity of \(\alpha _k\), summing (B.25) from \(k=0\) through \(t-1\) and (B.26) and plugging (B.23) give
$$\begin{aligned}&\alpha _t E\left[ F(x^{t+1})-F(x)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \nonumber \\&\qquad +\theta \alpha _{k+1}\sum _{k=0}^{t-1} E\left[ F(x^{k+1})-F(x)+(w^{k+1}-w)^\top H(w^{k+1})\right] \nonumber \\&\qquad +\frac{\alpha _t}{2\rho } E\left[ \Vert \tilde{\lambda }^{t+1}-\lambda \Vert ^2-\Vert \lambda ^{t}-\lambda \Vert ^2+\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2\right] \nonumber \\&\qquad +\frac{\alpha _{t+1}}{2} E\Vert x^{t+1}-x\Vert _{\tilde{P}}^2 +\sum _{k=0}^t\frac{\alpha _k}{2} E\Vert x^{k+1}-x^k\Vert _{\tilde{P}}^2\nonumber \\&\qquad +\frac{1}{2} E\big [\Vert x^{t+1}-x\Vert ^2-\Vert x^{0}-x\Vert ^2+\sum _{k=0}^t\Vert x^{k+1}-x^k\Vert ^2\big ]\nonumber \\&\qquad -\sum _{k=0}^t\frac{\alpha _kL_f}{2}E\Vert x^k-x^{k+1}\Vert ^2+\sum _{k=0}^t\alpha _k E(x^{k+1}-x^k)^\top \delta ^k\nonumber \\&\leqslant (1-\theta )\alpha _0 E\left[ F(x^0)-F(x)+(w^{0}-w)^\top H(w^0)\right] \nonumber \\&\qquad +\alpha _0(1-\theta )\rho (x^{0}-x)^\top A^\top r^{0}+\frac{\alpha _0}{2}\Vert x^{0}-x\Vert _{\tilde{P}}^2\nonumber \\&\qquad -\sum _{k=0}^{t-1}\frac{\theta \beta _{k+1}}{2} E\Vert \lambda ^{k+1}-\lambda ^k\Vert ^2 +\left( \frac{\alpha _0\beta _1}{2\alpha _1}-\frac{(1-\theta )\beta _1}{2}\right) E\Vert \lambda ^0-\lambda \Vert ^2\nonumber \\&\qquad -\left( \frac{\alpha _{t-1}\beta _t}{2\alpha _t}-\frac{(1-\theta )\beta _t}{2}\right) E\Vert \lambda ^{t}-\lambda \Vert ^2\nonumber \\&\qquad -\sum _{k=1}^{t-1}\left( \frac{\alpha _{k-1}\beta _k}{2\alpha _k}+\frac{(1-\theta )\beta _{k+1}}{2} -\frac{\alpha _k\beta _{k+1}}{2\alpha _{k+1}}-\frac{(1-\theta )\beta _k}{2}\right) E\Vert \lambda ^{k}-\lambda \Vert ^2 .\qquad \quad \quad \end{aligned}$$
(B.27)
From (4.9d), we have
$$\begin{aligned}&\frac{\alpha _t}{2\rho }\left[ \Vert \tilde{\lambda }^{t+1}-\lambda \Vert ^2-\Vert \lambda ^{t}-\lambda \Vert ^2 +\Vert \tilde{\lambda }^{t+1}-\lambda ^t\Vert ^2\right] \nonumber \\\geqslant & {} -\left( \frac{\alpha _{t-1}\beta _t}{2\alpha _t}-\frac{(1-\theta )\beta _t}{2}\right) \Vert \lambda ^{t}-\lambda \Vert ^2. \end{aligned}$$
In addition, from Young’s inequality, it holds that
$$\begin{aligned} \frac{1}{2}\Vert x^{k+1}-x^k\Vert ^2+\alpha _k{E}(x^{k+1}-x^k)^\top \delta ^k \geqslant \frac{\alpha ^2_k}{2}\Vert \delta \Vert ^2. \end{aligned}$$
Hence, dropping negative terms on the right-hand side of (B.27), from the convexity of \(\varPhi \) and (2.23), we have
$$\begin{aligned}&\left( \alpha _{t+1}+\theta \sum \limits _{k=1}^t\alpha _k\right) E\left[ F(\hat{x}^{t})-F(x)+(\hat{w}^{t}-w)^\top H(\hat{w}^{t})\right] \nonumber \\&\qquad \alpha _t E\left[ F(x^{t+1})-F(x)+(\tilde{w}^{t+1}-w)^\top H(\tilde{w}^{t+1})\right] \nonumber \\&\qquad +\theta \alpha _{k+1}\sum _{k=0}^{t-1}E\left[ F(x^{k+1})-F(x)+(w^{k+1}-w)^\top H(w^{k+1})\right] \nonumber \\&\leqslant (1-\theta )\alpha _0\left[ F(x^0)-F(x)+(w^{0}-w)^\top H(w^0)\right] \nonumber \\&\qquad +(1-\theta )\alpha _0\rho (x^{0}-x)^\top A^\top r^{0}+\frac{\alpha _0}{2}\Vert x^{0}-x\Vert _{\tilde{P}}^2+\frac{1}{2}\Vert x^0-x\Vert ^2\nonumber \\&\qquad +\left( \frac{\alpha _0\beta _1}{2\alpha _1}-\frac{(1-\theta )\beta _1}{2}\right) E\Vert \lambda ^0-\lambda \Vert ^2+\sum _{k=0}^t\frac{\alpha _k^2}{2}E\Vert \delta ^k\Vert ^2. \end{aligned}$$
(B.28)
Using Lemma 3.3 and the properties of H, we derive the desired result.
1.5 Proof of Proposition 6.1
Let \((I+\partial \phi )^{-1}(x):={{\,\mathrm{arg\,min}\,}}_z \phi (z)+\frac{1}{2}\Vert z-x\Vert _2^2\) denote the proximal mapping of \(\phi \) at x. Then, the update in (1.9b) can be written to
$$\begin{aligned} z^{k+1}=\left( I+\partial \left( \frac{g^*}{\eta }\right) \right) ^{-1}\left( z^k-\frac{1}{\eta }Ax^{k+1}\right) . \end{aligned}$$
Define \(y^{k+1}\) as that in (6.7b). Then,
$$\begin{aligned} \frac{1}{\eta }y^{k+1}= & {} \frac{1}{\eta }\left\{ {\mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{y}}\, g(y)-\langle y, z^k\rangle +\frac{1}{2\eta }\Vert y+ A x^{k+1}\Vert ^2\right\} \\= & {} \frac{1}{\eta }\left\{ {\mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{y}}\, g(y)+\frac{\eta }{2}\Vert \frac{1}{\eta } y-(z^k-\frac{1}{\eta } A x^{k+1})\Vert ^2\right\} \\= & {} {\mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{y}}\, g(\eta y)+\frac{\eta }{2}\Vert y-\left( z^k-\frac{1}{\eta } A x^{k+1}\right) \Vert ^2\\= & {} \left( I+\partial \left( \frac{1}{\eta } g(\eta \cdot )\right) \right) ^{-1}\left( z^k-\frac{1}{\eta } A x^{k+1}\right) . \end{aligned}$$
Hence, using the fact that the conjugate of \(\frac{1}{\eta }g^*\) is \(\frac{1}{\eta } g(\eta \cdot )\) and the Moreau’s identity \((I+\partial \phi )^{-1}+(I+\partial \phi ^*)^{-1}=I\) for any convex function \(\phi \), we have
$$\begin{aligned}&\left( I+\partial \left( \frac{g^*}{\eta }\right) \right) ^{-1}\left( z^k-\frac{1}{\eta }Ax^{k+1}\right) \\&\quad +\left( I+\partial \left( \frac{1}{\eta } g(\eta \cdot )\right) \right) ^{-1}\left( z^k-\frac{1}{\eta } A x^{k+1}\right) =z^k-\frac{1}{\eta } A x^{k+1}. \end{aligned}$$
Therefore, (6.7c) holds, and thus from (1.9c) it follows
$$\begin{aligned} \bar{z}^{k+1}=z^{k+1}-\frac{q}{\eta }(Ax^{k+1}+y^{k+1}). \end{aligned}$$
Substituting the formula of \(\bar{z}^k\) into (1.9a), we have for \(i=i_k\),
$$\begin{aligned} x_i^{k+1}&={\mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{x_i\in {\mathcal {X}}_i}}\langle -\bar{z}^k, A_ix_i\rangle +u_i(x_i)+\frac{\tau }{2}\Vert x_i-x_i^k\Vert ^2\\&= {\mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{x_i\in {\mathcal {X}}_i}}\langle -z^k, A_i x_i\rangle +\frac{q}{\eta }\langle A x^k+y^{k}, A_i x_i\rangle +u_i(x_i)+\frac{\tau }{2}\Vert x_i-x_i^k\Vert ^2\\&= {\mathop {{{\,\mathrm{arg\,min}\,}}}\limits _{x_i\in {\mathcal {X}}_i}}\langle -z^k, A_i x_i\rangle +u_i(x_i)+\frac{q}{2\eta }\Vert A_i x_i +A_{\ne i} x_{\ne i}^k+y^k\Vert ^2 \\&\quad +\frac{1}{2}\Vert x_i-x_i^k\Vert _{\tau I-\frac{q}{\eta }A_i^\top A_i}, \end{aligned}$$
which is exactly (6.7a). Hence, we complete the proof.