5.1 Proof Sketch
The proof of Theorem
1 is based on the design and analysis of a
generalized approximate message passing (GAMP) algorithm. GAMP is a class of iterative algorithms proposed by Rangan [
44] for estimation in generalized linear models. A GAMP algorithm is defined in terms of a sequence of Lipschitz functions
\(f_t:\mathbb R\rightarrow \mathbb R\) and
\(g_t:\mathbb R \times \mathbb {R}\rightarrow \mathbb R\), for
\(t \ge 0\). For
\(t\ge 0\), the GAMP iteration computes:
$$\begin{aligned} \begin{aligned} {\varvec{u}}^t&= \frac{1}{\sqrt{\delta }}{\varvec{A}}f_t({\varvec{v}}^t)-\mathsf{b}_t g_{t-1}({\varvec{u}}^{t-1}; {\varvec{y}}),\\ {\varvec{v}}^{t+1}&= \frac{1}{\sqrt{\delta }}{\varvec{A}}^\mathsf{T}g_t({\varvec{u}}^t; {\varvec{y}}) - \mathsf{c}_t f_t({\varvec{v}}^t). \end{aligned} \end{aligned}$$
(48)
Here, the functions
\(f_t\) and
\(g_t\) are understood to be applied component-wise, i.e.,
\(f_t({\varvec{v}}^t)=(f_t(v^t_1), \ldots , f_t(v^t_d))\) and
\(g_t({\varvec{u}}^t; {\varvec{y}})=(g_t(u^t_1; y_1), \ldots , g_t(u^t_n; y_n))\). The scalars
\(\mathsf{b}_t, \mathsf{c}_t\) are defined as
$$\begin{aligned} \mathsf{b}_t =\frac{1}{n}\sum _{i=1}^d f_t'(v_i^t), \qquad \mathsf{c}_t = \frac{1}{n}\sum _{i=1}^n g_t'(u_i^t; y_i), \end{aligned}$$
(49)
where
\(g_t'(\cdot , \cdot )\) denotes the derivative with respect to the first argument. The iteration (
48) is initialized with
$$\begin{aligned} {\varvec{u}}^{0}= c {\varvec{1}}_n, \qquad {\varvec{v}}^1 = \frac{1}{\sqrt{\delta }} {\varvec{A}}^{\mathsf{T}} g_{0}({\varvec{u}}^{0} ; {\varvec{y}}), \end{aligned}$$
(50)
for some constant
\(c>0\). Here,
\({\varvec{1}}_n \in \mathbb {R}^n\) denotes the all-ones vector.
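To make the structure of the iteration concrete, here is a minimal NumPy sketch of (48)–(50). The observation model, the choice \(f_t = \tanh\), and the toy \(g_t\) below are illustrative placeholders only, not the functions used in the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 100
delta = n / d  # aspect ratio delta = n/d

A = rng.standard_normal((n, d)) / np.sqrt(n)  # assumed normalization of A
y = rng.standard_normal(n)                    # placeholder observations

# Toy Lipschitz denoisers (placeholders, not the proof's choices):
f = np.tanh                                   # f_t, applied component-wise
df = lambda v: 1.0 - np.tanh(v) ** 2          # f_t'
g = lambda u, y: np.tanh(u) + 0.1 * y         # g_t(u; y)
dg = lambda u, y: 1.0 - np.tanh(u) ** 2       # derivative in the first argument

# Initialization (50): u^0 = c 1_n, v^1 = A^T g_0(u^0; y) / sqrt(delta), with c = 1
u = np.ones(n)
v = A.T @ g(u, y) / np.sqrt(delta)

for t in range(10):
    b = df(v).sum() / n                                # Onsager coefficient b_t in (49)
    u_new = A @ f(v) / np.sqrt(delta) - b * g(u, y)    # memory term uses g_{t-1}(u^{t-1}; y)
    c_t = dg(u_new, y).sum() / n                       # coefficient c_t in (49)
    v = A.T @ g(u_new, y) / np.sqrt(delta) - c_t * f(v)
    u = u_new
```

The two Onsager correction terms, weighted by \(\mathsf{b}_t\) and \(\mathsf{c}_t\), are what distinguish GAMP from a naive alternating iteration.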
A key feature of the GAMP algorithm (
48) is that the asymptotic empirical distribution of its iterates can be succinctly characterized via a deterministic recursion, called
state evolution. Hence, the behavior of the high-dimensional iterates
\({\varvec{u}}^t, {\varvec{v}}^t\) is captured by a scalar recursion. Specifically, this result gives that, for
\(t \ge 1\), the empirical distributions of
\({\varvec{u}}^t\) and
\({\varvec{v}}^t\) converge in
\(W_k\) distance to the laws of the random variables
\(U_t\) and
\(V_t\), respectively, with
$$\begin{aligned} U_t&\equiv \mu _{U,t} G + \sigma _{U,t} W_{U,t}, \end{aligned}$$
(51)
$$\begin{aligned} V_t&\equiv \mu _{V,t} X + \sigma _{V,t} W_{V,t}, \end{aligned}$$
(52)
where
\(G\) and \(W_{U,t}\) are i.i.d. \(\mathsf{N}(0,1)\), while
\(X \sim P_X\) and
\(W_{V,t} \sim \mathsf{N}(0,1)\) are independent. The deterministic parameters
\((\mu _{U,t}, \sigma _{U,t}, \mu _{V,t}, \sigma _{V,t})\) are computed via the recursion (
56) detailed in Sect.
5.2, and the formal statement of the state evolution result is contained in Proposition
1 (again in Sect.
5.2).
Next, in Sect.
5.3, we show that a suitable choice of the functions
\(f_t, g_t\) leads to a GAMP algorithm such that
(i) \({\varvec{v}}^1 = \sqrt{d} \, \hat{\varvec{x}}^\mathrm{L}\), and
(ii) \({\varvec{v}}^t\) is aligned with
\(\sqrt{d} \, \hat{\varvec{x}}^\mathrm{s}\) as
t grows large. Specifically, choosing
\(g_0(u; y)=\mathcal T_L(y)/\sqrt{\delta }\) in (
50) immediately gives
\({\varvec{v}}^1 = \sqrt{d} \, \hat{\varvec{x}}^\mathrm{L}\). In order to approach the spectral estimator as
\(t\rightarrow \infty \), we pick
\(f_t, g_t\) to be linear functions; see (
68). The idea is that, with this choice of
\(f_t\) and
\(g_t\), the GAMP iteration is effectively a power method. Let us now briefly discuss why this is the case. With the choice of
\(f_t, g_t\) in (
68), the GAMP iteration can be expressed as
$$\begin{aligned} \begin{aligned} {\varvec{u}}^t&= \frac{1}{\sqrt{\delta } \, \beta _t} \big [ {\varvec{A}}{\varvec{v}}^t \, - \, {\varvec{Z}}{\varvec{u}}^{t-1} \big ], \\ {\varvec{v}}^{t+1}&= {\varvec{A}}^\mathsf{T}{\varvec{Z}}{\varvec{u}}^t - \frac{\sqrt{\delta }}{\beta _t} \, {\mathbb E}\{ Z \} \, {\varvec{v}}^t, \end{aligned} \end{aligned}$$
(53)
where
\({\varvec{Z}}= \mathrm{diag}({\mathcal {T}}(y_1), \ldots , {\mathcal {T}}(y_n))\),
\(Z={\mathcal {T}}(Y)\), and the function
\(\mathcal T:\mathbb R\rightarrow \mathbb R\) is defined later in terms of the spectral preprocessing function
\({\mathcal {T}}_s\); see (
82). Then, Lemma
4 analyzes the fixed points of the state evolution of the GAMP algorithm (
53), and Lemma
6 proves that in the high-dimensional limit, the vector
\({\varvec{v}}^t\) tends to align with the principal eigenvector of the matrix
\({\varvec{M}}_n= {\varvec{A}}^\mathsf{T}{\varvec{Z}}({\varvec{Z}}+\delta \mathbb E\{Z(G^2-1)\}{\varvec{I}}_n)^{-1} {\varvec{A}}\) as
\(t \rightarrow \infty \). Here, we provide a heuristic sanity check for this last claim. Assume that the iterates
\({\varvec{v}}^t\) and
\({\varvec{u}}^t\) converge to the limits
\({\varvec{v}}^\infty \) and
\({\varvec{u}}^\infty \), respectively, in the sense that
\(\lim _{t\rightarrow \infty }\frac{1}{d}\Vert {\varvec{v}}^t-{\varvec{v}}^\infty \Vert ^2=0\) and
\(\lim _{t\rightarrow \infty }\frac{1}{n}\Vert {\varvec{u}}^t-{\varvec{u}}^\infty \Vert ^2=0\). Then, from (
53), the limits
\({\varvec{v}}^\infty \) and
\({\varvec{u}}^\infty \) satisfy
$$\begin{aligned} \begin{aligned} {\varvec{u}}^\infty&= \frac{1}{\sqrt{\delta } \, \beta _\infty } \big [ {\varvec{A}}{\varvec{v}}^\infty \, - \, {\varvec{Z}}{\varvec{u}}^{\infty } \big ], \\ {\varvec{v}}^{\infty }&= {\varvec{A}}^\mathsf{T}{\varvec{Z}}{\varvec{u}}^\infty - \frac{\sqrt{\delta }}{\beta _\infty } \, {\mathbb E}\{ Z \} \, {\varvec{v}}^\infty . \end{aligned} \end{aligned}$$
(54)
Furthermore, from the analysis of the fixed points of state evolution of Lemma
4, we obtain that
\(\beta _\infty =\sqrt{\delta }\mathbb E\{Z(G^2-1)\}\). Thus, after some manipulations, (
54) can be rewritten as
$$\begin{aligned} \begin{aligned} {\varvec{u}}^\infty&= ({\varvec{Z}}+\delta \mathbb E\{Z(G^2-1)\}{\varvec{I}}_n)^{-1} {\varvec{A}}{\varvec{v}}^\infty \\ \left( 1+\frac{ {\mathbb E}\{ Z \}}{\mathbb E\{Z(G^2-1)\}} \,\right) {\varvec{v}}^{\infty }&= {\varvec{A}}^\mathsf{T}{\varvec{Z}}({\varvec{Z}}+\delta \mathbb E\{Z(G^2-1)\}{\varvec{I}}_n)^{-1} {\varvec{A}}{\varvec{v}}^\infty . \end{aligned} \end{aligned}$$
(55)
Hence,
\({\varvec{v}}^\infty \) is an eigenvector of the matrix
\({\varvec{A}}^\mathsf{T}{\varvec{Z}}({\varvec{Z}}+\delta \mathbb E\{Z(G^2-1)\}{\varvec{I}}_n)^{-1} {\varvec{A}}\), and the GAMP algorithm is effectively performing a power method. Finally, we choose
\(\mathcal T\) (and consequently
\({\varvec{Z}}\)) so that
\({\varvec{Z}}({\varvec{Z}}+\delta \mathbb E\{Z(G^2-1)\}{\varvec{I}}_n)^{-1}={\varvec{Z}}_s\), with
\({\varvec{Z}}_s\) given by (
12). In this way, the matrix
\({\varvec{A}}^\mathsf{T}{\varvec{Z}}({\varvec{Z}}+\delta \mathbb E\{Z(G^2-1)\}{\varvec{I}}_n)^{-1} {\varvec{A}}\) coincides with the spectral matrix
\({\varvec{D}}_n\), as defined in (
13), and therefore
\({\varvec{v}}^t\) approaches the spectral estimator
\(\hat{\varvec{x}}^\mathrm{s}\).
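The heuristic above says that, at its fixed point, the linear-\(f_t, g_t\) GAMP behaves like a power method for a matrix of the form \({\varvec{A}}^\mathsf{T}{\varvec{Z}}({\varvec{Z}}+c\,{\varvec{I}}_n)^{-1}{\varvec{A}}\), as in (55). The following sketch, with synthetic stand-ins for \({\varvec{Z}}\) and for the constant \(\delta \mathbb E\{Z(G^2-1)\}\), checks the basic mechanism: power iteration on such a matrix converges to its principal eigenvector.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 300, 30
A = rng.standard_normal((n, d)) / np.sqrt(n)
z = 1.0 + rng.random(n)       # stand-in for the diagonal entries T(y_i) of Z
c = 0.5                       # stand-in for the constant delta * E{Z(G^2 - 1)}

D = z / (z + c)               # diagonal of Z (Z + c I)^{-1}
M = A.T @ (D[:, None] * A)    # A^T Z (Z + c I)^{-1} A: symmetric positive definite

v = rng.standard_normal(d)
for _ in range(5000):         # power iteration
    v = M @ v
    v /= np.linalg.norm(v)

lam = v @ M @ v               # Rayleigh quotient: approximates the top eigenvalue
resid = np.linalg.norm(M @ v - lam * v)   # small iff v is (nearly) an eigenvector
```

Since \({\varvec{Z}}\) and \(({\varvec{Z}}+c{\varvec{I}}_n)^{-1}\) are diagonal, the matrix \({\varvec{M}}\) above is symmetric, so the power iteration converges to the principal eigenvector whenever the spectrum has a gap at the top.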
In conclusion, as
\({\varvec{v}}^1 = \sqrt{d} \hat{\varvec{x}}^\mathrm{L}\) and
\({\varvec{v}}^t\) tends to be aligned with
\(\sqrt{d} \hat{\varvec{x}}^\mathrm{s}\) for large
t, we can use the state evolution result of Proposition
1 to analyze the joint empirical distribution of
\(({\varvec{x}}, \sqrt{d} \hat{\varvec{x}}^\mathrm{L}, \sqrt{d} \hat{\varvec{x}}^\mathrm{s})\). At this point, the proof of Theorem
1 becomes a straightforward application of Lemma
6 and is presented at the end of Sect.
5.3. The proof of Lemma
6 is quite long, so it is provided separately in Sect.
5.4.
5.3 GAMP as a Power Method to Compute the Spectral Estimator
Consider the GAMP iteration in (
48) initialized with
\({\varvec{u}}^0 = \frac{1}{\delta } {\varvec{1}}_n\), and the function
\(g_0:\mathbb {R}\times \mathbb {R}\rightarrow \mathbb {R}\) chosen as
$$\begin{aligned} g_0(u; \, y) = \frac{{\mathcal {T}}_L(y)}{\sqrt{\delta }}, \end{aligned}$$
(66)
so that
$$\begin{aligned} {\varvec{v}}^1 = \frac{1}{\delta }{\varvec{A}}^{\mathsf{T}} {\mathcal {T}}_L({\varvec{y}}). \end{aligned}$$
(67)
(The function
\(f_0\) is assumed to be zero.) From (
10), we note that
\({\varvec{v}}^1 = \sqrt{d} \, \hat{\varvec{x}}^\mathrm{L}\).
For
\(t \ge 1\), let the functions
\(g_t: \mathbb {R}\times \mathbb {R}\rightarrow \mathbb {R}\) and
\(f_t:\mathbb {R}\rightarrow \mathbb {R}\) be chosen as
$$\begin{aligned} g_t(u; \, y) = \sqrt{\delta } \, u {\mathcal {T}}(y), \qquad f_{t}(v) = \frac{v}{\beta _{t}}, \end{aligned}$$
(68)
where the function
\({\mathcal {T}}: \mathbb {R}\rightarrow \mathbb {R}\) is bounded and Lipschitz, and
\(\beta _t\) is a constant, defined iteratively for
\(t \ge 1\) via the state evolution equations below (Eqs. (
72)–(
74)). To prove Theorem
1, we will choose
\({\mathcal {T}}\) as a suitable function of
\({\mathcal {T}}_s\) (see (
82)). Note that the functions
\(g_t\) and
\(f_t\) are required to be Lipschitz for
\(t\ge 0\), and this will be ensured by choosing
\({\mathcal {T}}\) to be bounded and Lipschitz (see Lemma
6).
With this choice of
\(f_t, g_t\), for
\(t\ge 1\), the scalars in (
49) are given by
$$\begin{aligned} \mathsf{b}_t =\frac{1}{\delta \beta _t}, \qquad \mathsf{c}_t = \sqrt{\delta } \cdot \frac{1}{n}\sum _{i=1}^n {\mathcal {T}}(y_i). \end{aligned}$$
(69)
In the GAMP iteration below, we will replace the parameter
\(\mathsf{c}_t\) by its almost sure limit
\(\bar{\mathsf{c}}_t = \sqrt{\delta } \, {\mathbb E}\{Z\}\), where
\(Z \triangleq {\mathcal {T}}(Y)\). The state evolution result in Proposition
1 still holds when
\(\mathsf{c}_t\) is replaced with
\(\bar{\mathsf{c}}_t\) in the GAMP iteration [
27,
44]. This can be shown using the pseudo-Lipschitz property of the test functions
\(\psi \) in Eqs. (
61)–(
62) and the fact that
\( \lim _{n \rightarrow \infty } \frac{1}{n}\sum _{i=1}^n {\mathcal {T}}(y_i) = {\mathbb E}\{ Z \}\) almost surely, due to the strong law of large numbers.
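As a quick numerical illustration of this replacement (with an assumed toy law for \(Y\) and \({\mathcal {T}}=\tanh\), neither of which is the proof's choice), the empirical average \(\frac{1}{n}\sum _{i=1}^n {\mathcal {T}}(y_i)\) concentrates around \({\mathbb E}\{Z\}\):

```python
import numpy as np

rng = np.random.default_rng(2)

T = np.tanh                     # stand-in for the bounded Lipschitz function T

# Toy observation law (illustrative only): Y = G^2 + small noise.
def sample_y(m):
    g = rng.standard_normal(m)
    return g**2 + 0.1 * rng.standard_normal(m)

EZ = T(sample_y(2_000_000)).mean()   # Monte Carlo proxy for E{Z} = E{T(Y)}

n = 200_000
c_emp = T(sample_y(n)).mean()        # empirical (1/n) sum_i T(y_i)
gap = abs(c_emp - EZ)                # shrinks as n grows, by the SLLN
```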
With these choices, the GAMP iteration in (
48) is as follows. Initializing with
$$\begin{aligned} {\varvec{u}}^0 = \frac{1}{\delta } {\varvec{1}}_n, \qquad {\varvec{v}}^1 = \frac{1}{\delta } {\varvec{A}}^{\mathsf{T}} {\mathcal {T}}_L({\varvec{y}}), \end{aligned}$$
(70)
we have for
\(t \ge 1\):
$$\begin{aligned} \begin{aligned} {\varvec{u}}^t&= {\left\{ \begin{array}{ll} \frac{1}{\sqrt{\delta } \, \beta _t} \big [ {\varvec{A}}{\varvec{v}}^t \, - \, {\varvec{Z}}_L {\varvec{u}}^{t-1} \big ], &{} \text { for } t=1, \\ \frac{1}{\sqrt{\delta } \, \beta _t} \big [ {\varvec{A}}{\varvec{v}}^t \, - \, {\varvec{Z}}{\varvec{u}}^{t-1} \big ], &{} \text { for } t >1, \end{array}\right. }\\ {\varvec{v}}^{t+1}&= {\varvec{A}}^\mathsf{T}{\varvec{Z}}{\varvec{u}}^t - \frac{\sqrt{\delta }}{\beta _t} \, {\mathbb E}\{ Z \} \, {\varvec{v}}^t, \end{aligned} \end{aligned}$$
(71)
where
\({\varvec{Z}}_L = \mathrm{diag}({\mathcal {T}}_L(y_1), \ldots , {\mathcal {T}}_L(y_n))\) and
\({\varvec{Z}}= \mathrm{diag}({\mathcal {T}}(y_1), \ldots , {\mathcal {T}}(y_n))\). The state evolution equations in (
56) become:
$$\begin{aligned}&\mu _{U,t} = \frac{\mu _{V,t}}{\sqrt{\delta } \beta _t}, \qquad \sigma _{U, t}^2 = \frac{\sigma _{V,t}^2}{ \delta \, \beta ^2_t}, \end{aligned}$$
(72)
$$\begin{aligned}&\mu _{V,t+1} = \sqrt{\delta } \,\frac{\mu _{V,t}}{\beta _t} \big [ {\mathbb E}\{ZG^2\} - {\mathbb E}\{Z\} \big ], \qquad \sigma _{V,t+1}^2 = \frac{1}{\beta _t^2}\big [ \mu _{V,t}^2 {\mathbb E}\{Z^2G^2\} + \sigma _{V,t}^2 {\mathbb E}\{ Z^2\} \big ], \end{aligned}$$
(73)
$$\begin{aligned}&\beta _{t+1} = \sqrt{\mu _{V,{t+1}}^2 + \sigma _{V,{t+1}}^2}. \end{aligned}$$
(74)
Here, we recall that
\(G \sim \mathsf{N}(0,1)\) and
\(Z={\mathcal {T}}(Y)\), for
\(Y \sim p_{Y|G}(\cdot \mid G)\). From (
57), the state evolution iteration is initialized with the following:
$$\begin{aligned} \mu _{V,1} = {\mathbb E}\{ {\mathcal {T}}_L(Y) G \}, \qquad \sigma _{V,1}^2 = \frac{1}{\delta }{\mathbb E}\{ {\mathcal {T}}_L(Y)^2 \}, \qquad \beta _1 = \sqrt{\mu _{V,1}^2 + \sigma _{V,1}^2}. \end{aligned}$$
(75)
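The recursion (72)–(75) is a simple scalar iteration once the moments of \(Z\) are available. The sketch below runs it under assumed toy choices (observation law \(Y = G + \text{noise}\), \({\mathcal {T}}_L(y)=\mathrm{clip}(y,-2,2)\), \({\mathcal {T}}(y)=\tanh (y^2)\), and \(\delta = 5\), none of which come from the paper), with expectations estimated by Monte Carlo. For these choices one can check that \({\mathbb E}\{Z(G^2-1)\}>0\), and the fixed point satisfies \(\beta _\infty = \sqrt{\delta }\, \mathbb E\{Z(G^2-1)\}\), consistent with the analysis of Lemma 4 quoted in the proof sketch.

```python
import numpy as np

rng = np.random.default_rng(3)
delta = 5.0

# Toy model and preprocessing (illustrative assumptions, not the proof's choices).
G = rng.standard_normal(1_000_000)
Y = G + 0.1 * rng.standard_normal(G.size)
TL = np.clip(Y, -2.0, 2.0)      # stand-in for T_L
Z = np.tanh(Y**2)               # stand-in for Z = T(Y), bounded and Lipschitz

EZ, EZG2 = Z.mean(), (Z * G**2).mean()
EZ2, EZ2G2 = (Z**2).mean(), (Z**2 * G**2).mean()

# Initialization (75).
mu = (TL * G).mean()
sig2 = (TL**2).mean() / delta
beta = np.sqrt(mu**2 + sig2)

# State evolution recursion (73)-(74); both updates use the previous iterates.
for _ in range(200):
    mu, sig2 = (np.sqrt(delta) * mu / beta * (EZG2 - EZ),
                (mu**2 * EZ2G2 + sig2 * EZ2) / beta**2)
    beta = np.sqrt(mu**2 + sig2)
```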
We will show in Lemma
6 that in the high-dimensional limit, the vector
\({\varvec{v}}^t\) in (
71) tends to align with the principal eigenvector of the matrix
\({\varvec{M}}_n= {\varvec{A}}^\mathsf{T}{\varvec{Z}}({\varvec{Z}}+\delta \mathbb E\{Z(G^2-1)\}{\varvec{I}}_n)^{-1} {\varvec{A}}\) as
\(t \rightarrow \infty \). In other words, the GAMP iteration is equivalent to a power iteration for
\({\varvec{M}}_n\). This equivalence, together with Proposition
1, allows us to precisely characterize the limiting empirical distribution of the eigenvector of
\({\varvec{M}}_n\) in Lemma
6.
We begin with a result characterizing the fixed points of the state evolution recursion in (
72)–(
73).
The lemma is proved in Appendix
E. We note that Lemma
4 (and the subsequent Lemmas
5–
6 as well) assumes that
\({\mathbb E}\{ Z(G^2-1)\} >0\) and
\(\delta > \frac{{\mathbb E}\{ Z^2\}}{ ({\mathbb E}\{ Z G^2 \} \, - \,{\mathbb E}\{Z\})^2}\). These conditions concern the auxiliary random variable
Z (or, equivalently, the auxiliary function
\({\mathcal {T}}\)). In the proof of Theorem
1, which appears at the end of this section, we will provide a choice of
Z (depending on
\(Z_s\)) that fulfills these requirements; see (
82). Let us also point out that the condition
\(\delta > \frac{{\mathbb E}\{ Z^2\}}{ ({\mathbb E}\{ Z G^2 \} \, - \,{\mathbb E}\{Z\})^2}\) follows from
\(\psi _{\delta }'(\lambda _{\delta }^*)> 0\), which in turn ensures that
\(\rho _s>0\). For a discussion of the case
\(\rho _s=0\), see Remark
1.
The next lemma shows that the mean-squared difference between successive GAMP iterates vanishes as \(t \rightarrow \infty \) in the high-dimensional limit.
The proof of the lemma is given in Appendix
F. The next result is the main technical lemma needed to prove Theorem
1. It shows that, as
t grows large,
\({\varvec{v}}^t\) tends to be aligned with the principal eigenvector of the matrix
\({\varvec{M}}_n\) defined in (
79). Theorem
1 is then obtained from Lemma
6 by using a suitable choice for
\({\mathcal {T}}(\cdot )\) in the GAMP iteration (
71), which ensures that
\({\varvec{M}}_n\) is a scaled version of
\({\varvec{D}}_n\) in (
13).
We first show how Theorem
1 is obtained from Lemma
6, and then prove the lemma in the following subsection.
5.4 Proof of Lemma 6
Fix
\(c>0\), and let
\(\tilde{{\varvec{Z}}}=c {\varvec{Z}}\) and
\(\tilde{Z}=c Z\). Then,
$$\begin{aligned} \begin{aligned} {\varvec{Z}}({\varvec{Z}}+\delta \mathbb E\{Z(G^2-1)\}{\varvec{I}}_n)^{-1}&= \tilde{{\varvec{Z}}}(\tilde{{\varvec{Z}}}+\delta \mathbb E\{\tilde{Z}(G^2-1)\}{\varvec{I}}_n)^{-1}. \end{aligned} \end{aligned}$$
(88)
Thus, by inspecting the definition (
79), one immediately obtains that
\(\tilde{{\varvec{\varphi }}}^{(1)}\) does not change if we rescale
\({\varvec{Z}}\) and
Z by the multiplicative factor
c. Furthermore, by using the definitions (
76)–(
77), we have that
$$\begin{aligned} \tilde{\beta }^{-1}\tilde{\mu }_{V}&=\sqrt{\frac{\delta \, ( {\mathbb E}\{ZG^2\} - {\mathbb E}\{Z\})^2 - {\mathbb E}\{Z^2\}}{\delta \, ( {\mathbb E}\{ZG^2\} - {\mathbb E}\{Z\})^2 + {\mathbb E}\{Z^2 G^2\} - {\mathbb E}\{Z^2\}}} \nonumber \\&= \sqrt{\frac{\delta \, ( {\mathbb E}\{\tilde{Z}G^2\} - {\mathbb E}\{\tilde{Z}\})^2 - {\mathbb E}\{\tilde{Z}^2\}}{\delta \, ( {\mathbb E}\{\tilde{Z}G^2\} - {\mathbb E}\{\tilde{Z}\})^2 + {\mathbb E}\{\tilde{Z}^2 G^2\} - {\mathbb E}\{\tilde{Z}^2\}}},\end{aligned}$$
(89)
$$\begin{aligned} \tilde{\beta }^{-1}\tilde{\sigma }_{V}&=\sqrt{\frac{{\mathbb E}\{Z^2 G^2\}}{ \delta \, ( {\mathbb E}\{ZG^2\} - {\mathbb E}\{Z\})^2 + {\mathbb E}\{ Z^2 G^2\} - {\mathbb E}\{Z^2\}}}\nonumber \\&=\sqrt{\frac{{\mathbb E}\{\tilde{Z}^2 G^2\}}{ \delta \, ( {\mathbb E}\{\tilde{Z}G^2\} - {\mathbb E}\{\tilde{Z}\})^2 + {\mathbb E}\{ \tilde{Z}^2 G^2\} - {\mathbb E}\{\tilde{Z}^2\}}}, \end{aligned}$$
(90)
and that
$$\begin{aligned} \begin{aligned} \frac{ \tilde{\mu }_{V} \, {\mathbb E}\{ {\mathcal {T}}_L(Y) Z G \}}{ \tilde{\beta } \tilde{\sigma }_{V} \sqrt{ {\mathbb E}\{ {\mathcal {T}}_L(Y)^2\} } }&=\frac{{\mathbb E}\{ {\mathcal {T}}_L(Y) Z G \}}{\sqrt{\delta }( {\mathbb E}\{ZG^2\} - {\mathbb E}\{Z\})}\sqrt{\frac{\delta \, ( {\mathbb E}\{ZG^2\} - {\mathbb E}\{Z\})^2 - {\mathbb E}\{Z^2\}}{{\mathbb E}\{ Z^2 G^2 \}{\mathbb E}\{ {\mathcal {T}}_L(Y)^2\}}}\\&=\frac{{\mathbb E}\{ {\mathcal {T}}_L(Y) \tilde{Z} G \}}{\sqrt{\delta }( {\mathbb E}\{\tilde{Z}G^2\} - {\mathbb E}\{\tilde{Z}\})}\sqrt{\frac{\delta \, ( {\mathbb E}\{\tilde{Z}G^2\} - {\mathbb E}\{\tilde{Z}\})^2 - {\mathbb E}\{\tilde{Z}^2\}}{{\mathbb E}\{ \tilde{Z}^2 G^2 \}{\mathbb E}\{ {\mathcal {T}}_L(Y)^2\}}}. \end{aligned} \end{aligned}$$
(91)
Consequently, both the LHS and the RHS of (
80) are unchanged when we rescale
\({\varvec{Z}}\) and
Z by the multiplicative factor
c.
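The rescaling invariance used above (e.g., in (88)) is easy to check numerically; the diagonal entries and constants below are arbitrary stand-ins for \({\mathcal {T}}(y_i)\), \(\mathbb E\{Z(G^2-1)\}\), and the factor \(c\).

```python
import numpy as np

rng = np.random.default_rng(4)
n, delta = 50, 5.0

z = rng.random(n) + 0.5     # stand-in for the diagonal entries T(y_i) of Z
e = 0.3                     # stand-in for E{Z(G^2 - 1)}
c = 7.0                     # arbitrary rescaling factor

# Diagonal of Z (Z + delta E{Z(G^2-1)} I)^{-1}, before and after Z -> cZ
# (under which E{Z(G^2-1)} -> c E{Z(G^2-1)}): the factor c cancels.
orig = z / (z + delta * e)
scaled = (c * z) / (c * z + delta * (c * e))
```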
The argument above proves that, without loss of generality, we can rescale
\({\varvec{Z}}\) and
Z by any multiplicative factor
\(c>0\). To simplify the rest of the argument, it is convenient to assume the normalization condition
$$\begin{aligned} \mathbb E\{Z(G^2-1)\}=\frac{1}{\delta }, \end{aligned}$$
(92)
under which
\({\varvec{M}}_n = {\varvec{A}}^\mathsf{T}{\varvec{Z}}({\varvec{Z}}+{\varvec{I}}_n)^{-1} {\varvec{A}}\).
Consider the iteration (
71) with the initialization in (
70). Since by hypothesis,
\(\mu _{V,1}^2 = ({\mathbb E}\{ {\mathcal {T}}_L(Y) G \})^2 >0\), Lemma
4 guarantees that the state evolution recursion (
73) converges to either fixed point
\(\mathsf{FP}_1\) or
\(\mathsf{FP}_2\) as
\(t \rightarrow \infty \). That is,
$$\begin{aligned} \lim _{t \rightarrow \infty } \mu _{V,t}^2 = \tilde{\mu }_V^2, \quad \lim _{t \rightarrow \infty } \sigma ^2_{V,t} = \tilde{\sigma }^2_V, \quad \lim _{t \rightarrow \infty } \beta _t^2 = \tilde{\mu }_V^2 + \tilde{\sigma }^2_V = \tilde{\beta }^2 = \frac{1}{\delta }. \end{aligned}$$
(93)
The last equality above follows by combining (
77) and (
92).
By substituting the expression for
\({\varvec{u}}^t\) in (
71) in the
\({\varvec{v}}^{t+1}\) update, the iteration can be rewritten as follows, for
\(t \ge 2\):
$$\begin{aligned} {\varvec{u}}^t&= \frac{1}{\sqrt{\delta } \, \beta _t} \big [ {\varvec{A}}{\varvec{v}}^t \, - \, {\varvec{Z}}{\varvec{u}}^{t-1} \big ], \end{aligned}$$
(94)
$$\begin{aligned} {\varvec{v}}^{t+1}&= \frac{1}{\sqrt{\delta } \beta _t}\Big [ \big ({\varvec{A}}^\mathsf{T}{\varvec{Z}}{\varvec{A}}- \delta {\mathbb E}\{Z\} \, {\varvec{I}}_d \big ) {\varvec{v}}^t \, - \, {\varvec{A}}^\mathsf{T}{\varvec{Z}}^2 {\varvec{u}}^{t-1} \Big ]. \end{aligned}$$
(95)
In the remainder of the proof, we will assume that
\(t \ge 2\). Define
$$\begin{aligned} {\varvec{e}}_1^t&= {\varvec{u}}^{t}-{\varvec{u}}^{t-1}, \end{aligned}$$
(96)
$$\begin{aligned} {\varvec{e}}_2^t&= {\varvec{v}}^{t+1}-{\varvec{v}}^{t}. \end{aligned}$$
(97)
By combining (
96) with (
94), we have that
$$\begin{aligned} {\varvec{u}}^{t-1}=({\varvec{Z}}+\sqrt{\delta }\beta _t{\varvec{I}}_n)^{-1}{\varvec{A}}{\varvec{v}}^t-\sqrt{\delta }\beta _t({\varvec{Z}}+\sqrt{\delta }\beta _t{\varvec{I}}_n)^{-1}{\varvec{e}}_1^t. \end{aligned}$$
(98)
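The algebra behind (98) — substituting \({\varvec{u}}^t = {\varvec{u}}^{t-1} + {\varvec{e}}_1^t\) into (94) and solving for \({\varvec{u}}^{t-1}\) — can be verified numerically with arbitrary stand-ins for the quantities involved:

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, delta, beta = 40, 20, 2.0, 0.7

A = rng.standard_normal((n, d)) / np.sqrt(n)
z = rng.random(n) + 0.5          # diagonal of Z (stand-in for T(y_i))
v = rng.standard_normal(d)       # plays the role of v^t
u_prev = rng.standard_normal(n)  # plays the role of u^{t-1}

sb = np.sqrt(delta) * beta
u = (A @ v - z * u_prev) / sb    # (94): u^t
e1 = u - u_prev                  # (96): e_1^t

# (98): u^{t-1} = (Z + sqrt(delta) beta I)^{-1} [A v^t - sqrt(delta) beta e_1^t]
u_prev_rec = (A @ v - sb * e1) / (z + sb)
```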
By substituting the expression for
\({\varvec{u}}^{t-1}\) obtained in (
98) into (
95), we have
$$\begin{aligned} \begin{aligned} {\varvec{v}}^{t+1}&= \left( {\varvec{A}}^\mathsf{T}{\varvec{Z}}({\varvec{Z}}+\sqrt{\delta }\beta _t{\varvec{I}}_n)^{-1} {\varvec{A}}- \frac{\sqrt{\delta } {\mathbb E}\{Z\}}{\beta _t} \, {\varvec{I}}_d \right) {\varvec{v}}^t +{\varvec{A}}^\mathsf{T}{\varvec{Z}}^2({\varvec{Z}}+\sqrt{\delta }\beta _t{\varvec{I}}_n)^{-1}{\varvec{e}}_1^t\\&= \left( {\varvec{A}}^\mathsf{T}{\varvec{Z}}({\varvec{Z}}+{\varvec{I}}_n)^{-1} {\varvec{A}}-\delta {\mathbb E}\{Z\} \, {\varvec{I}}_d \right) {\varvec{v}}^t \\&\quad +(1-\sqrt{\delta }\beta _t){\varvec{A}}^\mathsf{T}{\varvec{Z}}({\varvec{Z}}+{\varvec{I}}_n)^{-1}({\varvec{Z}}+\sqrt{\delta }\beta _t{\varvec{I}}_n)^{-1} {\varvec{A}}{\varvec{v}}^t \\&\quad + \delta {\mathbb E}\{Z\}\left( 1-\frac{1}{\sqrt{\delta }\beta _t}\right) {\varvec{v}}^t+{\varvec{A}}^\mathsf{T}{\varvec{Z}}^2({\varvec{Z}}+\sqrt{\delta }\beta _t{\varvec{I}}_n)^{-1}{\varvec{e}}_1^t. \end{aligned} \end{aligned}$$
(99)
Let
$$\begin{aligned} {\varvec{e}}_3^t = \left( {\varvec{A}}^\mathsf{T}{\varvec{Z}}({\varvec{Z}}+{\varvec{I}}_n)^{-1} {\varvec{A}}-(\delta {\mathbb E}\{Z\}+1) \, {\varvec{I}}_d \right) {\varvec{v}}^t. \end{aligned}$$
(100)
From (
97) and (
99), we obtain
$$\begin{aligned} \begin{aligned} {\varvec{e}}_3^t&={\varvec{e}}_2^t-(1-\sqrt{\delta }\beta _t){\varvec{A}}^\mathsf{T}{\varvec{Z}}({\varvec{Z}}+{\varvec{I}}_n)^{-1}({\varvec{Z}}+\sqrt{\delta }\beta _t{\varvec{I}}_n)^{-1} {\varvec{A}}{\varvec{v}}^t \\&\quad - \delta {\mathbb E}\{Z\}\left( 1-\frac{1}{\sqrt{\delta }\beta _t}\right) {\varvec{v}}^t-{\varvec{A}}^\mathsf{T}{\varvec{Z}}^2({\varvec{Z}}+\sqrt{\delta }\beta _t{\varvec{I}}_n)^{-1}{\varvec{e}}_1^t. \end{aligned} \end{aligned}$$
(101)
Let us decompose
\({\varvec{v}}^t\) into a component in the direction of
\(\hat{{\varvec{\varphi }}}_1\) plus an orthogonal component
\({\varvec{r}}^t\):
$$\begin{aligned} {\varvec{v}}^t = \xi _t \hat{{\varvec{\varphi }}}_1+{\varvec{r}}^t, \end{aligned}$$
(102)
where
\(\xi _t=\langle {\varvec{v}}^t, \hat{{\varvec{\varphi }}}_1\rangle \).
At this point, the idea is to show that, when
t is large,
\({\varvec{r}}^t\) is small, and thus
\({\varvec{v}}^t\) tends to be aligned with
\(\hat{{\varvec{\varphi }}}_1\). To do so, we prove that, as
\(n\rightarrow \infty \), the largest eigenvalue of the matrix
\({\varvec{M}}_n\) defined in (
79) converges almost surely to
\(\delta {\mathbb E}\{Z\}+1\). Furthermore, we prove that the matrix
\({\varvec{M}}_n\) exhibits a spectral gap, in the sense that the second largest eigenvalue of
\({\varvec{M}}_n\) converges almost surely to a value strictly smaller than
\(\delta {\mathbb E}\{Z\}+1\). Since
\({\varvec{r}}^t\) is orthogonal to
\(\hat{{\varvec{\varphi }}}_1\) and
\({\varvec{M}}_n\) has a spectral gap, the norm of
\({\varvec{e}}_3^t\) in (
100) can be lower bounded by a strictly positive constant times the norm of
\({\varvec{r}}^t\). Next, using the expression in (
101), we show that the norm of
\({\varvec{e}}_3^t\) can be made arbitrarily small by taking
n and
t sufficiently large. From this, we conclude that
\({\varvec{r}}^t\) must be small.
We begin by proving that \({\varvec{M}}_n\) has a spectral gap.
Let us now go back to (
100) and combine it with (
102). Then,
$$\begin{aligned} \left( {\varvec{A}}^\mathsf{T}{\varvec{Z}}({\varvec{Z}}+{\varvec{I}}_n)^{-1} {\varvec{A}}-(\delta {\mathbb E}\{Z\}+1) \, {\varvec{I}}_d \right) {\varvec{r}}^t ={\varvec{e}}_3^t+\xi _t(\delta {\mathbb E}\{Z\}+1-\lambda _1^{{\varvec{M}}_n})\hat{{\varvec{\varphi }}}_1. \end{aligned}$$
(114)
We now prove that, almost surely, for all sufficiently large
n, the following lower bound on the norm of the LHS of (
114) holds:
$$\begin{aligned} \left\| \left( {\varvec{A}}^\mathsf{T}{\varvec{Z}}({\varvec{Z}}+{\varvec{I}}_n)^{-1} {\varvec{A}}-(\delta {\mathbb E}\{Z\}+1) \, {\varvec{I}}_d \right) {\varvec{r}}^t\right\| _2\ge c_2\Vert {\varvec{r}}^t\Vert _2, \end{aligned}$$
(115)
where
\(c_2>0\) is a numerical constant independent of
n,
t.
As the matrix
\({\varvec{A}}^\mathsf{T}{\varvec{Z}}({\varvec{Z}}+{\varvec{I}}_n)^{-1} {\varvec{A}}-(\delta {\mathbb E}\{Z\}+1) \, {\varvec{I}}_d\) is symmetric, it can be written in the form
\({\varvec{Q}}{\varvec{\Lambda }}{\varvec{Q}}^\mathsf{T}\), with
\({\varvec{Q}}\) orthogonal and
\({\varvec{\Lambda }}\) diagonal. Furthermore, the columns of
\({\varvec{Q}}\) are the eigenvectors of
\({\varvec{A}}^\mathsf{T}{\varvec{Z}}({\varvec{Z}}+{\varvec{I}}_n)^{-1} {\varvec{A}}-(\delta {\mathbb E}\{Z\}+1) \, {\varvec{I}}_d\) and the diagonal entries of
\({\varvec{\Lambda }}\) are the corresponding eigenvalues. As
\({\varvec{r}}^t\) is orthogonal to
\(\hat{{\varvec{\varphi }}}_1\), we can write
$$\begin{aligned} \left( {\varvec{A}}^\mathsf{T}{\varvec{Z}}({\varvec{Z}}+{\varvec{I}}_n)^{-1} {\varvec{A}}-(\delta {\mathbb E}\{Z\}+1) \, {\varvec{I}}_d \right) {\varvec{r}}^t={\varvec{Q}}{\varvec{\Lambda }}'{\varvec{Q}}^\mathsf{T}{\varvec{r}}^t, \end{aligned}$$
(116)
where
\({\varvec{\Lambda }}'\) is obtained from
\({\varvec{\Lambda }}\) by changing the entry corresponding to
\(\lambda _1^{{\varvec{M}}_n}-(\delta {\mathbb E}\{Z\}+1)\) to any other value. For our purposes, it suffices to substitute
\(\lambda _1^{{\varvec{M}}_n}-(\delta {\mathbb E}\{Z\}+1)\) with
\(\lambda _2^{{\varvec{M}}_n}-(\delta {\mathbb E}\{Z\}+1)\). Note that
$$\begin{aligned} \begin{aligned} \Vert {\varvec{Q}}{\varvec{\Lambda }}'{\varvec{Q}}^\mathsf{T}{\varvec{r}}^t\Vert _2^2&\ge \Vert {\varvec{r}}^t\Vert _2^2 \min _{\varvec{s}:\Vert \varvec{s}\Vert =1}\Vert {\varvec{Q}}{\varvec{\Lambda }}'{\varvec{Q}}^\mathsf{T}\varvec{s}\Vert _2^2\\&=\Vert {\varvec{r}}^t\Vert _2^2\min _{\varvec{s}:\Vert \varvec{s}\Vert =1}\langle \varvec{s}, {\varvec{Q}}\left( {\varvec{\Lambda }}'\right) ^2{\varvec{Q}}^\mathsf{T}\varvec{s}\rangle \\&=\Vert {\varvec{r}}^t\Vert _2^2\,\lambda _\mathrm{min}({\varvec{Q}}\left( {\varvec{\Lambda }}'\right) ^2{\varvec{Q}}^\mathsf{T}), \end{aligned} \end{aligned}$$
(117)
where
\(\lambda _\mathrm{min}({\varvec{Q}}\left( {\varvec{\Lambda }}'\right) ^2{\varvec{Q}}^\mathsf{T})\) denotes the smallest eigenvalue of
\({\varvec{Q}}\left( {\varvec{\Lambda }}'\right) ^2{\varvec{Q}}^\mathsf{T}\) and the last equality follows from the variational characterization of the smallest eigenvalue of a symmetric matrix. Note that
$$\begin{aligned} \lambda _\mathrm{min}({\varvec{Q}}\left( {\varvec{\Lambda }}'\right) ^2{\varvec{Q}}^\mathsf{T})=\lambda _\mathrm{min}\left( ({\varvec{\Lambda }}')^2\right) =\min _{i\in \{2, \ldots , d\}}\left( ((\delta {\mathbb E}\{Z\}+1)-\lambda _i^{{\varvec{M}}_n})^2\right) . \end{aligned}$$
(118)
By using (
104), we obtain that, almost surely, for all sufficiently large
n,
$$\begin{aligned} \min _{i\in \{2, \ldots , d\}}\left( ((\delta {\mathbb E}\{Z\}+1)-\lambda _i^{{\varvec{M}}_n})^2\right) \ge \left( \frac{c_1}{2}\right) ^2. \end{aligned}$$
(119)
By combining (
116), (
117), (
118) and (
119), we conclude that (
115) holds.
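The chain (116)–(118) combines two elementary facts: the variational characterization of the smallest eigenvalue, and the fact that conjugation by an orthogonal matrix preserves eigenvalues, so \(\lambda _\mathrm{min}({\varvec{Q}}({\varvec{\Lambda }}')^2{\varvec{Q}}^\mathsf{T})\) is the smallest squared diagonal entry of \({\varvec{\Lambda }}'\). Both are easy to check numerically on arbitrary data (the matrices below are random stand-ins, unrelated to \({\varvec{M}}_n\)):

```python
import numpy as np

rng = np.random.default_rng(6)
d = 30

Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal Q
lam = rng.standard_normal(d)                      # diagonal entries of Lambda'

S = Q @ np.diag(lam**2) @ Q.T                     # Q (Lambda')^2 Q^T
lam_min = np.linalg.eigvalsh(S)[0]                # eigvalsh returns ascending order

# The bound (117): for any unit vector s, ||Q Lambda' Q^T s||^2 >= lam_min(S).
s = rng.standard_normal(d)
s /= np.linalg.norm(s)
bound_ok = np.linalg.norm(Q @ np.diag(lam) @ Q.T @ s) ** 2 >= lam_min - 1e-10
```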
Recalling that
\({\varvec{r}}^t\) satisfies (
114), we will next show that, almost surely,
$$\begin{aligned} \lim _{t\rightarrow \infty }\lim _{d\rightarrow \infty }\frac{1}{\sqrt{d}}\left\| {\varvec{e}}_3^t+\xi _t(\delta {\mathbb E}\{Z\}+1-\lambda _1^{{\varvec{M}}_n})\hat{{\varvec{\varphi }}}_1\right\| _2=0. \end{aligned}$$
(120)
Combined with (
114) and (
115), this implies that
\( \lim _{t\rightarrow \infty }\lim _{d\rightarrow \infty } \frac{\Vert {\varvec{r}}^t \Vert _2}{\sqrt{d}} = 0\) almost surely.
By using the triangle inequality, we have
$$\begin{aligned} \begin{aligned} \left\| {\varvec{e}}_3^t+\xi _t(\delta {\mathbb E}\{Z\}+1-\lambda _1^{{\varvec{M}}_n})\hat{{\varvec{\varphi }}}_1\right\| _2&\le \left\| {\varvec{e}}_3^t\right\| _2+|\xi _t|\cdot |\delta {\mathbb E}\{Z\}+1-\lambda _1^{{\varvec{M}}_n}|\cdot \left\| \hat{{\varvec{\varphi }}}_1\right\| _2\\&\le \left\| {\varvec{e}}_3^t\right\| _2+\Vert {\varvec{v}}^t\Vert _2\cdot |\delta {\mathbb E}\{Z\}+1-\lambda _1^{{\varvec{M}}_n}|, \end{aligned} \end{aligned}$$
(121)
where the second inequality uses
\(\left\| \hat{{\varvec{\varphi }}}_1\right\| _2=1\) and that
\(|\xi _t|= |\langle {\varvec{v}}^t, \hat{{\varvec{\varphi }}}_1 \rangle | \le \Vert {\varvec{v}}^t\Vert _2\), by Cauchy–Schwarz.
We can bound the second term on the RHS of (
121) using the result in Proposition
1, applied with the PL(2) test function
\(\psi (v) = v^2\). Then, almost surely,
$$\begin{aligned} \lim _{d\rightarrow \infty } \, \frac{1}{d}\Vert {\varvec{v}}^t\Vert ^2_2 = {\mathbb E}\{V_t^2\}=\beta _t^2. \end{aligned}$$
(122)
Here we used the definitions of
\(V_t\) and
\(\beta _t^2\) from (
52) and (
74). Recalling from (
93) that
\(\lim _{t\rightarrow \infty } \beta _t^2 =\frac{1}{\delta }\), the limit in (
122) combined with Remark
8 and the continuous mapping theorem implies that, almost surely,
$$\begin{aligned} \lim _{t\rightarrow \infty }\lim _{d\rightarrow \infty } \frac{1}{\sqrt{d}}\Vert {\varvec{v}}^t\Vert _2=\frac{1}{\sqrt{\delta }}. \end{aligned}$$
(123)
Thus, by using (
103), we conclude that, almost surely,
$$\begin{aligned} \lim _{t\rightarrow \infty }\lim _{n\rightarrow \infty }\frac{1}{\sqrt{d}}\Vert {\varvec{v}}^t\Vert _2\cdot |\delta {\mathbb E}\{Z\}+1-\lambda _1^{{\varvec{M}}_n}|=0. \end{aligned}$$
(124)
We now bound the first term on the RHS of (
121). Recalling the definition of
\({\varvec{e}}_3^t\) in (
101), an application of the triangle inequality gives
$$\begin{aligned} \begin{aligned} \left\| {\varvec{e}}_3^t\right\| _2&\le \left\| {\varvec{e}}_2^t\right\| _2+|1-\sqrt{\delta }\beta _t|\cdot \left\| {\varvec{A}}^\mathsf{T}{\varvec{Z}}({\varvec{Z}}+{\varvec{I}}_n)^{-1}({\varvec{Z}}+\sqrt{\delta }\beta _t{\varvec{I}}_n)^{-1} {\varvec{A}}{\varvec{v}}^t\right\| _2 \\&\quad + \delta |{\mathbb E}\{Z\}|\cdot \left| 1-\frac{1}{\sqrt{\delta }\beta _t}\right| \cdot \left\| {\varvec{v}}^t\right\| _2+\left\| {\varvec{A}}^\mathsf{T}{\varvec{Z}}^2({\varvec{Z}}+\sqrt{\delta }\beta _t{\varvec{I}}_n)^{-1}{\varvec{e}}_1^t\right\| _2\\&\le \left\| {\varvec{e}}_2^t\right\| _2+|1-\sqrt{\delta }\beta _t|\cdot \left\| {\varvec{A}}^\mathsf{T}{\varvec{Z}}({\varvec{Z}}+{\varvec{I}}_n)^{-1}({\varvec{Z}}+\sqrt{\delta }\beta _t{\varvec{I}}_n)^{-1} {\varvec{A}}\right\| _\mathrm{op}\Vert {\varvec{v}}^t\Vert _2 \\&\quad + \delta |{\mathbb E}\{Z\}|\cdot \left| 1-\frac{1}{\sqrt{\delta }\beta _t}\right| \cdot \left\| {\varvec{v}}^t\right\| _2+\left\| {\varvec{A}}^\mathsf{T}{\varvec{Z}}^2({\varvec{Z}}+\sqrt{\delta }\beta _t{\varvec{I}}_n)^{-1}\right\| _\mathrm{op}\Vert {\varvec{e}}_1^t\Vert _2, \end{aligned} \end{aligned}$$
(125)
where the second inequality follows from the fact that, given a matrix
\({\varvec{M}}\) and a vector
\({\varvec{v}}\),
\(\Vert {\varvec{M}}{\varvec{v}}\Vert _2\le \Vert {\varvec{M}}\Vert _\mathrm{op} \Vert {\varvec{v}}\Vert _2\).
Let us bound the operator norm of the two matrices appearing in the RHS of (
125). As the operator norm is sub-multiplicative, we have
$$\begin{aligned} \begin{aligned}&\left\| {\varvec{A}}^\mathsf{T}{\varvec{Z}}({\varvec{Z}}+{\varvec{I}}_n)^{-1}({\varvec{Z}}+\sqrt{\delta }\beta _t{\varvec{I}}_n)^{-1} {\varvec{A}}\right\| _\mathrm{op}\\&\quad \le \left\| {\varvec{Z}}\right\| _\mathrm{op}\left\| ({\varvec{Z}}+{\varvec{I}}_n)^{-1}\right\| _\mathrm{op}\left\| ({\varvec{Z}}+\sqrt{\delta }\beta _t{\varvec{I}}_n)^{-1}\right\| _\mathrm{op} \left\| {\varvec{A}}\right\| _\mathrm{op}^2,\\&\left\| {\varvec{A}}^\mathsf{T}{\varvec{Z}}^2({\varvec{Z}}+\sqrt{\delta }\beta _t{\varvec{I}}_n)^{-1}\right\| _\mathrm{op}\\&\quad \le \left\| {\varvec{Z}}\right\| _\mathrm{op}^2\left\| ({\varvec{Z}}+\sqrt{\delta }\beta _t{\varvec{I}}_n)^{-1}\right\| _\mathrm{op}\left\| {\varvec{A}}\right\| _\mathrm{op}^2. \end{aligned} \end{aligned}$$
(126)
As
Z is bounded, the operator norm of
\({\varvec{Z}}\) is upper bounded by a numerical constant (independent of
n,
t). The operator norm of
\(({\varvec{Z}}+{\varvec{I}}_n)^{-1}\) and
\(({\varvec{Z}}+\sqrt{\delta }\beta _t{\varvec{I}}_n)^{-1}\) is also upper bounded by a numerical constant (independent of
n,
t). Indeed, from (
93),
\(\beta _t \rightarrow 1/\sqrt{\delta }\) as
\(t \rightarrow \infty \), and the support of
Z does not contain points arbitrarily close to
\(-1\). We also have that, almost surely, for all sufficiently large
n, the operator norm of
\({\varvec{A}}\) is upper bounded by a constant (independent of
n,
t). As a result, we deduce that, almost surely, for all sufficiently large
n,
t,
$$\begin{aligned} \begin{aligned} \left\| {\varvec{A}}^\mathsf{T}{\varvec{Z}}({\varvec{Z}}+{\varvec{I}}_n)^{-1}({\varvec{Z}}+\sqrt{\delta }\beta _t{\varvec{I}}_n)^{-1} {\varvec{A}}\right\| _\mathrm{op}&\le C,\\ \left\| {\varvec{A}}^\mathsf{T}{\varvec{Z}}^2({\varvec{Z}}+\sqrt{\delta }\beta _t{\varvec{I}}_n)^{-1}\right\| _\mathrm{op}&\le C, \end{aligned} \end{aligned}$$
(127)
where
C is a numerical constant (independent of
n,
t). Furthermore, by Lemma
5, the following limits hold almost surely:
$$\begin{aligned} \begin{aligned} \lim _{t\rightarrow \infty }\lim _{n\rightarrow \infty }\frac{1}{\sqrt{n}}\left\| {\varvec{e}}_1^t\right\| _2&=0,\\ \lim _{t\rightarrow \infty }\lim _{n\rightarrow \infty }\frac{1}{\sqrt{d}}\left\| {\varvec{e}}_2^t\right\| _2&=0. \end{aligned} \end{aligned}$$
(128)
By combining (
93), (
123), (
127) and (
128), we obtain that, almost surely, each of the terms in the RHS of (
125) vanishes when scaled by the factor
\(1/\sqrt{n}\), as
\(t, n\rightarrow \infty \). As a result, almost surely,
$$\begin{aligned} \lim _{t\rightarrow \infty }\lim _{d\rightarrow \infty }\frac{1}{\sqrt{d}}\left\| {\varvec{e}}_3^t\right\| _2=0. \end{aligned}$$
(129)
By combining (
121), (
124) and (
129), we conclude that, almost surely, (
120) holds.
Recall that
\({\varvec{r}}^t\) satisfies (
114). Thus, by combining the lower bound in (
115) with the almost sure limit in (
120), we obtain that, almost surely,
$$\begin{aligned} \lim _{t\rightarrow \infty }\lim _{d \rightarrow \infty }\frac{1}{\sqrt{d}}\left\| {\varvec{r}}^t\right\| _2=0. \end{aligned}$$
(130)
Recalling from (
102) that
\({\varvec{r}}^t\) is the component of
\({\varvec{v}}^t\) orthogonal to
\(\hat{{\varvec{\varphi }}}_1\), the result in (
130) implies that
\({\varvec{v}}^t\) tends to be aligned with
\(\hat{{\varvec{\varphi }}}_1\) in the high-dimensional limit. In formulas, by combining (
102) with (
130), we have that, almost surely,
$$\begin{aligned} \lim _{t\rightarrow \infty }\lim _{d\rightarrow \infty }\frac{1}{\sqrt{d}}\left\| {\varvec{v}}^t - \xi _t \hat{{\varvec{\varphi }}}_1\right\| _2=0. \end{aligned}$$
(131)
Note that
$$\begin{aligned} \left\| {\varvec{v}}^t - \xi _t \hat{{\varvec{\varphi }}}_1\right\| _2^2=\left\| {\varvec{v}}^t\right\| _2^2-\xi _t^2. \end{aligned}$$
(132)
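The identity (132) is a consequence of the orthogonal decomposition (102): since \({\varvec{r}}^t\) is orthogonal to \(\hat{{\varvec{\varphi }}}_1\) and \(\hat{{\varvec{\varphi }}}_1\) has unit norm, we have \(\xi _t = \langle {\varvec{v}}^t, \hat{{\varvec{\varphi }}}_1\rangle \), and expanding the square gives
$$\begin{aligned} \left\| {\varvec{v}}^t - \xi _t \hat{{\varvec{\varphi }}}_1\right\| _2^2 = \left\| {\varvec{v}}^t\right\| _2^2 - 2\xi _t \langle {\varvec{v}}^t, \hat{{\varvec{\varphi }}}_1\rangle + \xi _t^2 \left\| \hat{{\varvec{\varphi }}}_1\right\| _2^2 = \left\| {\varvec{v}}^t\right\| _2^2 - \xi _t^2. \end{aligned}$$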
Thus, by using (
123), we obtain that, almost surely,
$$\begin{aligned} \lim _{t\rightarrow \infty }\lim _{d\rightarrow \infty } \frac{1}{\sqrt{d}}|\xi _t|=\frac{1}{\sqrt{\delta }}. \end{aligned}$$
(133)
To obtain the sign of
\(\xi _t\), we first observe that, by Proposition
1, almost surely,
$$\begin{aligned} \lim _{d \rightarrow \infty }\frac{1}{d}\langle {\varvec{v}}^t, {\varvec{x}}\rangle =\mu _{V, t}. \end{aligned}$$
(134)
As
\(\mu _{V, 0} = \alpha > 0\) and
\({\mathbb E}\{ Z(G^2-1)\} = 1/\delta \), the state evolution iteration (
73) implies that
\(\mu _{V, t} >0\) for all
\(t \ge 0\). Using (
102), we can write
$$\begin{aligned} \frac{1}{d} \langle {\varvec{v}}^t, {\varvec{x}}\rangle = \frac{\xi _t}{\sqrt{d}} \frac{\langle \hat{{\varvec{\varphi }}}_1, {\varvec{x}}\rangle }{\sqrt{d}} \, + \, \frac{\langle {\varvec{r}}^t, {\varvec{x}}\rangle }{d}. \end{aligned}$$
(135)
Recall that by hypothesis,
\(\langle \hat{{\varvec{\varphi }}}_1, {\varvec{x}}\rangle \ge 0\). Moreover, using (
130) and Cauchy–Schwarz, we have that, almost surely,
$$\begin{aligned} \lim _{t \rightarrow \infty } \lim _{d \rightarrow \infty }\frac{\langle {\varvec{r}}^t, {\varvec{x}}\rangle }{d} = 0. \end{aligned}$$
Thus we deduce that, almost surely,
$$\begin{aligned} \lim _{t \rightarrow \infty } \lim _{d \rightarrow \infty } \frac{\xi _t}{\sqrt{d}} = + \frac{1}{\sqrt{\delta }}.\end{aligned}$$
Therefore,
$$\begin{aligned} \lim _{t\rightarrow \infty }\lim _{d\rightarrow \infty }\frac{1}{\sqrt{d}}\left\| \sqrt{\delta }{\varvec{v}}^t - \tilde{{\varvec{\varphi }}}^{(1)}\right\| _2=0 \quad \text {a.s.}, \end{aligned}$$
(136)
with
\(\tilde{{\varvec{\varphi }}}^{(1)}=\sqrt{d}\hat{{\varvec{\varphi }}}_1\).
At this point, we are ready to prove (
80). For any
\(\mathrm{PL}(k)\) function
\(\psi : \mathbb {R}^3 \rightarrow \mathbb {R}\), we have that
$$\begin{aligned} \begin{aligned}&\left| \frac{1}{d}\sum _{i=1}^d \psi (x_i, \tilde{x}^\mathrm{L}_i, \tilde{\varphi }_i^{(1)}) - \frac{1}{d}\sum _{i=1}^d \psi (x_i, \tilde{x}^\mathrm{L}_i, \sqrt{\delta }v_i^t) \right| \\&\quad \le \frac{1}{d}\sum _{i=1}^d \left| \psi (x_i, \tilde{x}^\mathrm{L}_i, \tilde{\varphi }_i^{(1)}) - \psi (x_i, \tilde{x}^\mathrm{L}_i, \sqrt{\delta }v_i^t) \right| \\&\ \le \frac{C}{d}\sum _{i=1}^d |\tilde{\varphi }_i^{(1)}-\sqrt{\delta }v_i^t| \\&\quad \left[ 1+\, \left( \big (\tilde{\varphi }_i^{(1)}\big )^2+ (\tilde{x}^\mathrm{L}_i)^2 +x_i^2 \right) ^{(k-1)/2}+ \, \left( \delta (v_i^t)^2 + (\tilde{x}^\mathrm{L}_i)^2 +x_i^2\right) ^{(k-1)/2} \right] \\&\ \le \frac{C}{d}\sum _{i=1}^d |\tilde{\varphi }_i^{(1)}-\sqrt{\delta }v_i^t|\\&\quad \left[ 1+\, 3^{\frac{k-1}{2}} \Big ( \left|{\tilde{\varphi }_i^{(1)}}\right|^{k-1} + \left|{\tilde{x}^\mathrm{L}_i}\right|^{k-1} + \left|{x_i}\right|^{k-1} + \left|{\sqrt{\delta } v_i^t}\right|^{k-1} + \left|{\tilde{x}^\mathrm{L}_i}\right|^{k-1} + \left|{x_i}\right|^{k-1} \Big ) \right] \\&\ \le C'\frac{\Vert \tilde{{\varvec{\varphi }}}^{(1)} - \sqrt{\delta }{\varvec{v}}^t\Vert _2}{\sqrt{d}} \left[ 1+ \sum _{i=1}^d \left( \frac{ |\tilde{\varphi }_i^{(1)}|^{2(k-1)} + |\tilde{x}^\mathrm{L}_i|^{2(k-1)} + |{x}_i|^{2(k-1)} + |\sqrt{\delta } v_i^t|^{2(k-1)}}{d} \right) \right] ^{1/2}, \end{aligned} \end{aligned}$$
(137)
where
\(C,C'\) are universal constants (which may depend on
k but not on
d,
n). The inequality in the second line above uses
\(\psi \in \mathrm{PL}(k)\); the third line uses the elementary consequence of Hölder's inequality \((a+b+c)^{p}\le 3^{p}\,(a^{p}+b^{p}+c^{p})\), valid for \(a,b,c\ge 0\) and \(p>0\) and applied here with \(p=(k-1)/2\); and the fourth line follows from the Cauchy–Schwarz inequality. We now claim that, almost surely,
$$\begin{aligned} \limsup _{d\rightarrow \infty }\sum _{i=1}^d \left( \frac{ |\tilde{\varphi }_i^{(1)}|^{2(k-1)} + |\tilde{x}^\mathrm{L}_i|^{2(k-1)} + |{x}_i|^{2(k-1)} + |\sqrt{\delta } v_i^t|^{2(k-1)}}{d} \right) \le C, \end{aligned}$$
(138)
where, from now on, we will use
C to denote a generic positive constant that does not depend on
d,
n. If (
138) holds, then by using (
136) and (
137), we deduce that, almost surely,
$$\begin{aligned} \lim _{t\rightarrow \infty }\lim _{d\rightarrow \infty } \left| \frac{1}{d}\sum _{i=1}^d \psi (x_i, \tilde{x}^\mathrm{L}_i, \tilde{\varphi }_i^{(1)}) - \frac{1}{d}\sum _{i=1}^d \psi (x_i, \tilde{x}^\mathrm{L}_i, \sqrt{\delta }v_i^t) \right| = 0. \end{aligned}$$
(139)
Let us now prove (
138). First, by assumption
(B1), we have that
$$\begin{aligned} \lim _{d \rightarrow \infty } \frac{1}{d} \sum _{i} |{x}_i|^{2(k-1)} \le C. \end{aligned}$$
(140)
Next, the main technical lemma [
27, Lemma 2] leading to the state evolution result in Proposition
1 implies that, almost surely, for
\(t\ge 1\),
$$\begin{aligned} \limsup _{d \rightarrow \infty } \frac{1}{d} \sum _i | v_i^t|^{2(k-1)} \le C. \end{aligned}$$
(141)
In particular, this follows from [
27, Lemma 2(e)] (see also [
4, Lemma 1(e)]). Since
\(\tilde{{\varvec{x}}}^\mathrm{L} = {\varvec{v}}^1\), we also have that, almost surely,
$$\begin{aligned} \limsup _{d \rightarrow \infty } \frac{1}{d} \sum _i | \tilde{x}_i^\mathrm{L}|^{2(k-1)} \le C. \end{aligned}$$
(142)
It remains to show that, almost surely,
$$\begin{aligned} \limsup _{d \rightarrow \infty } \frac{1}{d} \sum _i (\tilde{\varphi }_i^{(1)})^{2(k-1)}\le C. \end{aligned}$$
(143)
To do so, we use a rotational invariance argument. Let
\({\varvec{R}}\in \mathbb R^{d \times d}\) be an orthogonal matrix such that
\({\varvec{R}}{\varvec{x}}= {\varvec{x}}\). Then,
$$\begin{aligned} \langle {\varvec{x}}, {\varvec{a}}_i\rangle =\langle {\varvec{R}}{\varvec{x}}, {\varvec{R}}{\varvec{a}}_i \rangle = \langle {\varvec{x}}, {\varvec{R}}{\varvec{a}}_i\rangle . \end{aligned}$$
(144)
Consequently, we have that
$$\begin{aligned} {\varvec{R}}{\varvec{A}}^\mathsf{T}{\varvec{Z}}({\varvec{Z}}+{\varvec{I}}_n)^{-1} {\varvec{A}}{\varvec{R}}^{\mathsf{T}} {\mathop {=}\limits ^{{\text {d}}}}{\varvec{A}}^\mathsf{T}{\varvec{Z}}({\varvec{Z}}+{\varvec{I}}_n)^{-1} {\varvec{A}}, \end{aligned}$$
(145)
which immediately implies that
$$\begin{aligned} {\varvec{R}}\tilde{{\varvec{\varphi }}}^{(1)} {\mathop {=}\limits ^{{\text {d}}}}\tilde{{\varvec{\varphi }}}^{(1)}. \end{aligned}$$
(146)
Then, we can decompose
\(\tilde{{\varvec{\varphi }}}^{(1)}\) as
$$\begin{aligned} \tilde{{\varvec{\varphi }}}^{(1)} = a_d \, {\varvec{x}}+ \sqrt{1-a_d^2}\, {\varvec{\varphi }}^{\perp }, \end{aligned}$$
(147)
where
\({\varvec{\varphi }}^{\perp }\) is uniformly distributed over the set of vectors orthogonal to
\({\varvec{x}}\) with norm
\(\sqrt{d}\) and
$$\begin{aligned} \frac{1}{d} \langle {\varvec{x}}, \tilde{{\varvec{\varphi }}}^{(1)} \rangle = a_d. \end{aligned}$$
(148)
Relating the uniform distribution on the sphere to the normal distribution [
54, Sec. 3.3.3], we can express
\({\varvec{\varphi }}^\perp \) as follows:
$$\begin{aligned} {\varvec{\varphi }}^{\perp } =\sqrt{d} \, \frac{{\varvec{u}}- \displaystyle \frac{1}{d}\langle {\varvec{u}}, {\varvec{x}}\rangle {\varvec{x}}}{\Big \Vert {\varvec{u}}- \displaystyle \frac{1}{d}\langle {\varvec{u}}, {\varvec{x}}\rangle {\varvec{x}}\Big \Vert _2}, \end{aligned}$$
(149)
where
\({\varvec{u}}\sim \mathsf{N}({\varvec{0}}_d, {\varvec{I}}_d)\) and independent of
\({\varvec{x}}\). By the law of large numbers, we have the almost sure limits
$$\begin{aligned} \begin{aligned} \lim _{d\rightarrow \infty }\frac{1}{d}\langle {\varvec{u}}, {\varvec{x}}\rangle = 0, \qquad \quad \lim _{d\rightarrow \infty }\frac{1}{d}\Big \Vert {\varvec{u}}- \displaystyle \frac{1}{d}\langle {\varvec{u}}, {\varvec{x}}\rangle {\varvec{x}}\Big \Vert _2^2 = 1. \end{aligned} \end{aligned}$$
(150)
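As an illustrative numerical sketch of the representation (149) (not part of the proof; the dimension \(d\) and the draw of \({\varvec{x}}\) below are arbitrary choices), one can sample \({\varvec{\varphi }}^\perp \) and verify both the orthogonality and norm constraints and the law-of-large-numbers limits in (150):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100_000

# x: any fixed vector with ||x||_2 = sqrt(d); here drawn at random and rescaled
x = rng.standard_normal(d)
x *= np.sqrt(d) / np.linalg.norm(x)

# u ~ N(0, I_d), independent of x
u = rng.standard_normal(d)

# Representation (149): project out the direction of x, rescale to norm sqrt(d)
w = u - (u @ x / d) * x
phi_perp = np.sqrt(d) * w / np.linalg.norm(w)

# Orthogonality and norm of phi_perp (exact up to floating point)
print(phi_perp @ x / d)          # ~ 0
print(phi_perp @ phi_perp / d)   # = 1

# LLN limits (150): <u, x>/d ~ 0 and ||w||_2^2 / d ~ 1 for large d
print(u @ x / d, w @ w / d)
```

The projection coefficient \(\langle {\varvec{u}}, {\varvec{x}}\rangle /d\) matches (149) because \(\Vert {\varvec{x}}\Vert _2^2 = d\).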
Thus, by combining (
147) and (
149), we conclude that
$$\begin{aligned} \tilde{{\varvec{\varphi }}}^{(1)} = c_1 {\varvec{x}}+c_2{\varvec{u}}, \end{aligned}$$
(151)
where the coefficients
\(c_1\) and
\(c_2\) can be bounded by universal constants (independent of
n,
d) using (
150). As a result,
$$\begin{aligned} \frac{1}{d} \sum _i (\tilde{\varphi }_i^{(1)})^{2(k-1)}&\le 2^{2(k-1)} |c_1|^{2(k-1)} \frac{1}{d}\sum _i |x_i|^{2(k-1)} \\&\quad + 2^{2(k-1)} |c_2|^{2(k-1)} \frac{1}{d} \sum _i |u_i|^{2(k-1)}. \end{aligned}$$
(152)
Note that, almost surely,
$$\begin{aligned} \lim _{d\rightarrow \infty } \frac{1}{d} \sum _i |u_i|^{2(k-1)}={\mathbb E}\{U^{2(k-1)}\}\le C, \end{aligned}$$
(153)
where
\(U\sim \mathsf{N}(0, 1)\). By combining (
140), (
152) and (
153), (
143) immediately follows. Finally, by combining (
140), (
141), (
142) and (
143), we deduce that (
138) holds.
We now use Proposition
1 which guarantees that, almost surely,
$$\begin{aligned} \lim _{d \rightarrow \infty } \frac{1}{d} \sum _{i=1}^d \psi (x_i, \tilde{x}^\mathrm{L}_i, \sqrt{\delta }v_i^t) = {\mathbb E}\{ \psi (X, \, \mu _{V, 1} X + \sigma _{V,1} W_{V,1} , \, \sqrt{\delta }(\mu _{V, t} X + \sigma _{V,t} W_{V,t})) \}, \quad t \ge 1. \end{aligned}$$
(154)
To conclude the proof of (
80), we take the limit
\(t \rightarrow \infty \) and use Remark
8. For this, we will show that
$$\begin{aligned} \begin{aligned}&\lim _{t \rightarrow \infty } {\mathbb E}\{ \psi (X, \, \mu _{V, 1} X + \sigma _{V,1} W_{V,1} , \, \sqrt{\delta }(\mu _{V, t} X + \sigma _{V,t} W_{V,t})) \} \\&\quad = {\mathbb E}\{ \psi (X, \, \mu _{V, 1} X + \sigma _{V,1} W_{V,1} , \, \sqrt{\delta }(\tilde{\mu }_{V} X + \tilde{\sigma }_{V} W_{V,\infty })) \}, \end{aligned} \end{aligned}$$
(155)
where
\((W_{V,1}, W_{V,\infty })\) are zero mean jointly Gaussian random variables with covariance given by (
81). Using (
58) and the formulas for
\(g_0\) and
\(g_t\) from (
66) and (
68), we have
$$\begin{aligned} \begin{aligned} {\mathbb E}\{ W_{V,1} W_{V,t} \}&= \frac{1}{\sigma _{V,1} \sigma _{V,t}}{\mathbb E}\{ {\mathcal {T}}_L(Y) Z (\mu _{U,t-1} G + \sigma _{U,t-1}W_{U,t-1}) \} \\&= \frac{\mu _{V,t-1}}{\sqrt{\delta } \beta _{t-1} \sigma _{V,1} \sigma _{V,t}} {\mathbb E}\{ {\mathcal {T}}_L(Y) Z G \}. \end{aligned} \end{aligned}$$
(156)
In the second equality above, we used (
72). Using the expression for
\(\sigma _{V,1}\) from (
75) and letting
\(t \rightarrow \infty \), we have
$$\begin{aligned} \lim _{t \rightarrow \infty } {\mathbb E}\{ W_{V,1} W_{V,t} \} = \frac{ \tilde{\mu }_{V} {\mathbb E}\{ {\mathcal {T}}_L(Y) Z G \}}{ \tilde{\beta } \tilde{\sigma }_{V} \sqrt{ {\mathbb E}\{ {\mathcal {T}}_L(Y)^2\} } } = {\mathbb E}\{ W_{V,1} W_{V, \infty } \}. \end{aligned}$$
(157)
Therefore, the sequence of zero mean jointly Gaussian pairs
\((W_{V,1}, W_{V,t})_{t \ge 1}\) converges in distribution to the jointly Gaussian pair
\((W_{V,1}, W_{V,\infty })\), whose covariance is given by (
81).
To show (
155), we use Lemma
9 in Appendix
G. We apply this result taking
\(Q_t\) to be the distribution of
$$(X, \, \mu _{V,1}X + \sigma _{V,1} W_{V,1}, \, {\mu }_{V,t}X + {\sigma }_{V,t} W_{V,t} ).$$
Since
\(\mu _{V,t} \rightarrow \tilde{\mu }_V\),
\(\sigma _{V,t} \rightarrow \tilde{\sigma }_V\), the sequence
\((Q_t)_{t \ge 2}\) converges weakly to
Q, which is the distribution of
$$(X, \, \mu _{V,1}X + \sigma _{V,1} W_{V,1}, \, \tilde{\mu }_{V}X + \tilde{\sigma }_{V} W_{V,\infty } ).$$
In our case,
\(\psi : \mathbb {R}^3 \rightarrow \mathbb {R}\) is PL(
k), and therefore
\(|\psi (a,b,c)| \le C'(1 + \left|{a}\right|^k + \left|{b}\right|^k + \left|{c}\right|^k)\), for all
\((a,b,c) \in \mathbb {R}^3\) for some constant
\(C'\). Choosing
\(h(a,b,c) = \left|{a}\right|^k + \left|{b}\right|^k + \left|{c}\right|^k\), we have
\(\frac{\left|{\psi }\right|}{1 + h} \le C'\). Furthermore,
\(\int h \, \mathrm{d}Q_t\) is a linear combination of
\(\{\mu _{V,t}, \mu _{V,t}^2, \ldots , \mu _{V,t}^k, \sigma _{V,t}, \sigma _{V,t}^2, \ldots , \sigma _{V,t}^k \}\), with coefficients that do not depend on
t. The integral
\(\int h \, \mathrm{d}Q\) has the same form, except that
\(\mu _{V,t}, \sigma _{V,t}\) are replaced by
\(\tilde{\mu }_{V}, \tilde{\sigma }_{V}\), respectively. Since
\(\mu _{V,t} \rightarrow \tilde{\mu }_{V}\) and
\(\sigma _{V,t} \rightarrow \tilde{\sigma }_{V}\), we have that
$$\lim _{t \rightarrow \infty } \int h \, \mathrm{d}Q_t = \int h \, \mathrm{d}Q.$$
Therefore, by applying Lemma
9 in Appendix
G, we have that
$$\begin{aligned} \lim _{t \rightarrow \infty } \int \psi \, \mathrm{d}Q_t = \int \psi \, \mathrm{d}Q, \end{aligned}$$
which is equivalent to (
155). This completes the proof of the lemma.