Let
M denote the
\(m \times n\) matrix whose
i-th row is the normalised vector
\(y_i/\sqrt{m}\), so that the sample covariance satisfies
\(S = M^{\mathsf {T}}M\). Noting that
\(m \ge n \ge d\), we apply the GSVD from Theorem
7.2 to the
\(m \times d\) matrix
\(A = MF\) and the
\(n \times d\) matrix
\(B = F\). This produces factorisations
$$\begin{aligned} MF = W_A \Delta G \qquad \text {and} \qquad F = W_B \Sigma G \end{aligned}$$
with orthogonal
\(W_A, W_B\), invertible
G, and diagonal
\(\Delta ,\Sigma \). Since
D is generic and
F has full rank, we may safely assume that the diagonal entries of
\(\Delta \) and
\(\Sigma \) are nonzero. Since \(W_A\) and \(W_B\) are orthogonal, multiplying each factorisation by its own transpose yields two new identities
$$\begin{aligned} (MF)^{\mathsf {T}}(MF) = G^{\mathsf {T}}\Delta ^2G \qquad \text {and} \qquad F^{\mathsf {T}}F = G^{\mathsf {T}}\Sigma ^2G. \end{aligned}$$
(15)
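The cancellation behind these identities can be checked numerically. The following Python sketch (all dimensions and matrices are illustrative placeholders, not the paper's data) builds a toy factorisation \(A = W\Delta G\) with orthogonal \(W\), nonzero diagonal \(\Delta\), and generic \(G\), and verifies that \(A^{\mathsf{T}}A = G^{\mathsf{T}}\Delta^2 G\):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # illustrative size only

# Orthogonal factor W (via QR), nonzero diagonal Delta, generic invertible G
W, _ = np.linalg.qr(rng.standard_normal((d, d)))
Delta = np.diag(rng.uniform(0.5, 2.0, size=d))
G = rng.standard_normal((d, d))

A = W @ Delta @ G
# W^T W = id, so the orthogonal factor cancels: A^T A = G^T Delta^2 G
assert np.allclose(A.T @ A, G.T @ Delta**2 @ G)
```

The same computation applied to \(MF = W_A\Delta G\) and \(F = W_B\Sigma G\) produces both displayed identities at once.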
Since
\(S = M^{\mathsf {T}}M\) by design, the first identity reduces to
\(F^{\mathsf {T}}S F = G^{\mathsf {T}}\Delta ^2G\). Let us write
\({\left\{ {\delta _1,\ldots ,\delta _d}\right\} }\) and
\({\left\{ {\sigma _1,\ldots ,\sigma _d}\right\} }\) for the (necessarily nonzero) diagonal entries of
\(\Delta \) and
\(\Sigma \), respectively, and denote by
\(g_i\) the
i-th column of
\(G^{-1}\). It follows from (
15) that
\(g_i\) is a generalised eigenvector for the
\(d \times d\) matrix pencil
\((F^{\mathsf {T}}S F) - \lambda \cdot (F^{\mathsf {T}}F)\), corresponding to the generalised eigenvalue
\(\lambda _i := \nicefrac {\delta _i^2}{\sigma _i^2}\). In other words, we have
$$\begin{aligned} (F^{\mathsf {T}}S F) g_i = \lambda _i \cdot (F^{\mathsf {T}}F) g_i. \end{aligned}$$
(16)
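Generalised eigenpairs of the symmetric-definite pencil in (16) can be computed directly with `scipy.linalg.eigh`. This sketch uses synthetic data (the sizes and the random \(M\), \(F\) are placeholders chosen only to satisfy \(m \ge n \ge d\)) and verifies the eigenvalue equation for every pair:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
m, n, d = 8, 5, 3  # illustrative sizes with m >= n >= d

M = rng.standard_normal((m, n)) / np.sqrt(m)  # rows play the role of y_i / sqrt(m)
F = rng.standard_normal((n, d))               # generically full rank

S = M.T @ M          # sample covariance S = M^T M
A = F.T @ S @ F      # left matrix of the pencil,  F^T S F
B = F.T @ F          # right matrix of the pencil, F^T F (positive definite)

# Solve A g = lambda B g; eigh returns eigenvalues in ascending order
lam, V = eigh(A, B)  # columns of V are generalised eigenvectors g_i

for lam_i, g_i in zip(lam, V.T):
    assert np.allclose(A @ g_i, lam_i * B @ g_i)  # equation (16)
```

Note that `eigh` normalises the eigenvectors so that \(V^{\mathsf{T}}(F^{\mathsf{T}}F)V = \mathrm{id}\); the columns \(g_i\) of \(G^{-1}\) in the proof agree with these up to scaling.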
The top
\(d \times d\) block
\(\Sigma _d\) of
\(\Sigma \) is invertible because its diagonal has nonzero entries. Since
G is also invertible, the map
\(Y \mapsto Y_\circ = \Sigma _dGY\) defines a bijection on the set of
\(d \times r\) matrices, which allows us to re-express the optimisation (
13) in a particularly convenient form. To this end, we calculate:
$$\begin{aligned} Y^{\mathsf {T}}(F^{\mathsf {T}}S F) Y&= Y^{\mathsf {T}}(G^{\mathsf {T}}\Delta ^2G) Y&\text {by}~ (15) \\&= (G^{-1}\Sigma _d^{-1}Y_\circ )^{\mathsf {T}}(G^{\mathsf {T}}\Delta ^2G) (G^{-1}\Sigma _d^{-1}Y_\circ )&\text {since }Y_\circ = \Sigma _dGY \\&= Y_\circ ^{\mathsf {T}}~\Sigma _d^{-1}\Delta ^2\Sigma _d^{-1}~Y_\circ&\text {after two cancellations}. \end{aligned}$$
Now the intermediate product
\(\nabla := \Sigma _d^{-1}\Delta ^2\Sigma _d^{-1}\) is a
\(d \times d\) diagonal matrix whose
i-th diagonal entry is
\(\lambda _i = \nicefrac {\delta ^2_i}{\sigma ^2_i}\). Reordering basis vectors if necessary, we can assume without loss of generality that
\(\lambda _1> \cdots > \lambda _d\). The change of variables
\(Y \mapsto Y_\circ \) transforms the optimisation problem from (
13) into
$$\begin{aligned} \max _{Y_\circ } \mathrm{tr}( Y_\circ ^{\mathsf {T}}\nabla Y_\circ ) \quad \text {subject to} \quad Y_\circ ^{\mathsf {T}}Y_\circ = \text {id}_r. \end{aligned}$$
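That this constrained trace maximisation is solved by the leading elementary basis vectors can be checked numerically. In the Python sketch below the decreasing spectrum is an illustrative placeholder, and random orthonormal candidates (drawn via QR) are compared against the top-\(r\) coordinate columns:

```python
import numpy as np

lam = np.array([5.0, 3.0, 1.0, 0.5])  # placeholder lambda_1 > ... > lambda_d
d, r = len(lam), 2
nabla = np.diag(lam)

# Candidate solution: first r elementary basis vectors as columns
Y_star = np.eye(d)[:, :r]
assert np.allclose(Y_star.T @ Y_star, np.eye(r))   # feasibility: Y^T Y = id_r

best = np.trace(Y_star.T @ nabla @ Y_star)
assert np.isclose(best, lam[:r].sum())             # attains lambda_1 + ... + lambda_r

# No random orthonormal competitor exceeds it
rng = np.random.default_rng(2)
for _ in range(200):
    Q, _ = np.linalg.qr(rng.standard_normal((d, r)))
    assert np.trace(Q.T @ nabla @ Q) <= best + 1e-9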
This is the ordinary PCA optimisation (
11), which generically admits a unique solution
\(Y_*\) obtained by successively increasing
r. Since
\(\nabla \) is diagonal, the
i-th column of
\(Y_*\) is the
i-th elementary basis vector. Thus, the columns
\({\left\{ {u_1, \ldots , u_r}\right\} }\) of
\(U = G^{-1}\Sigma _d^{-1}Y_*\) lie in the directions of the corresponding columns of
\(G^{-1}\). By (
16), these columns are generalised eigenvectors associated to the
r largest generalised eigenvalues of our matrix pencil. Finally, applying
F to
U gives the principal components along the quiver representation as in Proposition
6.4.
\(\square \)