1 Introduction
Principal component analysis (PCA) is the most widely used multivariate statistical technique for projecting a data set onto a lower-dimensional space while preserving as much variability as possible (Jolliffe et al. 2016). The basis vectors of this new subspace, known as principal components (PCs), are obtained as linear combinations of the original variables, and the coefficients of those combinations are called loadings. Since each PC is calculated in terms of all the variables, the results can be difficult to interpret. Therefore, several alternatives have been proposed to produce modified PCs with some zero loadings (named sparse loadings). These alternatives are known as sparse PCA (Jolliffe et al. 2003; Zou et al. 2006; Shen and Huang 2008; Journée et al. 2010; Li et al. 2016). Sparsity is achieved by adding sparsity-promoting constraints to the optimization problem. Several penalty techniques have been proposed in the literature; among the most widely used are Ridge (Hoerl and Kennard 1988) and Lasso (Tibshirani 1996). Ridge shrinks the coefficients towards zero and encourages highly correlated variables to have similar coefficients. Lasso, on the other hand, sets some coefficients exactly to zero, but tends to choose a single variable from a set of highly correlated variables and discard the others. To overcome this, Zou and Hastie (2005) proposed the elastic net (enet), which combines Lasso and Ridge to preserve the favourable properties of both. In addition, enet is particularly useful when the number of variables exceeds the number of observations (Zou and Hastie 2005).
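For reference, one common way to write the enet penalty, in the mixing-parameter form that also appears in the constraints of Sect. 3.1, is
$$(1-\alpha ){\left\| {\varvec{\beta}} \right\|}_{1}+\alpha {\left\| {\varvec{\beta}} \right\|}_{2}^{2},$$
where \({\varvec{\beta}}\) is a coefficient vector and \(\alpha \in [0,1)\) balances the Lasso (\(\alpha =0\)) and Ridge contributions.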
It is important to emphasise that in some of these sparse PCA techniques the orthogonality of the loading matrix is lost at the expense of sparsity. Thus, some authors, such as Trendafilov (2014) and Genicot et al. (2015), provide sparse and orthogonal components simultaneously.
The coordinates of the observations and of the variables on the first components are used to represent them graphically in the score and loading plots, respectively. To visualize both on the same reference system simultaneously, Gabriel (1971) and Galindo (1986) proposed the use of biplot methods. These techniques have been applied in several fields (Xavier et al. 2018; Amor-Esteban et al. 2019; Carrasco et al. 2019; Bernal et al. 2020). Biplots define a common reference system in which the rows and the columns of a matrix can be jointly displayed, so that the relationships between them can be interpreted by means of geometric elements in a Euclidean space (distances, angles, projections, …) (Gabriel 1971; Galindo 1986).
In the case of sparse biplots, only two techniques related to sparse loadings are mentioned in the literature: CDBiplot (Nieto-Librero et al. 2017), based on the CDPCA of Vichi and Saporta (2009), and Elastic-net HJ-Biplot (Cubilla-Montilla et al. 2021), based on the SPCA of Zou et al. (2006). On the one hand, CDBiplot extracts disjoint PCs, in which each original variable contributes to the construction of only one dimension. On the other hand, Elastic-net HJ-Biplot does not provide orthogonal sparse PCs, even though orthogonality is necessary to keep the geometrical properties that allow biplot interpretation. Moreover, in that approach the biplot coordinates are estimated only after the sparse loading matrix is obtained from SPCA (Zou et al. 2006). Nevertheless, as some authors have pointed out, it is important to obtain the results within the same optimization process and not in a tandem analysis (Vichi and Saporta 2009; Nieto-Librero et al. 2017).
All things considered, our main objective is to propose a new mathematical technique, called CenetBiplots, that simultaneously incorporates the orthogonality of the PCs and the selection of variables by means of the enet sparsity constraint. Since the biplot solution is obtained from the singular value decomposition (SVD) (Gabriel 1971; Galindo 1986), our research focuses on the sparse and orthogonal SVD via Lasso proposed by Guillemot et al. (2019), but imposing the enet constraint to overcome the disadvantages mentioned above.
Therefore, this work is structured as follows. Section 2 introduces the notation and Sect. 3 defines the extension of the constrained singular value decomposition as the solution of a convex optimization problem with enet and orthogonality restrictions. Section 3 also presents the algorithm used to solve CenetSVD, extending the projection onto convex sets (POCS) algorithm in the sense of a divide-and-conquer algorithm. Section 4 presents the implementation of the sparse and orthogonal biplot methods, known as the CenetBiplots. The selection of the sparsity parameters is proposed in Sect. 5. Section 6 shows the usefulness of these methodologies by analysing high-dimensional real genomic data and low-dimensional psychometric data. Finally, Sect. 7 includes a discussion and the main conclusions of the study.
3 Singular value decomposition
Given a matrix \({{\varvec{X}}}_{IJ}\) of rank \(R\le \mathrm{min}(I,J)\), the SVD of \({\varvec{X}}\) is defined as the product:
$${{\varvec{X}}}_{IJ}={{\varvec{U}}}_{IR}{{\varvec{D}}}_{R}{{{\varvec{V}}}^{T}}_{RJ}$$
(1)
where \({\varvec{U}}=[{{\varvec{u}}}_{1},\dots ,{{\varvec{u}}}_{R}]\) and \({\varvec{V}}=[{{\varvec{v}}}_{1},\dots ,{{\varvec{v}}}_{R}]\) are matrices with orthonormal columns, \({{\varvec{U}}}^{T}{\varvec{U}}={\varvec{I}}\) and \({{\varvec{V}}}^{T}{\varvec{V}}={\varvec{I}}\).
\({\varvec{U}}\) contains the left-singular vectors of the SVD in its columns, \({\varvec{V}}\) contains the right-singular vectors, and \({\varvec{D}}\) is a diagonal matrix containing the singular values \({d}_{r}\) of \({\varvec{X}}\) \((r=1,\dots ,R)\), conventionally ordered so that \({d}_{1}\ge {d}_{2}\ge \dots \ge {d}_{R}\ge 0\). For a given \(Q\le R\), the SVD provides the best rank-\(Q\) approximation \({\widehat{{\varvec{X}}}}_{Q}\) of \({\varvec{X}}\) in the least squares sense, minimizing the \({\ell}_{2}\) (Frobenius) norm of the difference between the initial and the reconstructed matrices (Eckart and Young 1936; Shen and Huang 2008). \({\widehat{{\varvec{X}}}}_{Q}\) is defined as:
$${\widehat{{\varvec{X}}}}_{Q}={{\varvec{U}}}_{IQ}{{\varvec{D}}}_{Q}{{{\varvec{V}}}^{T}}_{QJ}=\sum_{q=1}^{Q}{d}_{\mathrm{q}}{{\varvec{u}}}_{\mathrm{q}}{{{\varvec{v}}}_{\mathrm{q}}}^{T}$$
(2)
with \({{{\varvec{u}}}_{\mathrm{q}}}^{T}{{\varvec{u}}}_{\mathrm{q}}={{{\varvec{v}}}_{\mathrm{q}}}^{T}{{\varvec{v}}}_{\mathrm{q}}=1\) and \({\varvec{u}}_{q}^{T} {\varvec{u}}_{q^{\prime}} = {\varvec{v}}_{q}^{T} {\varvec{v}}_{q^{\prime}} = 0 \forall q \ne q^{\prime} \left( {q = 1, \ldots ,Q} \right)\).
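As an illustration of (2), a minimal sketch in Python (assuming NumPy) of the best rank-\(Q\) reconstruction from the ordinary SVD could look as follows:

```python
import numpy as np

def best_rank_q_approx(X, Q):
    """Best rank-Q approximation of X in the least-squares (Frobenius) sense (Eckart-Young)."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)   # singular values returned in decreasing order
    return U[:, :Q] @ np.diag(d[:Q]) @ Vt[:Q, :]
```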
Frequently, the orthogonal singular vectors of the SVD are computed using the power iteration algorithm together with a deflation approach. Instead, Guillemot et al. (2019) suggest obtaining the singular vectors by the projection onto convex sets (POCS) algorithm (Bauschke and Combettes 2017).
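For context, a minimal sketch (in Python with NumPy, not the authors' implementation) of the classical power-iteration-with-deflation scheme mentioned above could be:

```python
import numpy as np

def power_iteration_svd(X, Q, n_iter=500, tol=1e-10, seed=0):
    """Leading Q singular triplets of X via power iteration on X^T X, with deflation of X."""
    Xr = X.astype(float).copy()
    rng = np.random.default_rng(seed)
    U, d, V = [], [], []
    for _ in range(Q):
        v = rng.standard_normal(Xr.shape[1])
        v /= np.linalg.norm(v)
        for _ in range(n_iter):
            v_new = Xr.T @ (Xr @ v)              # one power-iteration step on X^T X
            v_new /= np.linalg.norm(v_new)
            if np.linalg.norm(v_new - v) < tol:
                v = v_new
                break
            v = v_new
        sigma = np.linalg.norm(Xr @ v)           # singular value
        u = (Xr @ v) / sigma                     # left singular vector
        U.append(u); d.append(sigma); V.append(v)
        Xr = Xr - sigma * np.outer(u, v)         # deflation: remove the extracted component
    return np.column_stack(U), np.array(d), np.column_stack(V)
```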
3.1 Extension of constrained singular value decomposition to elastic net (CenetSVD)
CenetSVD provides a factorization of \({{\varvec{X}}}_{IJ}\) by means of sparse and orthogonal singular vectors (called pseudo-singular vectors) and pseudo-singular values. The key point of CenetSVD is the calculation of sparse and orthogonal vectors simultaneously. The CenetSVD formulation is based on the constrained optimization problem of the CSVD proposed by Guillemot et al. (2019), replacing the Lasso constraint with the enet constraint:
$$ \begin{gathered} \mathop {\rm{argmin}}\limits_{d,u,v} \frac{1}{2}\left\| {\varvec{X} - \mathop \sum \limits_{q = 1}^{Q} d_{q} {\varvec{u}}_{q} {\varvec{v}}_{q}^{T} } \right\|^{2}_{F} \hfill \\ s.t.\left\{ {\begin{array}{*{20}c} {\varvec{u}_{q}^{T} \varvec{u}_{q} = \varvec{v}_{q}^{T} \varvec{v}_{q} = 1, \varvec{u}_{q}^{T} \varvec{u}_{{q^{\prime } }} = \varvec{v}_{q}^{T} \varvec{v}_{q^{\prime}} = 0\quad\forall q \ne q^{\prime } } \\ {\left( {1 - \alpha } \right)\left\|\varvec{u}_{q} \right\|_{1}+ \alpha \left\|\varvec{u}_{q} \right\|^{2}_{2} \le \tau_{1,q} ; \left( {1 - \alpha } \right)\left\|\varvec{v}_{q} \right\|_{1} + \alpha \left\|\varvec{v}_{q}\right\|^{2}_{2} \le \tau_{2,q} } \\ \end{array} } \right. \hfill \\ \end{gathered} $$
(3)
where \({\tau }_{1,q},{\tau }_{2,q}>0\) are the shrinkage parameters that control the degree of sparsity included in the constrained model: the larger \({\tau }_{1,q}\) or \({\tau }_{2,q}\), the fewer coefficients are shrunk to zero. It is important to remark that only some values of \({\tau }_{1,q}\) and \({\tau }_{2,q}\) lead to feasible solutions (see Sect. 5.2 and Online Resource 2A). The parameter \(\alpha \in [\mathrm{0,1})\) defines the relative weight of the Lasso and Ridge terms in the enet restriction.
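As a small illustration (a Python sketch with names chosen here for illustration only), the left-hand side of the enet constraints in (3) can be evaluated as:

```python
import numpy as np

def enet_value(v, alpha):
    """Left-hand side of the enet constraint in (3): (1 - alpha)*||v||_1 + alpha*||v||_2^2."""
    return (1 - alpha) * np.sum(np.abs(v)) + alpha * np.sum(v ** 2)

def satisfies_enet(v, alpha, tau):
    """Check whether a candidate pseudo-singular vector respects the shrinkage level tau."""
    return enet_value(v, alpha) <= tau
```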
To find the solution of (3), an iterative process is defined (Guillemot et al. 2019). First, it is necessary to establish an equivalent form of the previous minimization problem. Equation (3) is equivalent to:
$$ \begin{gathered} \mathop {\rm{argmax}}\limits_{u,v} \varvec{u}^{T} \varvec{Xv} \hfill \\ s.t.\left\{ {\begin{array}{*{20}c} {\varvec{u}^{T} \varvec{u} \le 1, \varvec{v}^{T} \varvec{v} \le 1, \varvec{u}^{T} \varvec{u}_{q^{\prime}} = \varvec{v}^{T} \varvec{v}_{q^{\prime}} = 0\quad \forall q^{\prime} < q} \\ {\left( {1 - \alpha_{1} } \right)\left\|{\varvec{u}}\right\|_{1} + \alpha_{1} \left\|{\varvec{u}}\right\|_{2}^{2} \le \tau_{1,q} ; \left( {1 - \alpha_{2} } \right)\left\|{\varvec{v}}\right\|_{1} + \alpha_{2} \left\|{\varvec{v}}\right\|_{2}^{2} \le \tau_{2,q} } \\ \end{array} } \right. \hfill \\ \end{gathered} $$
(4)
with \({\varvec{v}}_{q^{\prime}}\) and \({\varvec{u}}_{q^{\prime}}\) previously calculated, \(0 \le q^{\prime } < q\) and \(q\ge 1\). Equation (4) is solved by block relaxation, an iterative process that alternates between two steps:
(1) Find the vector \({\varvec{u}}\) that optimizes the objective function with \({\varvec{v}}\) fixed:
$$ \begin{gathered} \mathop {\arg \min }\limits_{u} \frac{1}{2}\left\| {\varvec{u} - \varvec{Xv}} \right\|_{F}^{2} \hfill \\ s.t.\;\varvec{u} \in {\mathfrak{B}}_{{\ell_{1} + \ell_{2} }} \left( \tau \right),\; \varvec{u} \in {\mathfrak{B}}_{{\ell_{2} }} \left( 1 \right),\; \varvec{u} \in \varvec{U}^{ \bot } \leftrightarrow \varvec{u} \in {\mathfrak{B}}_{{\ell_{1} + \ell_{2} }} \left( \tau \right) \cap {\mathfrak{B}}_{{\ell_{2} }} \left( 1 \right) \cap \varvec{U}^{ \bot } \hfill \\ \end{gathered} $$
(5)
where \(\varvec{U}^{ \bot }\) is the orthogonal complement of the space spanned by the columns of the matrix \({\varvec{U}}\). These constraints involve projections of a vector onto convex sets and, since the intersection of convex sets is also convex, the problem can be solved using the POCS and PL1L2 algorithms (Gloaguen et al. 2017; Guillemot et al. 2019). The projection of a vector onto the enet ball \(({\mathfrak{B}}_{{\ell}_{1}+{\ell}_{2}})\) proposed here follows the line of the linear-time algorithms for the projection onto the Lasso ball \({\mathfrak{B}}_{{\ell}_{1}}\) (Berg et al. 2008; Duchi et al. 2008; Guillemot et al. 2019) and onto the enet ball \({\mathfrak{B}}_{{\ell}_{1}+{\ell}_{2}}\) (Mairal et al. 2010) (see Online Resource 2B). In our case, the method for projecting a vector onto \({\mathfrak{B}}_{{\ell}_{1}+{\ell}_{2}}\cap {\mathfrak{B}}_{{\ell}_{2}}\) is an extension of the fast and exact algorithm for the projection onto the intersection of \({\mathfrak{B}}_{{\ell}_{1}}\) and \({\mathfrak{B}}_{{\ell}_{2}}\) proposed by Guillemot et al. (2019) (see Online Resource 2C).
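To make the projection step concrete, the following Python sketch (assuming NumPy) shows the standard sort-based projection onto the Lasso ball in the spirit of Duchi et al. (2008), together with a simple alternating-projection (POCS) surrogate for the intersection with the \({\ell}_{2}\) ball; the fast and exact procedure for the enet case is the one described in Online Resource 2C, not this sketch:

```python
import numpy as np

def project_l1_ball(x, tau):
    """Euclidean projection of x onto the l1 ball of radius tau (sort-based scheme)."""
    if np.sum(np.abs(x)) <= tau:
        return x.copy()
    u = np.sort(np.abs(x))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > css - tau)[0][-1]
    theta = (css[rho] - tau) / (rho + 1)
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def project_l2_ball(x, radius=1.0):
    """Euclidean projection of x onto the l2 ball of the given radius."""
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

def project_intersection(x, tau, n_iter=100):
    """Alternating projections (POCS): returns a feasible point of B_l1(tau) ∩ B_l2(1).
    Note: this yields a point in the intersection, not necessarily the nearest one,
    unlike the exact projection used by the authors."""
    z = np.asarray(x, dtype=float).copy()
    for _ in range(n_iter):
        z = project_l2_ball(project_l1_ball(z, tau))
    return z
```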
(2) Maximize (4) to find the vector \({\varvec{v}}\) with \({\varvec{u}}\) fixed:
$$ \begin{gathered} \mathop {\arg \min }\limits_{v} \frac{1}{2}\left\| {\varvec{v} - \varvec{X}^{T} \varvec{u}} \right\|_{F}^{2} \hfill \\ s.t.\;{\varvec{v}} \in {\mathfrak{B}}_{{\ell_{1} + \ell_{2} }} \left( \tau \right),\; {\varvec{v}} \in {\mathfrak{B}}_{{\ell_{2} }} \left( 1 \right),\; {\varvec{v}} \in {\varvec{V}}^{ \bot } \leftrightarrow {\varvec{v}} \in {\mathfrak{B}}_{{\ell_{1} + \ell_{2} }} \left( \tau \right) \cap {\mathfrak{B}}_{{\ell_{2} }} \left( 1 \right) \cap {\varvec{V}}^{ \bot } \hfill \\ \end{gathered} $$
(6)
where \({\varvec{V}}^{ \bot }\) is the orthogonal complement of the space spanned by the columns of the matrix \({\varvec{V}}\). The projection \({\varvec{v}}_{t + 1} = {\rm{proj}}^{{\mathfrak{B}}_{{\ell_{1} + \ell_{2} }} \left( \tau \right) \cap {\mathfrak{B}}_{{\ell_{2} }} \left( 1 \right) \cap {\varvec{V}}^{ \bot } } \left( {\varvec{X}}^{T} {\varvec{u}}_{t + 1} \right)\) is carried out in the same manner as in step (1).
The global optimization problem of CenetSVD is handled using the POCS algorithm (Table 1). Lines 6 and 7 of the algorithm are modified with respect to Guillemot et al. (2019) to address the projection onto the \({\mathfrak{B}}_{{\ell}_{1}+{\ell}_{2}}\cap {\mathfrak{B}}_{{\ell}_{2}}\) space.
Table 1
Algorithm for the implementation of CenetSVD based on the POCS algorithm
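Since Table 1 is not reproduced here, the following Python sketch only outlines the overall block-relaxation loop described above. It applies the orthogonality and enet/\({\ell}_{2}\) projections sequentially for readability, whereas CenetSVD projects onto their intersection, and all function names are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np

def orth_complement(x, M):
    """Project x onto the orthogonal complement of the column space of M (M may have 0 columns)."""
    return x - M @ (M.T @ x)

def cenet_svd_sketch(X, Q, proj_enet, tau_u, tau_v, n_iter=200, tol=1e-8):
    """Schematic block relaxation: alternate constrained updates of u and v for each dimension q.
    proj_enet(x, tau) stands for a projection onto B_{l1+l2}(tau) ∩ B_{l2}(1); the alternating-
    projection surrogate sketched earlier can be plugged in for experimentation."""
    I, J = X.shape
    U, V, d = np.zeros((I, 0)), np.zeros((J, 0)), []
    for q in range(Q):
        u0, _, vt0 = np.linalg.svd(X, full_matrices=False)   # warm start from the plain SVD
        u, v = u0[:, q], vt0[q, :]
        for _ in range(n_iter):
            u_new = proj_enet(orth_complement(X @ v, U), tau_u[q])
            v_new = proj_enet(orth_complement(X.T @ u_new, V), tau_v[q])
            converged = (np.linalg.norm(u_new - u) < tol) and (np.linalg.norm(v_new - v) < tol)
            u, v = u_new, v_new
            if converged:
                break
        d.append(float(u @ X @ v))                            # pseudo-singular value
        U, V = np.column_stack([U, u]), np.column_stack([V, v])
    return U, np.array(d), V
```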
7 Conclusions
In this work, the extension of the CSVD (Guillemot et al. 2019) to the enet ball is proposed, integrating sparse and orthogonal vectors simultaneously. The method is based on the projection of a vector onto the convex intersection of the enet and \({\ell}_{2}\) balls. Here, the enet constraint is shown to be a suitable approach, shrinking coefficients to zero while ensuring that correlated variables have similar coefficients, a desirable property in disciplines such as genomics or psychometrics. Our approach using CenetSVD is useful for analysing large-scale problems with \(J\gg I\) as well as datasets with \(I>J\). Additionally, CenetSVD is extended to the sparse and orthogonal constrained CenetPCA and the sparse and orthogonal constrained CenetBiplots. These techniques make it possible to recognize groups with similar patterns and the causative variables associated with them. In addition, they are variable selection techniques that improve the interpretation of results thanks to the sparse components obtained. Furthermore, this work provides a sparsity parameter selection procedure based on cross-validation and the BIC, as well as the possibility of manually establishing distinct levels of sparsity.
Future lines of research may contemplate the application of other types of constraints within the CSVD framework, or even the proposal of other algorithms for projecting a vector onto non-convex sets based on the corresponding mathematical theory. Additionally, statistical techniques for two-way and three-way data analysis could be developed through CenetSVD. We conclude that our proposed methods are promising tools for conducting multivariate analysis and are applicable to a wide range of research areas.
Acknowledgements
“Copyright (c) 2018 Vincent Guillemot. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE”.