Published in: Advances in Data Analysis and Classification 1/2023

Open Access 20.11.2021 | Regular Article

CenetBiplot: a new proposal of sparse and orthogonal biplots methods by means of elastic net CSVD

Authors: Nerea González-García, Ana Belén Nieto-Librero, Purificación Galindo-Villardón

Abstract

In this work, a new mathematical algorithm for sparse and orthogonal constrained biplots, called CenetBiplots, is proposed. Biplots provide a joint representation of the observations and variables of a multidimensional matrix in the same reference system, a subspace in which the relationships between them can be interpreted in terms of geometric elements. CenetBiplots projects a matrix onto a low-dimensional space generated simultaneously by sparse and orthogonal principal components. Sparsity is desired to select variables automatically, and orthogonality is necessary to keep the geometrical properties that ensure the graphical interpretation of biplots. To this end, the present study focuses on two different objectives: 1) the extension of constrained singular value decomposition to incorporate an elastic net sparse constraint (CenetSVD), and 2) the implementation of CenetBiplots using CenetSVD. The usefulness of the proposed methodologies for analysing high-dimensional and low-dimensional matrices is shown. Our method is implemented in R software and available for download from https://github.com/ananieto/SparseCenetMA.

Supplementary Information

The online version contains supplementary material available at https://doi.org/10.1007/s11634-021-00468-1.


1 Introduction

Principal component analysis (PCA) is the most widely used multivariate statistical technique for projecting a data set onto a lower-dimensional space while preserving as much variability as possible (Jolliffe and Cadima 2016). The basis vectors of this new subspace, known as principal components (PCs), are obtained as linear combinations of the original variables; the coefficients of these combinations are called loadings. Each PC is calculated in terms of all variables, so the results can be difficult to interpret. Therefore, several alternatives have been proposed to produce modified PCs with some zero loadings (named sparse loadings). These alternatives are known as sparse PCA (Jolliffe et al. 2003; Zou et al. 2006; Shen and Huang 2008; Journée et al. 2010; Li et al. 2016). Sparsity is achieved by adding sparsity-promoting constraints to the optimization problem. Different penalization techniques have been proposed in the literature; among the most widely used are Ridge (Hoerl and Kennard 1988) and Lasso (Tibshirani 1996). Ridge shrinks the coefficients towards zero and encourages highly correlated variables to have similar coefficients. Lasso, on the other hand, sets some coefficients exactly to zero, but tends to choose a single variable from a set of highly correlated variables, discarding the others. To overcome this, Zou and Hastie (2005) proposed the elastic net (enet), which combines Lasso and Ridge to preserve the favourable properties of both. In addition, enet is particularly useful when the number of variables exceeds the number of observations (Zou and Hastie 2005).
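For reference, and using the convention adopted later in this paper (where \(\alpha\) weights the Ridge term, see Sect. 2), the three penalties on a coefficient vector \({\varvec{\beta}}\) can be summarized as:
$$\lambda {\Vert {\varvec{\beta}}\Vert }_{2}^{2}\;\;\text{(Ridge)},\qquad \lambda {\Vert {\varvec{\beta}}\Vert }_{1}\;\;\text{(Lasso)},\qquad \lambda \left[(1-\alpha ){\Vert {\varvec{\beta}}\Vert }_{1}+\alpha {\Vert {\varvec{\beta}}\Vert }_{2}^{2}\right]\;\;\text{(enet)}$$
so that \(\alpha =0\) recovers the Lasso penalty and values of \(\alpha\) close to 1 approach the Ridge penalty.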
It is important to emphasise that in some of these sparse PCA techniques, the orthogonality of the loading matrix is lost in exchange for sparsity. Thus, some authors, such as Trendafilov (2014) and Genicot et al. (2015), have proposed methods that provide sparse and orthogonal components simultaneously.
The coordinates of the observations and the variables on the first components are used to represent them graphically in the score and loading plots, respectively. To visualize both on the same reference system simultaneously, Gabriel (1971) and Galindo (1986) proposed the biplot methods. These techniques have been applied in several fields (Xavier et al. 2018; Amor-Esteban et al. 2019; Carrasco et al. 2019; Bernal et al. 2020). Biplots define a common reference system in which the rows and the columns of a matrix can be jointly displayed, so that the relationships between them can be interpreted by means of geometric elements in a Euclidean space (distances, angles, projections, …) (Gabriel 1971; Galindo 1986).
In the case of sparse biplots, only two techniques related to sparse loadings are mentioned in the literature: CDBiplot (Nieto-Librero et al. 2017), based on the CDPCA of Vichi and Saporta (2009), and Elastic-net HJ-Biplot (Cubilla-Montilla et al. 2021), based on the SPCA of Zou et al. (2006). On the one hand, CDBiplot extracts disjoint PCs, in which each original variable contributes to the construction of only one dimension. On the other hand, Elastic-net HJ-Biplot does not provide orthogonal sparse PCs, even though orthogonality is necessary to keep the geometrical properties that allow the interpretation of biplots. Moreover, in the latter work the biplot coordinates are estimated only after the sparse loading matrix has been obtained from the SPCA (Zou et al. 2006). Nevertheless, as some authors have pointed out, it is important to obtain the results in a single optimization process and not in a tandem analysis (Vichi and Saporta 2009; Nieto-Librero et al. 2017).
All things considered, our main objective is to propose a new mathematical technique, called CenetBiplots, that simultaneously incorporates the orthogonality of PCs and the selection of variables by means of the enet sparse constraint. Since the biplot solution is obtained from the singular value decomposition (SVD) (Gabriel 1971; Galindo 1986), our research is focused on the sparse and orthogonal SVD via Lasso proposed by Guillemot et al. (2019) but imposing the enet constraint to overcome the disadvantages mentioned above.
This work is structured as follows. Section 2 introduces the notation, and Sect. 3 defines the extension of the constrained singular value decomposition as the solution of a convex optimization problem with enet and orthogonality restrictions. Section 3 also presents the algorithm used to solve CenetSVD, extending the projection onto convex sets (POCS) algorithm in the spirit of a divide-and-conquer approach. Section 4 presents the implementation of the sparse and orthogonal biplot methods, known as CenetBiplots. The selection of the sparsity parameters is addressed in Sect. 5. Section 6 shows the usefulness of these methodologies by analysing high-dimensional real genomic data and low-dimensional psychometric data. Finally, Sect. 7 includes a discussion and the main conclusions of the study.

2 Notation

We present below the notation and terminology used in this manuscript. \({{\varvec{X}}}_{IJ}\) denotes a matrix with the information of \(I\) observations in the rows and \(J\) variables in the columns. The elements of a matrix \({\varvec{X}}\) are denoted as \({x}_{ij}\), its transpose as \({{\varvec{X}}}^{\mathrm{T}}\) and its inverse as \({{\varvec{X}}}^{-1}\). The Frobenius norm of a matrix \({\varvec{X}}\) is denoted by \({\Vert {\varvec{X}}\Vert }_{F}\). The \({\ell}_{2}\) norm of a vector \({\varvec{x}}={\{{x}_{j}\}}_{j=1}^{J}\) is \(\sqrt{\sum_{j}{x}_{j}^{2}}\), and its \({\ell}_{1}\) norm is \(\sum_{j}\left|{x}_{j}\right|\). A vector is normalized when it is divided by its \({\ell}_{2}\) norm. Constraint balls are defined as the regions \({\mathfrak{B}}_{\tau }^{{\ell}_{2}}=\{{\varvec{x}} : {\Vert {\varvec{x}}\Vert }_{2}^{2}\le \tau \}\), \({\mathfrak{B}}_{\tau }^{{\ell}_{1}}=\{{\varvec{x}} : {\Vert {\varvec{x}}\Vert }_{1}\le \tau \}\) and \({\mathfrak{B}}_{\tau }^{{\ell}_{1}+{\ell}_{2}}=\{{\varvec{x}} : (1-\alpha ){\Vert {\varvec{x}}\Vert }_{1}+\alpha {\Vert {\varvec{x}}\Vert }_{2}^{2}\le \tau \}\) for some \(\alpha \in [0,1]\).
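As an illustration, a minimal R sketch of these definitions (norms, normalization and membership in the enet ball) could read as follows; the helper names are ours, not part of the SparseCenetMA package:

```r
# Norms and constraint balls of Sect. 2 (illustrative helper functions)
l1_norm <- function(x) sum(abs(x))
l2_norm <- function(x) sqrt(sum(x^2))

# Membership in the enet ball B_tau^{l1+l2} for a given alpha in [0, 1]
in_enet_ball <- function(x, tau, alpha) {
  (1 - alpha) * l1_norm(x) + alpha * l2_norm(x)^2 <= tau
}

x <- c(0.8, -0.3, 0, 0.1)
x <- x / l2_norm(x)              # normalized vector (unit l2 norm)
in_enet_ball(x, tau = 1.2, alpha = 0.5)
```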

3 Singular value decomposition

Given a matrix \({{\varvec{X}}}_{IJ}\) of rank \(R\le \mathrm{min}(I,J)\), the SVD of \({\varvec{X}}\) is defined as the product:
$${{\varvec{X}}}_{IJ}={{\varvec{U}}}_{IR}{{\varvec{D}}}_{R}{{{\varvec{V}}}^{T}}_{RJ}$$
(1)
where \({\varvec{U}}=[{{\varvec{u}}}_{1},\dots ,{{\varvec{u}}}_{R}]\) and \({\varvec{V}}=[{{\varvec{v}}}_{1},\dots ,{{\varvec{v}}}_{R}]\) are orthonormal matrices, \({{\varvec{U}}}^{T}{\varvec{U}}={\varvec{I}}\) and \({{\varvec{V}}}^{T}{\varvec{V}}={\varvec{I}}\). \({\varvec{U}}\) contains the left-singular vectors of the SVD in its columns, \({\varvec{V}}\) contains the right-singular vectors, and \({\varvec{D}}\) is a diagonal matrix containing the singular values \({d}_{r}\) of \({\varvec{X}}\) \((r=1,\dots ,R)\), ordered so that \({d}_{1}\ge {d}_{2}\ge \dots \ge {d}_{R}\ge 0\). For any \(Q\le R\), the SVD provides the best rank-\(Q\) approximation \({\widehat{{\varvec{X}}}}_{Q}\) of \({\varvec{X}}\) in the least squares sense, minimizing the Frobenius norm of the difference between the initial and the reconstructed matrices (Eckart and Young 1936; Shen and Huang 2008). \({\widehat{{\varvec{X}}}}_{Q}\) is defined as:
$${\widehat{{\varvec{X}}}}_{Q}={{\varvec{U}}}_{IQ}{{\varvec{D}}}_{Q}{{{\varvec{V}}}^{T}}_{QJ}=\sum_{q=1}^{Q}{d}_{\mathrm{q}}{{\varvec{u}}}_{\mathrm{q}}{{{\varvec{v}}}_{\mathrm{q}}}^{T}$$
(2)
with \({{{\varvec{u}}}_{\mathrm{q}}}^{T}{{\varvec{u}}}_{\mathrm{q}}={{{\varvec{v}}}_{\mathrm{q}}}^{T}{{\varvec{v}}}_{\mathrm{q}}=1\) and \({\varvec{u}}_{q}^{T} {\varvec{u}}_{q^{\prime}} = {\varvec{v}}_{q}^{T} {\varvec{v}}_{q^{\prime}} = 0 \forall q \ne q^{\prime} \left( {q = 1, \ldots ,Q} \right)\).
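The Eckart–Young property can be checked numerically with R's built-in svd(): the Frobenius norm of the rank-\(Q\) residual equals the square root of the sum of the squared discarded singular values. A minimal sketch:

```r
# Best rank-Q approximation of a random matrix via the SVD (Eq. 2)
set.seed(1)
X <- matrix(rnorm(20 * 6), nrow = 20, ncol = 6)
Q <- 2
s <- svd(X)
X_hat <- s$u[, 1:Q] %*% diag(s$d[1:Q]) %*% t(s$v[, 1:Q])
sqrt(sum((X - X_hat)^2))    # Frobenius norm of the residual ...
sqrt(sum(s$d[-(1:Q)]^2))    # ... equals sqrt of the discarded d_q^2
```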
Frequently, the orthogonal singular vectors of the SVD are computed using the power iteration algorithm together with a deflation approach. Instead, Guillemot et al. (2019) suggest obtaining the singular vectors by the projection onto convex sets (POCS) algorithm (Bauschke and Combettes 2017).
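For comparison with the POCS approach used below, a bare-bones sketch of the classical power iteration with deflation (not the algorithm of Guillemot et al. 2019) might look as follows; the signs of the singular vectors are determined only up to a flip:

```r
# Leading Q singular triplets by power iteration plus deflation (a sketch)
power_svd <- function(X, Q, tol = 1e-9, max_iter = 1000) {
  U <- matrix(0, nrow(X), Q); V <- matrix(0, ncol(X), Q); d <- numeric(Q)
  R <- X
  for (q in 1:Q) {
    v <- rnorm(ncol(X)); v <- v / sqrt(sum(v^2))
    for (it in 1:max_iter) {
      u <- drop(R %*% v); u <- u / sqrt(sum(u^2))
      v_new <- drop(t(R) %*% u); v_new <- v_new / sqrt(sum(v_new^2))
      delta <- sqrt(sum((v_new - v)^2)); v <- v_new
      if (delta < tol) break
    }
    u <- drop(R %*% v); u <- u / sqrt(sum(u^2))
    d[q] <- drop(t(u) %*% R %*% v)     # q-th singular value
    U[, q] <- u; V[, q] <- v
    R <- R - d[q] * u %*% t(v)         # deflation: remove extracted component
  }
  list(u = U, d = d, v = V)
}
```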

3.1 Extension of constrained singular value decomposition to elastic net (CenetSVD)

CenetSVD provides a factorization of \({{\varvec{X}}}_{IJ}\) by means of sparse and orthogonal singular vectors (called pseudo-singular vectors) and pseudo-singular values. The key point of CenetSVD is the simultaneous calculation of sparse and orthogonal vectors. The CenetSVD formulation is based on the constrained optimization problem of the CSVD proposed by Guillemot et al. (2019), replacing the Lasso constraint with the enet constraint:
$$ \begin{gathered} \mathop {\rm{argmin}}\limits_{d,u,v} \frac{1}{2}\left\| {\varvec{X} - \mathop \sum \limits_{q = 1}^{Q} d_{q} {\varvec{u}}_{q} {\varvec{v}}_{q}^{T} } \right\|^{2}_{F} \hfill \\ s.t.\left\{ {\begin{array}{*{20}c} {\varvec{u}_{q}^{T} \varvec{u}_{q} = \varvec{v}_{q}^{T} \varvec{v}_{q} = 1, \varvec{u}_{q}^{T} \varvec{u}_{{q^{\prime } }} = \varvec{v}_{q}^{T} \varvec{v}_{q^{\prime}} = 0\quad\forall q \ne q^{\prime } } \\ {\left( {1 - \alpha } \right)\left\|\varvec{u}_{q} \right\|_{1}+ \alpha \left\|\varvec{u}_{q} \right\|^{2}_{2} \le \tau_{1,q} ; \left( {1 - \alpha } \right)\left\|\varvec{v}_{q} \right\|_{1} + \alpha \left\|\varvec{v}_{q}\right\|^{2}_{2} \le \tau_{2,q} } \\ \end{array} } \right. \hfill \\ \end{gathered} $$
(3)
where \({\tau }_{1,q},{\tau }_{2,q}>0\) are the shrinkage parameters that control the degree of sparsity of the constrained model: the larger \({\tau }_{1}\) or \({\tau }_{2}\) is, the fewer coefficients are shrunk to zero. It is important to remark that only some values of \({\tau }_{1,q}\) and \({\tau }_{2,q}\) lead to feasible solutions (see Sect. 5.2 and Online Resource 2A). The parameter \(\alpha \in [0,1)\) defines the relative weight of the Lasso and Ridge constraints in the enet restriction.
To find the solution of (3), an iterative process is defined (Guillemot et al. 2019). First, it is necessary to establish an equivalent form of the previous minimization problem. Equation (3) is equivalent to:
$$ \begin{gathered} \mathop {\rm{argmax}}\limits_{u,v} \varvec{u}^{T} \varvec{Xv} \hfill \\ s.t.\left\{ {\begin{array}{*{20}c} {\varvec{u}^{T} \varvec{u} \le 1, \varvec{v}^{T} \varvec{v} \le 1, \varvec{u}^{T} \varvec{u}_{q^{\prime}} = \varvec{v}^{T} \varvec{v}_{q^{\prime}} = 0\quad \forall q^{\prime} < q} \\ {\left( {1 - \alpha_{1} } \right)\left\|{\varvec{u}}\right\|_{1} + \alpha_{1} \left\|{\varvec{u}}\right\|_{2}^{2} \le \tau_{1,q} ; \left( {1 - \alpha_{2} } \right)\left\|{\varvec{v}}\right\|_{1} + \alpha_{2} \left\|{\varvec{v}}\right\|_{2}^{2} \le \tau_{2,q} } \\ \end{array} } \right. \hfill \\ \end{gathered} $$
(4)
with \({\varvec{v}}_{q^{\prime}}\) and \({\varvec{u}}_{q^{\prime}}\) previously calculated, \(0 \le q^{\prime } < q\) and \(q\ge 1\). Equation (4) is solved by block relaxation, an iterative process that alternates between two subproblems:
(1) Find the vector \({\varvec{u}}\) that optimizes the objective with \({\varvec{v}}\) fixed:
$$\mathop{\rm{argmin}}\limits_{u}\ \frac{1}{2}{\Vert {\varvec{u}}-{\varvec{X}}{\varvec{v}}\Vert }_{2}^{2}\quad s.t.\ {\varvec{u}}\in {\mathfrak{B}}_{{\ell}_{1}+{\ell}_{2}}(\tau ),\ {\varvec{u}}\in {\mathfrak{B}}_{{\ell}_{2}}(1),\ {\varvec{u}}\in {\varvec{U}}^{\bot }\ \Leftrightarrow \ {\varvec{u}}\in {\mathfrak{B}}_{{\ell}_{1}+{\ell}_{2}}(\tau )\cap {\mathfrak{B}}_{{\ell}_{2}}(1)\cap {\varvec{U}}^{\bot }$$
(5)
where \({\varvec{U}}^{\bot }\) is the orthogonal complement of the space spanned by the columns of the matrix \({\varvec{U}}\). These constraints involve projections of a vector onto convex sets, and the intersection of two convex sets is also convex. Thus, the problem can be solved using the POCS and PL1L2 algorithms (Gloaguen et al. 2017; Guillemot et al. 2019). The projection of a vector onto the enet ball \(({\mathfrak{B}}_{{\ell}_{1}+{\ell}_{2}})\) proposed here follows the linear-time algorithms for projection onto the Lasso ball \({\mathfrak{B}}_{{\ell}_{1}}\) (Berg et al. 2008; Duchi et al. 2008; Guillemot et al. 2019) and onto the enet ball \({\mathfrak{B}}_{{\ell}_{1}+{\ell}_{2}}\) (Mairal et al. 2010) (see Online Resource 2B). In our case, the method for projecting a vector onto \({\mathfrak{B}}_{{\ell}_{1}+{\ell}_{2}}\cap {\mathfrak{B}}_{{\ell}_{2}}\) is an extension of the fast and exact algorithm for the projection onto the intersection of \({\mathfrak{B}}_{{\ell}_{1}}\) and \({\mathfrak{B}}_{{\ell}_{2}}\) proposed by Guillemot et al. (2019) (see Online Resource 2C); a simplified sketch of these projections is given after step (2) below.
(2) Find the vector \({\varvec{v}}\) that optimizes the objective with \({\varvec{u}}\) fixed:
$$\mathop{\rm{argmin}}\limits_{v}\ \frac{1}{2}{\Vert {\varvec{v}}-{{\varvec{X}}}^{T}{\varvec{u}}\Vert }_{2}^{2}\quad s.t.\ {\varvec{v}}\in {\mathfrak{B}}_{{\ell}_{1}+{\ell}_{2}}(\tau ),\ {\varvec{v}}\in {\mathfrak{B}}_{{\ell}_{2}}(1),\ {\varvec{v}}\in {\varvec{V}}^{\bot }\ \Leftrightarrow \ {\varvec{v}}\in {\mathfrak{B}}_{{\ell}_{1}+{\ell}_{2}}(\tau )\cap {\mathfrak{B}}_{{\ell}_{2}}(1)\cap {\varvec{V}}^{\bot }$$
(6)
where \({\varvec{V}}^{\bot }\) is the orthogonal complement of the space spanned by the columns of the matrix \({\varvec{V}}\). The projection \({\varvec{v}}_{t+1} = \mathrm{proj}^{{\mathfrak{B}}_{{\ell}_{1}+{\ell}_{2}}(\tau )\cap {\mathfrak{B}}_{{\ell}_{2}}(1)\cap {\varvec{V}}^{\bot }}\left({{\varvec{X}}}^{T}{{\varvec{u}}}_{t+1}\right)\) is carried out in the same manner as for (1).
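To make the two steps concrete, the following R sketch implements an approximate version of these projections: the enet-ball projection is obtained by bisection on the Lagrange multiplier (in the spirit of Mairal et al. 2010), and plain alternating (POCS-style) projections are used to reach a point in the intersection. The exact linear-time algorithms of Online Resource 2 and Guillemot et al. (2019) are more efficient; all names here are ours:

```r
soft <- function(x, t) sign(x) * pmax(abs(x) - t, 0)  # soft-thresholding
enet_pen <- function(x, alpha) (1 - alpha) * sum(abs(x)) + alpha * sum(x^2)

# Projection onto the enet ball B_{l1+l2}(tau), assuming alpha in [0, 1):
# u = soft(x, lam*(1-alpha)) / (1 + 2*lam*alpha), with lam found by bisection
proj_enet <- function(x, tau, alpha, tol = 1e-10) {
  if (enet_pen(x, alpha) <= tau) return(x)
  lo <- 0; hi <- max(abs(x)) / (1 - alpha)  # at hi the projection is zero
  while (hi - lo > tol) {
    lam <- (lo + hi) / 2
    u <- soft(x, lam * (1 - alpha)) / (1 + 2 * lam * alpha)
    if (enet_pen(u, alpha) > tau) lo <- lam else hi <- lam
  }
  soft(x, hi * (1 - alpha)) / (1 + 2 * hi * alpha)
}

# POCS-style alternating projections towards a point in
# B_{l1+l2}(tau), B_{l2}(1) and the orthogonal complement of U
proj_intersection <- function(x, tau, alpha, U = NULL, n_iter = 100) {
  for (i in 1:n_iter) {
    x <- proj_enet(x, tau, alpha)
    n2 <- sqrt(sum(x^2)); if (n2 > 1) x <- x / n2      # onto B_{l2}(1)
    if (!is.null(U) && ncol(U) > 0)
      x <- drop(x - U %*% crossprod(U, x))             # onto U-perp
  }
  x
}
```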
The global optimization problem of CenetSVD is handled using the POCS algorithm (Table 1). Lines 6 and 7 are modified from Guillemot et al. (2019) to address the projection onto \({\mathfrak{B}}_{{\ell}_{1}+{\ell}_{2}}\cap {\mathfrak{B}}_{{\ell}_{2}}\).
Table 1 Algorithm for the implementation of CenetSVD based on the POCS algorithm (reproduced as an image in the original article)
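A compact sketch of the resulting outer loop, reusing proj_intersection() from the previous sketch, is shown below; it is a simplified rendition of Table 1 under our own naming, and the exact stopping rules are those of Guillemot et al. (2019). Note that the final rescaling to unit length can slightly relax the enet constraint, which the exact algorithm handles explicitly:

```r
cenet_svd <- function(X, Q, tau_u, tau_v, alpha = 0.5,
                      n_iter = 200, tol = 1e-6) {
  U <- matrix(0, nrow(X), 0); V <- matrix(0, ncol(X), 0); d <- numeric(0)
  for (q in 1:Q) {
    v <- svd(X)$v[, q]                      # warm start from the plain SVD
    for (it in 1:n_iter) {
      u <- proj_intersection(drop(X %*% v), tau_u[q], alpha, U)
      u <- u / max(sqrt(sum(u^2)), 1e-12)   # rescale to unit length
      v_new <- proj_intersection(drop(t(X) %*% u), tau_v[q], alpha, V)
      v_new <- v_new / max(sqrt(sum(v_new^2)), 1e-12)
      delta <- sqrt(sum((v_new - v)^2)); v <- v_new
      if (delta < tol) break
    }
    d <- c(d, drop(t(u) %*% X %*% v))       # q-th pseudo-singular value
    U <- cbind(U, u); V <- cbind(V, v)
  }
  list(u = U, d = d, v = V)
}
```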

4 Extension of CenetSVD to the biplot methods

The results of traditional PCA and biplot methods are often calculated from the SVD of a matrix. Consequently, sparse and orthogonal constrained PCA (CenetPCA) and sparse and orthogonal constrained biplots (CenetBiplots) are obtained from the results of the CenetSVD. Let the CenetSVD of \({\varvec{X}}\) be the rank-\(Q\) approximation of the original matrix, \({\varvec{X}}\approx {{\varvec{U}}}_{enet}{\varvec{D}}{{\varvec{V}}}_{enet}^{T}\), where \({{\varvec{U}}}_{enet}\) and \({{\varvec{V}}}_{enet}\) are sparse orthonormal matrices.

4.1 Sparse and orthogonal PCA

The main objective of CenetPCA is to project the original matrix onto a subspace determined by a new set of \(Q<J\) sparse and orthogonal PCs. Given \({{\varvec{X}}}_{IJ}\), the sparse CenetPCs are defined as linear combinations of the original variables:
$${{\varvec{Y}}}_{IQ}={{\varvec{X}}}_{IJ} {{{\varvec{V}}}_{enet}}_{JQ}$$
(7)
where \({{\varvec{Y}}}_{IQ}\) is the score matrix, containing the coordinates of the observations in the new subspace, and \({{\varvec{V}}}_{enet}\) is the sparse loading matrix. A flowchart of CenetPCA is shown in Fig. 1a.
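In R, the scores of Eq. (7) follow directly from the right pseudo-singular vectors; a minimal sketch, assuming X is a column-centred data matrix and cenet_svd() is the sketch from Sect. 3.1 (the shrinkage values are arbitrary):

```r
res <- cenet_svd(X, Q = 2, tau_u = c(2, 2), tau_v = c(2, 2), alpha = 0.5)
Y <- X %*% res$v   # Y_{IQ} = X_{IJ} V_enet: coordinates of the observations
```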

4.2 Sparse and orthogonal Biplots

Biplot methods are optimal tools to visualize multivariate data in a low-dimensional space. The relationships between the observations and the variables can be interpreted in a scatterplot thanks to the inner product properties. Gabriel (1971) proposed the JK and GH biplots, which assign an optimal quality of representation to the rows (JK-Biplot) or to the columns (GH-Biplot) in the same Euclidean space. To provide the highest-quality representation for both rows and columns in the same reference system, Galindo (1986) proposed the HJ-Biplot. To this end, the original matrix is factorized as \({\varvec{X}}\approx {\varvec{A}}{{\varvec{B}}}^{T}\), so that the inner product \({{\varvec{a}}}_{i}^{\mathrm{T}}{{\varvec{b}}}_{j}\) approximates the element \({x}_{ij}\) as closely as possible. \({{\varvec{A}}}_{IQ}\) and \({{\varvec{B}}}_{JQ}\) are the row- and column-marker matrices, respectively. The pseudo-singular vectors and values obtained from the CenetSVD of \({\varvec{X}}\) are used to implement sparse and orthogonal biplots: CenetJK-Biplot sets the row and column markers as \({\varvec{A}}={{\varvec{U}}}_{enet}{\varvec{D}}\) and \({\varvec{B}}={{\varvec{V}}}_{enet}\); CenetGH-Biplot sets \({\varvec{A}}={{\varvec{U}}}_{enet}\) and \({\varvec{B}}={{\varvec{V}}}_{enet}{\varvec{D}}\); and CenetHJ-Biplot sets \({\varvec{A}}={{\varvec{U}}}_{enet}{\varvec{D}}\) and \({\varvec{B}}={{\varvec{V}}}_{enet}{\varvec{D}}\) (Fig. 1b).
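Under the same assumptions, the three marker choices can be written as one-liners; for the JK and GH variants the product \({\varvec{A}}{{\varvec{B}}}^{T}\) approximates \({\varvec{X}}\), while the HJ variant trades this off for equal representation quality of rows and columns:

```r
# Row (A) and column (B) markers from the cenet_svd() sketch
D <- diag(res$d, nrow = length(res$d))
A_jk <- res$u %*% D;  B_jk <- res$v          # CenetJK-Biplot
A_gh <- res$u;        B_gh <- res$v %*% D    # CenetGH-Biplot
A_hj <- res$u %*% D;  B_hj <- res$v %*% D    # CenetHJ-Biplot
```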
To interpret the CenetBiplots graphical representation, it is important to note that i) observations are represented by dots and variables by vectors in the graph; ii) distances between points show the dissimilarities between observations; iii) lengths of the vectors refer to the variability of the variables; iv) relationships between variables are interpreted from the cosine of the angles between the corresponding vectors (obtuse: inverse relationship; acute: direct relationship; right angle: linear independence); and v) projections of the points in the direction of a vector approximate the values of the variables for those observations.

5 Selection of the sparsity parameters

5.1 Selection of \(\alpha\) and \((1-\alpha)\)

The parameter \(\alpha \in [0,1)\) defines the relative weight of the Lasso and Ridge constraints in the enet restriction. Usually, \(\alpha\) is set to \(0.5\) to give the Lasso and Ridge constraints equal weight in the model. In practice, the parameter \(\alpha\) can be selected manually, but we also suggest selecting the \(\alpha\) that minimizes the cross-validation error (James et al. 2014) through an iterative process. This process starts by defining an increasing sequence of candidate values for \(\alpha\) and segmenting the matrix into \(n=10\) folds. For each fold, the training matrix comprises the observations of the remaining \((n-1)\) folds, and the held-out observations constitute the test matrix. Then, for each value of \(\boldsymbol{\alpha }=({\alpha }_{1},...,{\alpha }_{t})\), the iterative process is run: the CenetSVD of the training matrix is computed, and the matrix \({{\varvec{V}}}_{enet}\) is used to calculate the reconstruction error \({\mathrm{MSE}}_{{\alpha }_{1},1}\) of the test matrix:
$${\mathrm{MSE}}_{{\alpha }_{1},1}={\Vert {{\varvec{X}}}_{TEST}-{\widehat{{\varvec{X}}}}_{TEST}\Vert }^{2}={\Vert {{\varvec{X}}}_{TEST}-{{\varvec{X}}}_{TEST}{{\varvec{V}}}_{enet}{{\varvec{V}}}_{enet}^{T}\Vert }^{2}$$
(8)
Reconstruction errors \({\mathrm{MSE}}_{{\alpha }_{1},1}, ..., {\mathrm{MSE}}_{{\alpha }_{1},n}\) are obtained for each of the folds; hence, the final \({\rm{MSE}}_{{\alpha }_{1}}\) is computed as the mean of the errors \({\mathrm{MSE}}_{{\alpha }_{1},1:n}\).
This step is repeated for the whole sequence of \({\alpha }_{t}\) values. The optimum \(\alpha \) is the one that provides the minimum \(\mathrm{MSE}=\mathrm{min}\left\{{\rm{MSE}}_{{\alpha }_{1}}, \dots , {\rm{MSE}}_{{\alpha }_{t}}\right\}\).
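A sketch of this cross-validation loop, again reusing the cenet_svd() sketch (the fold assignment and candidate grid are ours):

```r
cv_alpha <- function(X, alphas, Q, tau_u, tau_v, n_folds = 10) {
  folds <- sample(rep(1:n_folds, length.out = nrow(X)))
  mse <- sapply(alphas, function(a) {
    mean(sapply(1:n_folds, function(k) {
      V <- cenet_svd(X[folds != k, , drop = FALSE], Q, tau_u, tau_v, a)$v
      X_te <- X[folds == k, , drop = FALSE]
      sum((X_te - X_te %*% V %*% t(V))^2)   # reconstruction error, Eq. (8)
    }))
  })
  alphas[which.min(mse)]                    # alpha with minimum mean MSE
}
```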

5.2 Selection of \(\tau\)

The shrinkage parameter \(\tau \) inversely controls the degree of sparsity; that is, the larger its value is, the fewer zero loadings. In this work, we propose to select \(\tau \) using the Bayesian information criterion (BIC) as in (Guo and James 2010; Croux et al. 2013):
$$\mathrm{BIC}(\tau )=\frac{{\Vert {\varvec{X}}-{\varvec{X}}{{\varvec{V}}}_{enet}{{\varvec{V}}}_{enet}^{T}\Vert }^{2}}{{\Vert {\varvec{X}}-{\varvec{X}}{\varvec{V}}{{\varvec{V}}}^{T}\Vert }^{2}}+\mathrm{df}(\tau )\frac{\mathrm{log}(I)}{I}$$
(9)
where \({{\varvec{X}}}_{IJ}\) is the original matrix, \({{\varvec{V}}}_{enet}\) is the right-pseudo-singular vector matrix obtained from CenetSVD, \({\varvec{V}}\) is the right-singular vector matrix obtained from unconstrained SVD and \(\mathrm{df}(\tau )\) is the number of non-zero elements in \({{\varvec{V}}}_{enet}\). The parameter \(\tau \) that minimizes \(\mathrm{BIC}(\tau )\) is selected from a sequence of possible \(\tau \in [1,(1-\alpha )\sqrt{J}+\alpha ]\). As stated by Guillemot et al. (2019) and Witten et al. (2009), only some values of the constraints lead to solutions (see Online Resource 2).
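Equation (9) translates directly into R; a minimal sketch, where V_full holds the first \(Q\) right-singular vectors of the unconstrained SVD:

```r
bic_tau <- function(X, V_enet, V_full) {
  num <- sum((X - X %*% V_enet %*% t(V_enet))^2)  # constrained error
  den <- sum((X - X %*% V_full %*% t(V_full))^2)  # unconstrained error
  df  <- sum(V_enet != 0)                         # non-zero loadings
  num / den + df * log(nrow(X)) / nrow(X)
}

# Candidate grid over the admissible range of tau
alpha <- 0.5
taus <- seq(1, (1 - alpha) * sqrt(ncol(X)) + alpha, length.out = 20)
```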

6 Analysing data with Cenet methods

In this section, we illustrate the performance of CenetPCA and CenetHJ-Biplot by analysing two matrices in different contexts: 1) the Object-Spatial Imagery Questionnaire (OSIQ) data \((J<I)\) and 2) the MILE gene expression data \((J\gg I)\). All data analyses were performed using the R software (R Core Team 2020). All figures were produced using the ggplot2 library (Wickham 2016).

6.1 Analysing the mental imagery questionnaire data: a psychometric example

The OSIQ dataset is publicly available on GitHub (https://github.com/HerveAbdi/data4PCCAR).
To show the usefulness of CenetHJ-Biplot in a toy example in which the number of variables is smaller than the number of observations, a random sample of 100 simulated participants who responded to the items of the OSIQ scale (Blajenkova et al. 2006) was drawn. This scale consists of 30 items rated on a 5-point Likert scale, structured around two latent dimensions: the spatial and object imagery scales. The correlation matrix between items is shown in Online Resource 1.
Two-dimensional solutions of the traditional HJ-Biplot and the CenetHJ-Biplot are shown in Fig. 2. To obtain a low degree of sparsity, the CenetHJ-Biplot was employed with a shrinkage parameter \({\tau }_{2}=(0.5\sqrt{J}+0.5)\cdot (3/4)\); no sparsity restriction was imposed on the individuals. In both plots, participants are represented by dots and questionnaire items by arrows. The cosine of the angles between the vectors reflects the relationships between items; thus, items of the same theoretical dimension are strongly and directly correlated (acute angles). As can be seen, when the CenetHJ-Biplot is applied (Fig. 2b), the relationships of the items in the two constructed dimensions, even with low constraints, reflect the theoretical psychometric structure more clearly than in the case of the HJ-Biplot (Fig. 2a). Additionally, in the HJ-Biplot results item o15 correlates directly and weakly with the items of its own scale and inversely and strongly with the items of the other scale (Fig. 2a); the CenetHJ-Biplot tends to maintain these relationships (Fig. 2b). The same can be observed with items s06, s11 and o10, which do not correlate with the items of their own dimension but whose relationships to the items of the other subscale are preserved.

6.2 Analysing leukaemia gene expression data: a genomic example

The use of statistical methods involving variable selection to analyse microarray data has become increasingly important in the context of tumour classification (Algamal and Lee 2015). In gene expression data, the number of genes commonly exceeds the number of samples. Frequently, only a few genes carry relevant information, and therefore sparse constraints for the automatic selection of genes are necessary.
To show the usefulness of CenetPCA and CenetHJ-Biplot, in this section we analyse 216 leukaemia patients randomly selected from the GSE13204 series available at the Gene Expression Omnibus (GEO) repository. Among our samples, 32.9% had been classified as acute lymphoblastic leukaemia (ALL, n = 71) and 34.3% as chronic lymphocytic leukaemia (CLL, n = 74); the remaining 32.9% were control patients (n = 71). RNA gene expression data were extracted from the Affymetrix Human Genome U133 Plus 2.0 Array (HGU133Plus2) microarray platform. Data preprocessing was carried out using RMA normalization. For illustrative purposes, the 2,000 genes showing the greatest variability according to the CUR decomposition leverage scores were selected (see Mahoney and Drineas 2009 for details).
Traditional PCA was performed on the centred matrix of 216 samples and 2,000 probe sets. The two retained components explain 45.5% of the data variability. Figure 3a shows the score plot of components 1–2. The scores reveal three clearly differentiated subgroups of samples corresponding to control (rectangle), ALL (circle) and CLL (triangle) samples. The first PC discriminates between control and CLL samples, with the CLL samples located on the positive side of axis 1. The ALL samples are differentiated from the CLL and the control samples by their coordinates on axis 2. Figure 3a (middle, bottom) shows that each PC is a linear combination of a large number of genes, making the PCs hard to interpret despite their good discriminatory capacity.
To assess the effects of sparsification, CenetPCA was performed using \(\alpha =0.5\) (i.e., giving equal weight to the Lasso and Ridge constraints) and \(\tau =5.86\); the latter was selected according to the BIC criterion, taking into account the previously observed separation of the groups. The two retained sparse components explain 25.8% of the data variability. Figure 3b shows the score plot of the samples on the first two sparse PCs. The same classification is achieved as in Fig. 3a (top). The loading plots of CenetPCA show the genes automatically selected by this technique (Fig. 3b, middle-bottom). The genes with higher loadings in the PCs are those with higher loadings in the sparse CenetPCs, whereas the variables with lower loadings in the PCs have zero (or near-zero) loadings in the sparse CenetPCs. This fact simplifies the interpretation of the constrained components. Of the 2,000 gene probes considered, 1,752 present zero loadings in both CenetPCs; of the remaining 248 non-zero loadings, 43 gene probes present loadings lower than 0.01 along both constrained components.
Finally, to characterise the influence of the selected gene probes on the separation of the above-mentioned groups, the centred gene expression matrix of the 216 observations and the 205 gene probes resulting from CenetPCA was analysed via CenetHJ-Biplot. The sparsity constraint \({\tau }_{2}\) was fixed at a medium level of sparsity, \(\tau =((1-\alpha )\sqrt{J}+\alpha )\cdot (1/3)\) (Guillemot et al. 2019). The two retained sparse components explain 27.6% of the data variability. The CenetHJ-Biplot results are shown in Fig. 4. The first axis shows the separation of the CLL samples from the control and ALL samples. Axis 2 is a gradient direction differentiating the ALL samples from the rest. The genes that do not appear on the plane have null coordinates. In addition, the CenetHJ-Biplot representation makes it possible to recognize a genetic characterization of each of the subgroups. In this sense, control samples are characterized by a high expression of S100A12 and S100A9; these genes are responsible for discriminating between control and tumour samples. CLL samples are differentiated from control and ALL samples because they present higher expression levels of FCMR and lower expression levels of the DEFA1, LTF, HDB, HBB, HBA1, HBM and RFLNB genes.

7 Conclusions

In this work, the extension of the CSVD (Guillemot et al. 2019) to the enet ball is proposed, integrating sparse and orthogonal vectors simultaneously. This method is based on the projection of a vector onto the convex intersection of the enet and \({\ell}_{2}\) balls. Here, the enet constraint is shown to be a suitable approach, shrinking coefficients to zero while ensuring that correlated variables have similar coefficients, a desirable property in disciplines such as genomics or psychometrics. Our CenetSVD approach is useful for analysing large-scale problems with \(J\gg I\) as well as datasets with \(I>J\). Additionally, CenetSVD is extended to sparse and orthogonal constrained PCA (CenetPCA) and sparse and orthogonal constrained biplots (CenetBiplots). These techniques make it possible to recognize groups with similar patterns and the variables associated with them. In addition, they are variable selection techniques that improve the interpretation of results thanks to the sparse components obtained. Furthermore, this work provides a sparsity parameter selection procedure based on cross-validation and the BIC, as well as the possibility of manually establishing distinct levels of sparsity.
Future lines of research may consider applying other types of constraints within the CSVD framework, or even proposing algorithms for projecting a vector onto non-convex sets based on the corresponding mathematical theory. Additionally, statistical techniques for two-way and three-way data analysis could be developed through CenetSVD. We conclude that our proposed methods are promising tools for conducting multivariate analysis and are applicable to a wide range of research areas.
Our methods are available as R functions in the SparseCenetMA GitHub repository (https://github.com/ananieto/SparseCenetMA).

Acknowledgements

The authors would like to thank Guillemot et al. for the possibility of using the public R code of CSVD, which can be found at https://github.com/vguillemot/csvd, with copyright:
“Copyright (c) 2018 Vincent Guillemot. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE”.

Declarations

Conflict of interest

The authors declared that they have no conflicts of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
Bauschke HH, Combettes PL (2017) Convex analysis and monotone operator theory in Hilbert spaces. Springer, Cham
Berg E, Schmidt M, Friedlander M, Murphy K (2008) Group sparsity via linear-time projection. Tech. Rep. TR-2008-09, Dept. of Computer Science, UBC
Duchi J, Shalev-Shwartz S, Singer Y, Chandra T (2008) Efficient projections onto the ℓ1-ball for learning in high dimensions. Proc 25th Int Conf Mach Learn, pp 272–279
Eckart C, Young G (1936) The approximation of one matrix by another of lower rank. Psychometrika 1:211–218
Gabriel KR (1971) The biplot graphic display of matrices with application to principal component analysis. Biometrika 58:453–467
Galindo MP (1986) An alternative for simultaneous representation: HJ-Biplot. Qüestiió 10:12–23
Genicot M, Huang W, Trendafilov NT (2015) Weakly correlated sparse components with nearly orthonormal loadings. In: International Conference on Geometric Science of Information. Springer, Cham, pp 484–490
Gloaguen A, Guillemot V, Tenenhaus A, et al (2017) An efficient algorithm to satisfy l1 and l2 constraints. 49èmes Journées de Statistique. hal-01630744
Guo J, James G (2010) Principal component analysis with sparse fused loadings. J Comput Graph Stat 19:930–946
Hoerl A, Kennard R (1988) Ridge regression. In: Encyclopedia of Statistical Sciences, vol 8. Wiley, New York, pp 129–136
James G, Witten D, Hastie T, Tibshirani R (2014) An introduction to statistical learning: with applications in R. Springer, New York
Jolliffe IT, Cadima J (2016) Principal component analysis: a review and recent developments. Philos Trans Royal Soc A 374:20150202
Journée M, Nesterov Y, Richtárik P, Sepulchre R (2010) Generalized power method for sparse principal component analysis. J Mach Learn Res 11:517–553
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288