ABSTRACT
Given a sample covariance matrix, we examine the problem of maximizing the variance explained by a particular linear combination of the input variables while constraining the number of nonzero coefficients in this combination. This is known as sparse principal component analysis and has a wide array of applications in machine learning and engineering. We formulate a new semidefinite relaxation to this problem and derive a greedy algorithm that computes a full set of good solutions for all numbers of non zero coefficients, with complexity O(n3), where n is the number of variables. We then use the same relaxation to derive sufficient conditions for global optimality of a solution, which can be tested in O(n3). We show on toy examples and biological data that our algorithm does provide globally optimal solutions in many cases.
- Alizadeh, A., Eisen, M., Davis, R., Ma, C., Lossos, I., & Rosenwald, A. (2000). Distinct types of diffuse large b-cell lymphoma identified by gene expression profiling. Nature, 403, 503--511.Google ScholarCross Ref
- Alon, A., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D., & Levine, A. J. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Cell Biology, 96, 6745--6750.Google Scholar
- Burer, S., & Monteiro, R. (2003). A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Mathematical Programming, 95, 329--357.Google ScholarCross Ref
- Cadima, J., & Jolliffe, I. T. (1995). Loadings and correlations in the interpretation of principal components. Journal of Applied Statistics, 22, 203--214.Google ScholarCross Ref
- d'Aspremont, A., El Ghaoui, L., Jordan, M., & Lanckriet, G. R. G. (2004). A direct formulation for sparse PCA using semidefinite programming. To appear in SIAM Review.Google Scholar
- El Ghaoui, L. (2006). On the quality of a semidefinite programming bound for sparse principal component analysis. ArXiV math.OC/0601448.Google Scholar
- Jolliffe, I. T. (1995). Rotation of principal components: choice of normalization constraints. Journal of Applied Statistics, 22, 29--35.Google ScholarCross Ref
- Jolliffe, I. T., Trendafilov, N., & Uddin, M. (2003). A modified principal component technique based on the LASSO. Journal of Computational and Graphical Statistics, 12, 531--547.Google ScholarCross Ref
- Kolda, T. G., & O'Leary, D. P. (2000). Algorithm 805: computation and uses of the semidiscrete matrix decomposition. ACM Transactions on Mathematical Software, 26, 415--435. Google ScholarDigital Library
- Moghaddam, B., Weiss, Y., & Avidan, S. (2006a). Generalized spectral bounds for sparse LDA. Proc. ICML. Google ScholarDigital Library
- Moghaddam, B., Weiss, Y., & Avidan, S. (2006b). Spectral bounds for sparse PCA: Exact and greedy algorithms. Advances in Neural Information Processing Systems, 18.Google Scholar
- Natarajan, B. K. (1995). Sparse approximate solutions to linear systems. SIAM J. Comput., 24, 227--234. Google ScholarDigital Library
- Su, Y., Murali, T. M., Pavlovic, V., Schaffer, M., & Kasif, S. (2003). Rankgene: identification of diagnostic genes based on expression data. Bioinformatics, 19, 1578--1579.Google ScholarCross Ref
- Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. Journal of the Royal statistical society, series B, 58, 267--288.Google ScholarCross Ref
- Toh, K. C., Todd, M. J., & Tutuncu, R. H. (1999). SDPT3 - a MATLAB software package for semidefinite programming. Optimization Methods and Software, 11, 545--581.Google ScholarCross Ref
- Zou, H., Hastie, T., & Tibshirani, R. (2004). Sparse principal component analysis. Journal of Computational and Graphical Statistics.Google Scholar
- Full regularization path for sparse principal component analysis
Recommendations
L1-norm-based principal component analysis with adaptive regularization
Recently, some L1-norm-based principal component analysis algorithms with sparsity have been proposed for robust dimensionality reduction and processing multivariate data. The L1-norm regularization used in these methods encounters stability problems ...
rs-Sparse principal component analysis
Principal component analysis is a popular data analysis dimensionality reduction technique, aiming to project with minimum error for a given dataset into a subspace of smaller number of dimensions.In order to improve interpretability, different variants ...
Sparse principal component analysis by choice of norm
Recent years have seen the developments of several methods for sparse principal component analysis due to its importance in the analysis of high dimensional data. Despite the demonstration of their usefulness in practical applications, they are limited ...
Comments