Article

Full regularization path for sparse principal component analysis

Authors:
Alexandre d'Aspremont

Princeton University, Princeton, NJ

Princeton University, Princeton, NJ
View Profile

,
Francis R. Bach

Ecole des Mines de Paris, France

Ecole des Mines de Paris, France
View Profile

,
Laurent El Ghaoui

U. C. Berkeley, Berkeley, CA

U. C. Berkeley, Berkeley, CA
View Profile

ICML '07: Proceedings of the 24th international conference on Machine learningJune 2007Pages 177–184https://doi.org/10.1145/1273496.1273519

Published:20 June 2007Publication History

ICML '07: Proceedings of the 24th international conference on Machine learning

Pages 177–184

ABSTRACT

Given a sample covariance matrix, we examine the problem of maximizing the variance explained by a particular linear combination of the input variables while constraining the number of nonzero coefficients in this combination. This is known as sparse principal component analysis and has a wide array of applications in machine learning and engineering. We formulate a new semidefinite relaxation to this problem and derive a greedy algorithm that computes a full set of good solutions for all numbers of non zero coefficients, with complexity O(n³), where n is the number of variables. We then use the same relaxation to derive sufficient conditions for global optimality of a solution, which can be tested in O(n³). We show on toy examples and biological data that our algorithm does provide globally optimal solutions in many cases.

References

Alizadeh, A., Eisen, M., Davis, R., Ma, C., Lossos, I., & Rosenwald, A. (2000). Distinct types of diffuse large b-cell lymphoma identified by gene expression profiling. Nature, 403, 503--511.Google ScholarCross Ref
Alon, A., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D., & Levine, A. J. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Cell Biology, 96, 6745--6750.Google Scholar
Burer, S., & Monteiro, R. (2003). A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Mathematical Programming, 95, 329--357.Google ScholarCross Ref
Cadima, J., & Jolliffe, I. T. (1995). Loadings and correlations in the interpretation of principal components. Journal of Applied Statistics, 22, 203--214.Google ScholarCross Ref
d'Aspremont, A., El Ghaoui, L., Jordan, M., & Lanckriet, G. R. G. (2004). A direct formulation for sparse PCA using semidefinite programming. To appear in SIAM Review.Google Scholar
El Ghaoui, L. (2006). On the quality of a semidefinite programming bound for sparse principal component analysis. ArXiV math.OC/0601448.Google Scholar
Jolliffe, I. T. (1995). Rotation of principal components: choice of normalization constraints. Journal of Applied Statistics, 22, 29--35.Google ScholarCross Ref
Jolliffe, I. T., Trendafilov, N., & Uddin, M. (2003). A modified principal component technique based on the LASSO. Journal of Computational and Graphical Statistics, 12, 531--547.Google ScholarCross Ref
Kolda, T. G., & O'Leary, D. P. (2000). Algorithm 805: computation and uses of the semidiscrete matrix decomposition. ACM Transactions on Mathematical Software, 26, 415--435. Google ScholarDigital Library
Moghaddam, B., Weiss, Y., & Avidan, S. (2006a). Generalized spectral bounds for sparse LDA. Proc. ICML. Google ScholarDigital Library
Moghaddam, B., Weiss, Y., & Avidan, S. (2006b). Spectral bounds for sparse PCA: Exact and greedy algorithms. Advances in Neural Information Processing Systems, 18.Google Scholar
Natarajan, B. K. (1995). Sparse approximate solutions to linear systems. SIAM J. Comput., 24, 227--234. Google ScholarDigital Library
Su, Y., Murali, T. M., Pavlovic, V., Schaffer, M., & Kasif, S. (2003). Rankgene: identification of diagnostic genes based on expression data. Bioinformatics, 19, 1578--1579.Google ScholarCross Ref
Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. Journal of the Royal statistical society, series B, 58, 267--288.Google ScholarCross Ref
Toh, K. C., Todd, M. J., & Tutuncu, R. H. (1999). SDPT3 - a MATLAB software package for semidefinite programming. Optimization Methods and Software, 11, 545--581.Google ScholarCross Ref
Zou, H., Hastie, T., & Tibshirani, R. (2004). Sparse principal component analysis. Journal of Computational and Graphical Statistics.Google Scholar

Full regularization path for sparse principal component analysis
1. Computing methodologies

Recommendations

L1-norm-based principal component analysis with adaptive regularization

Recently, some L1-norm-based principal component analysis algorithms with sparsity have been proposed for robust dimensionality reduction and processing multivariate data. The L1-norm regularization used in these methods encounters stability problems ...
Read More
rs-Sparse principal component analysis

Principal component analysis is a popular data analysis dimensionality reduction technique, aiming to project with minimum error for a given dataset into a subspace of smaller number of dimensions.In order to improve interpretability, different variants ...
Read More
Sparse principal component analysis by choice of norm

Recent years have seen the developments of several methods for sparse principal component analysis due to its importance in the analysis of high dimensional data. Despite the demonstration of their usefulness in practical applications, they are limited ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ICML '07: Proceedings of the 24th international conference on Machine learning
June 2007
1233 pages
ISBN:9781595937933
DOI:10.1145/1273496
Editor:
Zoubin Ghahramani
University of Cambridge, United Kingdom
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 20 June 2007
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate140of548submissions,26%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 19
  Total Citations
  View Citations
- 416
  Total Downloads
- Downloads (Last 12 months)18
- Downloads (Last 6 weeks)4
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Full regularization path for sparse principal component analysis

ICML '07: Proceedings of the 24th international conference on Machine learning

ABSTRACT

References

Cited By

Recommendations

L1-norm-based principal component analysis with adaptive regularization

rs-Sparse principal component analysis

Sparse principal component analysis by choice of norm

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Full regularization path for sparse principal component analysis

ICML '07: Proceedings of the 24th international conference on Machine learning

ABSTRACT

References

Cited By

Recommendations

L1-norm-based principal component analysis with adaptive regularization

rs-Sparse principal component analysis

Sparse principal component analysis by choice of norm

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media