skip to main content
10.1145/1273496.1273519acmotherconferencesArticle/Chapter ViewAbstractPublication PagesicmlConference Proceedingsconference-collections
Article

Full regularization path for sparse principal component analysis

Published:20 June 2007Publication History

ABSTRACT

Given a sample covariance matrix, we examine the problem of maximizing the variance explained by a particular linear combination of the input variables while constraining the number of nonzero coefficients in this combination. This is known as sparse principal component analysis and has a wide array of applications in machine learning and engineering. We formulate a new semidefinite relaxation to this problem and derive a greedy algorithm that computes a full set of good solutions for all numbers of non zero coefficients, with complexity O(n3), where n is the number of variables. We then use the same relaxation to derive sufficient conditions for global optimality of a solution, which can be tested in O(n3). We show on toy examples and biological data that our algorithm does provide globally optimal solutions in many cases.

References

  1. Alizadeh, A., Eisen, M., Davis, R., Ma, C., Lossos, I., & Rosenwald, A. (2000). Distinct types of diffuse large b-cell lymphoma identified by gene expression profiling. Nature, 403, 503--511.Google ScholarGoogle ScholarCross RefCross Ref
  2. Alon, A., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D., & Levine, A. J. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Cell Biology, 96, 6745--6750.Google ScholarGoogle Scholar
  3. Burer, S., & Monteiro, R. (2003). A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Mathematical Programming, 95, 329--357.Google ScholarGoogle ScholarCross RefCross Ref
  4. Cadima, J., & Jolliffe, I. T. (1995). Loadings and correlations in the interpretation of principal components. Journal of Applied Statistics, 22, 203--214.Google ScholarGoogle ScholarCross RefCross Ref
  5. d'Aspremont, A., El Ghaoui, L., Jordan, M., & Lanckriet, G. R. G. (2004). A direct formulation for sparse PCA using semidefinite programming. To appear in SIAM Review.Google ScholarGoogle Scholar
  6. El Ghaoui, L. (2006). On the quality of a semidefinite programming bound for sparse principal component analysis. ArXiV math.OC/0601448.Google ScholarGoogle Scholar
  7. Jolliffe, I. T. (1995). Rotation of principal components: choice of normalization constraints. Journal of Applied Statistics, 22, 29--35.Google ScholarGoogle ScholarCross RefCross Ref
  8. Jolliffe, I. T., Trendafilov, N., & Uddin, M. (2003). A modified principal component technique based on the LASSO. Journal of Computational and Graphical Statistics, 12, 531--547.Google ScholarGoogle ScholarCross RefCross Ref
  9. Kolda, T. G., & O'Leary, D. P. (2000). Algorithm 805: computation and uses of the semidiscrete matrix decomposition. ACM Transactions on Mathematical Software, 26, 415--435. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Moghaddam, B., Weiss, Y., & Avidan, S. (2006a). Generalized spectral bounds for sparse LDA. Proc. ICML. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Moghaddam, B., Weiss, Y., & Avidan, S. (2006b). Spectral bounds for sparse PCA: Exact and greedy algorithms. Advances in Neural Information Processing Systems, 18.Google ScholarGoogle Scholar
  12. Natarajan, B. K. (1995). Sparse approximate solutions to linear systems. SIAM J. Comput., 24, 227--234. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Su, Y., Murali, T. M., Pavlovic, V., Schaffer, M., & Kasif, S. (2003). Rankgene: identification of diagnostic genes based on expression data. Bioinformatics, 19, 1578--1579.Google ScholarGoogle ScholarCross RefCross Ref
  14. Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. Journal of the Royal statistical society, series B, 58, 267--288.Google ScholarGoogle ScholarCross RefCross Ref
  15. Toh, K. C., Todd, M. J., & Tutuncu, R. H. (1999). SDPT3 - a MATLAB software package for semidefinite programming. Optimization Methods and Software, 11, 545--581.Google ScholarGoogle ScholarCross RefCross Ref
  16. Zou, H., Hastie, T., & Tibshirani, R. (2004). Sparse principal component analysis. Journal of Computational and Graphical Statistics.Google ScholarGoogle Scholar
  1. Full regularization path for sparse principal component analysis

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Other conferences
      ICML '07: Proceedings of the 24th international conference on Machine learning
      June 2007
      1233 pages
      ISBN:9781595937933
      DOI:10.1145/1273496

      Copyright © 2007 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 20 June 2007

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate140of548submissions,26%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader