
2018 | Original Paper | Book Chapter

Principal Component Analysis for Exponential Family Data

Authors: Meng Lu, Kai He, Jianhua Z. Huang, Xiaoning Qian

Published in: Advances in Principal Component Analysis

Publisher: Springer Singapore


Abstract

This chapter reviews exponential family principal component analysis (ePCA), a family of statistical methods for dimension reduction of large-scale data that are not real-valued, such as user ratings for items in e-commerce, categorical or count genetic data in bioinformatics, and digital images in computer vision. The ePCA framework extends traditional PCA to modern data sets that contain a variety of data types. A sparse version of ePCA further helps overcome model inconsistency and improves interpretability when applied to high-dimensional data. Model formulations and solution strategies for ePCA and sparse ePCA are discussed together with real-world applications.
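To make the idea concrete, the following is a minimal sketch of one member of the ePCA family, the Bernoulli (logistic) case for binary data. It is an illustration under stated assumptions and not the chapter's own algorithm: the function name `logistic_pca`, the plain gradient-descent solver, the step size, and the synthetic data are all hypothetical choices made here. The sketch assumes the natural-parameter matrix is modeled as a low-rank product `A @ V.T` and fitted by minimizing the Bernoulli negative log-likelihood.

```python
import numpy as np


def logistic_pca(X, rank=2, steps=300, lr=0.01, seed=0):
    """Fit Theta = A @ V.T to a binary matrix X (hypothetical helper).

    Minimizes the Bernoulli negative log-likelihood
        sum_ij [ log(1 + exp(Theta_ij)) - X_ij * Theta_ij ]
    by plain gradient descent. Returns scores A, loadings V, and the
    loss recorded before each update.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    A = 0.01 * rng.standard_normal((n, rank))  # low-dimensional scores
    V = 0.01 * rng.standard_normal((d, rank))  # loadings
    losses = []
    for _ in range(steps):
        Theta = A @ V.T                        # natural parameters (logits)
        losses.append(float(np.sum(np.logaddexp(0.0, Theta) - X * Theta)))
        P = 0.5 * (1.0 + np.tanh(0.5 * Theta))  # numerically stable sigmoid
        G = P - X                              # gradient of the loss w.r.t. Theta
        # Simultaneous update: both right-hand sides use the old A and V.
        A, V = A - lr * (G @ V), V - lr * (G.T @ A)
    return A, V, losses


# Synthetic binary data with a rank-2 logit structure (assumed for illustration).
rng = np.random.default_rng(42)
A_true = rng.standard_normal((100, 2))
V_true = rng.standard_normal((20, 2))
probs = 0.5 * (1.0 + np.tanh(0.5 * (A_true @ V_true.T)))
X = (rng.random(probs.shape) < probs).astype(float)

A_hat, V_hat, losses = logistic_pca(X, rank=2)
print("initial loss:", losses[0], "final loss:", losses[-1])
```

Replacing the Bernoulli log-partition term `log(1 + exp(theta))` with that of another exponential family member (for example `exp(theta)` for Poisson counts) would give the corresponding variant; a sparse version would additionally penalize the entries of `V`, which this sketch does not attempt.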

Metadata
Title: Principal Component Analysis for Exponential Family Data
Authors: Meng Lu, Kai He, Jianhua Z. Huang, Xiaoning Qian
Copyright year: 2018
Publisher: Springer Singapore
DOI: https://doi.org/10.1007/978-981-10-6704-4_8