2019 | Original Paper | Book Chapter

Determining Principal Component Cardinality Through the Principle of Minimum Description Length

Abstract

PCA (Principal Component Analysis) and its variants are ubiquitous techniques for matrix dimension reduction and reduced-dimension latent-factor extraction. One significant challenge in using PCA is the choice of the number of principal components. The information-theoretic MDL (Minimum Description Length) principle gives objective, compression-based criteria for model selection, but its modern formulation, NML (Normalized Maximum Likelihood), is difficult to apply analytically to PCA. This work shows a general reduction of NML problems to lower-dimensional problems. Applying this reduction, it bounds the NML of PCA in terms of the NML of linear regression, which is known.
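
As a rough illustration only (the paper's actual contribution is an analytic bound on the NML of PCA, which the abstract does not spell out), the sketch below shows what description-length-based selection of the component count can look like in practice: score every candidate k by a simple two-part code length, a residual coding cost plus a BIC-like parameter penalty, and pick the minimizer. The function name mdl_num_components and the specific penalty term are assumptions made for this example, not definitions from the paper.

    # A minimal, illustrative sketch (not the paper's NML bound): choose the
    # number of principal components k by minimizing a two-part-code MDL
    # score: residual coding cost plus a BIC-like parameter cost.
    import numpy as np
    from sklearn.decomposition import PCA

    def mdl_num_components(X):
        """Return the k in 1..d minimizing a crude description-length score."""
        n, d = X.shape
        Xc = X - X.mean(axis=0)              # center the data, as PCA assumes
        ev = PCA(n_components=d).fit(Xc).explained_variance_
        scores = []
        for k in range(1, d + 1):
            # Cost of coding the residual in the (d - k) discarded directions,
            # modeled as isotropic Gaussian noise.
            resid_var = max(ev[k:].mean() if k < d else 0.0, 1e-12)
            data_cost = 0.5 * n * (d - k) * np.log(resid_var)
            # Crude parameter cost: roughly k*d loadings at log(n)/2 nats each.
            param_cost = 0.5 * k * d * np.log(n)
            scores.append(data_cost + param_cost)
        return int(np.argmin(scores)) + 1

    # Example: a rank-3 signal plus noise in 20 dimensions.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 20))
    X += 0.1 * rng.normal(size=(500, 20))
    print(mdl_num_components(X))             # typically selects k = 3

In NML, the crude parameter penalty above is replaced by the normalizing term of the maximized likelihood, which is generally hard to compute; the reduction described in the abstract bounds that term for PCA using the known NML of linear regression.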

Metadata
Title
Determining Principal Component Cardinality Through the Principle of Minimum Description Length
Author
Ami Tavory
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-37599-7_54