
2015 | OriginalPaper | Book chapter

Robust Principal Component Analysis of Data with Missing Values

Authors: Tommi Kärkkäinen, Mirka Saarela

Published in: Machine Learning and Data Mining in Pattern Recognition

Publisher: Springer International Publishing


Abstract

Principal component analysis (PCA) is one of the most popular machine learning and data mining techniques. Originating in statistics, PCA is used in numerous applications. However, there appears to be little systematic testing and assessment of PCA for erroneous and incomplete data. The purpose of this article is to propose several robust approaches for carrying out PCA and, in particular, for estimating the relative importance of the principal components in explaining the data variability. The computational experiments first focus on carefully designed simulated tests in which the ground truth is known and can be used to assess the accuracy of the different methods. In addition, the methods are applied to and evaluated on an educational data set.
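The abstract does not spell out the estimators, so the following is only an illustrative sketch of what a robust, missing-value-aware PCA with a ground-truth check can look like; it is not the authors' method. It assumes NumPy, NaN-coded missing values, and a few hypothetical choices: a coordinate-wise median as the robust center, spatial signs combined with an available-data covariance for the principal directions, and a MAD-based variance of the scores as each component's "relative importance". The function names `robust_pca_missing` and `classical_pca_complete_case`, the median imputation used when projecting, and the simulated data are all made up for this example.

```python
import numpy as np


def robust_pca_missing(X):
    """Illustrative robust PCA for data with NaN-coded missing values.

    Hypothetical sketch: coordinate-wise median + spatial signs +
    available-data covariance; not the chapter's own estimator.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)

    # Robust location from the observed entries of each variable.
    center = np.nanmedian(X, axis=0)
    # Center; missing entries are set to 0, i.e. imputed by the median.
    Z = np.where(observed, X - center, 0.0)

    # Spatial signs: each row scaled to unit length (over its observed entries),
    # which bounds the influence of any single outlying row.
    norms = np.linalg.norm(Z, axis=1)
    keep = norms > 0
    S = Z[keep] / norms[keep, None]
    M = observed[keep].astype(float)

    # Available-data covariance: average s_j * s_k over rows where both are observed.
    pair_counts = M.T @ M
    C = (S.T @ S) / np.maximum(pair_counts, 1.0)

    # Principal directions: eigenvectors sorted by decreasing eigenvalue.
    evals, evecs = np.linalg.eigh(C)
    evecs = evecs[:, np.argsort(evals)[::-1]]

    # Spatial-sign eigenvalues are shrunk toward each other, so estimate each
    # component's variance robustly as (1.4826 * MAD)^2 of its scores instead.
    scores = Z @ evecs
    mad = np.median(np.abs(scores - np.median(scores, axis=0)), axis=0)
    robust_var = (1.4826 * mad) ** 2
    return center, evecs, robust_var / robust_var.sum()


def classical_pca_complete_case(X):
    """Baseline: ordinary PCA on the rows that have no missing values."""
    X = np.asarray(X, dtype=float)
    complete = X[~np.isnan(X).any(axis=1)]
    evals, evecs = np.linalg.eigh(np.cov(complete, rowvar=False))
    order = np.argsort(evals)[::-1]
    return evecs[:, order], evals[order] / evals.sum()


# Simulated data with a known ground truth: independent axes with standard
# deviations 5, 2 and 0.5, i.e. true variance shares of about 0.85, 0.14, 0.01.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3)) * np.array([5.0, 2.0, 0.5])
X[:100] = [0.0, 0.0, 100.0]             # 5% gross outliers along the weakest axis
X[rng.random(X.shape) < 0.10] = np.nan  # about 10% of entries missing at random

center, directions, robust_share = robust_pca_missing(X)
classical_dirs, classical_share = classical_pca_complete_case(X)
print("robust variance shares:   ", np.round(robust_share, 3))
print("classical variance shares:", np.round(classical_share, 3))
```

Under these assumptions, the robust shares should stay close to the simulated ground truth, while the complete-case baseline is pulled toward the axis carrying the outliers. The median imputation used for the scores is a deliberate simplification and tends to understate component variances when many entries are missing.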


Metadata
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-21024-7_10