2021 | OriginalPaper | Book chapter

Single Imputation Via Chunk-Wise PCA

Authors: Alfonso Iodice D’Enza, Francesco Palumbo, Angelos Markos

Published in: Data Analysis and Rationality in a Complex World

Publisher: Springer International Publishing

Abstract

Principal Component Analysis (PCA) cannot be applied directly to incomplete data sets, and practitioners often remove or ignore observations that contain at least one missing value. Three main strategies can be distinguished for applying PCA to a data set with missing entries: (i) impute the missing values before applying PCA; (ii) obtain the PCA solution while ignoring the missing values; and (iii) obtain the PCA solution while dealing explicitly with the missing values. Methods implementing the third strategy have been reviewed and, among them, the iterative PCA (iPCA) approach has been shown to be preferable. This paper proposes a chunk-wise implementation of iPCA suitable for tall data sets, that is, data sets with many observations. In the proposed approach, each data chunk is imputed according to the data analyzed so far. The proposed procedure is compared with batch iPCA and with a naive implementation that imputes each data chunk independently. The comparison is carried out in a series of experiments covering different data sets and missing data mechanisms.
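The two baselines named in the abstract, batch iPCA and the naive chunk-wise variant, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `ipca_impute` and `naive_chunkwise_impute`, the mean initialization, the unregularized rank-k reconstruction, and the stopping rule are all assumptions chosen for illustration. The proposed method, which imputes each chunk using the data analyzed so far, is not reproduced because the abstract does not specify its update step. The sketch also assumes that every column has at least one observed value in each block it is applied to.

```python
import numpy as np


def ipca_impute(X, n_components=2, max_iter=200, tol=1e-6):
    """Iterative PCA single imputation (illustrative); NaN marks a missing cell."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Assumed initialization: column means of the observed entries.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(max_iter):
        # Fit a rank-k PCA to the currently completed matrix.
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        k = min(n_components, len(s))
        fitted = (U[:, :k] * s[:k]) @ Vt[:k] + mean
        # Overwrite only the missing cells; observed cells are never changed.
        updated = np.where(missing, fitted, X)
        change = np.sum((updated - filled) ** 2)
        filled = updated
        if change < tol:  # assumed stopping rule
            break
    return filled


def naive_chunkwise_impute(X, chunk_size, n_components=2):
    """Naive baseline: impute each block of rows independently of the others."""
    X = np.asarray(X, dtype=float)
    chunks = [
        ipca_impute(X[start:start + chunk_size], n_components)
        for start in range(0, X.shape[0], chunk_size)
    ]
    return np.vstack(chunks)


# Usage on synthetic low-rank data with entries removed completely at random.
rng = np.random.default_rng(0)
complete = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 6))
X = complete.copy()
X[rng.random(X.shape) < 0.1] = np.nan
batch = ipca_impute(X, n_components=2)
naive = naive_chunkwise_impute(X, chunk_size=50, n_components=2)
```

The naive variant never shares information across chunks, so each chunk's principal subspace is estimated from that chunk alone. That is the limitation the chunk-wise proposal is meant to address.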

Metadata
Title
Single Imputation Via Chunk-Wise PCA
Authors
Alfonso Iodice D’Enza
Francesco Palumbo
Angelos Markos
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-60104-1_9