Principal component analysis with interval imputed missing values

  • Original Paper
  • AStA Advances in Statistical Analysis

Abstract

This paper derives some statistical properties of Interval Imputation in the context of Principal Component Analysis. Interval Imputation is a recent proposal for the treatment of missing values: blanks are replaced with intervals, and the resulting data matrix is then analyzed using Symbolic Data Analysis techniques. The main virtue of this method is that it does not require a single-valued imputation, so it takes into account the uncertainty that affects incomplete observations. Illustrative examples and simulation studies show how the technique works.
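
The idea of replacing blanks with intervals can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure derived in the paper: the choice of the observed column range as the imputed interval, the function names `interval_impute` and `centers_pca`, and the use of a simple midpoint-based ("centers"-style) PCA with interval projection are all assumptions made here for the example.

```python
import numpy as np

def interval_impute(X):
    """Turn a matrix with NaNs into lower/upper bound matrices.

    Assumption for this sketch: a missing entry is replaced by the
    [min, max] range of the observed values in its column. Observed
    entries become degenerate intervals [x, x].
    """
    X = np.asarray(X, dtype=float)
    col_min = np.nanmin(X, axis=0)   # per-column minimum, ignoring NaNs
    col_max = np.nanmax(X, axis=0)   # per-column maximum, ignoring NaNs
    missing = np.isnan(X)
    lower = np.where(missing, col_min, X)
    upper = np.where(missing, col_max, X)
    return lower, upper

def centers_pca(lower, upper, n_components=2):
    """PCA on interval midpoints, then projection of each interval.

    Assumption for this sketch: principal axes are computed from the
    midpoints only; each observation's hyperrectangle is then projected
    onto those axes, giving an interval score per component.
    """
    centers = (lower + upper) / 2.0
    mean = centers.mean(axis=0)
    # Rows of vt are the principal axes of the centered midpoints.
    _, _, vt = np.linalg.svd(centers - mean, full_matrices=False)
    axes = vt[:n_components].T                  # shape (p, k)
    pos = np.clip(axes, 0.0, None)              # positive loadings only
    neg = np.clip(axes, None, 0.0)              # negative loadings only
    # Extremes of a linear function over a box are reached at its corners:
    # pair lower bounds with positive loadings for the minimum, and so on.
    score_low = (lower - mean) @ pos + (upper - mean) @ neg
    score_high = (upper - mean) @ pos + (lower - mean) @ neg
    return score_low, score_high, axes

if __name__ == "__main__":
    # Hypothetical toy data: two blanks in a 4 x 3 matrix.
    X = [[1.0, 2.0, 3.0],
         [2.0, np.nan, 1.0],
         [3.0, 6.0, np.nan],
         [4.0, 8.0, 5.0]]
    lower, upper = interval_impute(X)
    low, high, _ = centers_pca(lower, upper)
    for i, (a, b) in enumerate(zip(low, high)):
        print(f"row {i}: score intervals {np.round(a, 3)} .. {np.round(b, 3)}")
```

In this sketch a complete observation projects to a single point (its lower and upper scores coincide), whereas an incomplete observation projects to an interval on each component, which is how the uncertainty of the missing entry stays visible in the factorial representation.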


Author information

Corresponding author

Correspondence to Paola Zuccolotto.

About this article

Cite this article

Zuccolotto, P. Principal component analysis with interval imputed missing values. AStA Adv Stat Anal 96, 1–23 (2012). https://doi.org/10.1007/s10182-011-0164-3

  • DOI: https://doi.org/10.1007/s10182-011-0164-3
