Abstract
This paper derives some statistical properties of Interval Imputation in the context of Principal Component Analysis. Interval Imputation is a recent proposal for the treatment of missing values: blanks are replaced with intervals, and the resulting data matrix is then analyzed using Symbolic Data Analysis techniques. The main advantage of the method is that it does not require a single-valued imputation, so the analysis can take into account the uncertainty that affects incomplete observations. Illustrative examples and simulation studies show how the technique works.
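As a rough illustration of the idea, and not the procedure derived in the paper, the sketch below replaces each missing entry with an interval and projects the resulting interval-valued matrix onto principal components. Two choices are assumptions made here for illustration only: the interval bounds are taken as the observed minimum and maximum of each variable, and the components are computed from the interval midpoints (a centers-type PCA) with score bounds obtained by interval arithmetic. The function names `interval_impute` and `centers_pca` are hypothetical.

```python
import numpy as np


def interval_impute(X):
    """Replace each NaN with [observed column min, observed column max].

    Observed entries become degenerate intervals [x, x].
    The choice of bounds is an assumption made for this sketch.
    """
    X = np.asarray(X, dtype=float)
    col_min = np.nanmin(X, axis=0)
    col_max = np.nanmax(X, axis=0)
    missing = np.isnan(X)
    lower = np.where(missing, col_min, X)
    upper = np.where(missing, col_max, X)
    return lower, upper


def centers_pca(lower, upper, n_components=2):
    """PCA on interval midpoints, with interval-valued scores.

    Returns lower and upper score bounds and the retained eigenvalues.
    """
    centers = (lower + upper) / 2.0
    mean = centers.mean(axis=0)
    std = centers.std(axis=0, ddof=1)

    # Correlation matrix of the standardized midpoints.
    Z = (centers - mean) / std
    corr = np.cov(Z, rowvar=False)

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_components]
    V = eigvecs[:, order]

    # Standardize the interval bounds with the midpoint statistics.
    Zl = (lower - mean) / std
    Zu = (upper - mean) / std

    # Interval arithmetic: a positive loading takes the lower bound for the
    # minimum score, a negative loading takes the upper bound.
    V_pos = np.clip(V, 0.0, None)
    V_neg = np.clip(V, None, 0.0)
    score_lo = Zl @ V_pos + Zu @ V_neg
    score_hi = Zu @ V_pos + Zl @ V_neg
    return score_lo, score_hi, eigvals[order]


# Toy data matrix with two missing entries (values are made up).
X = np.array([
    [1.0, 2.0, 3.0],
    [2.0, np.nan, 1.0],
    [3.0, 4.0, np.nan],
    [4.0, 5.0, 6.0],
])
lower, upper = interval_impute(X)
score_lo, score_hi, eigvals = centers_pca(lower, upper, n_components=2)
```

In this sketch a complete observation is projected onto a single point (its lower and upper scores coincide), while an incomplete observation is projected onto an interval whose width reflects the uncertainty introduced by the missing entries.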
Cite this article
Zuccolotto, P. Principal component analysis with interval imputed missing values. AStA Adv Stat Anal 96, 1–23 (2012). https://doi.org/10.1007/s10182-011-0164-3