2021 | OriginalPaper | Book chapter

A Simulation Study for the Identification of Missing Data Mechanisms Using Visualisation

Written by: Johané Nienkemper-Swanepoel, Niël Le Roux, Sugnet Gardner-Lubbe

Published in: Data Analysis and Rationality in a Complex World

Publisher: Springer International Publishing


Abstract

Understanding the cause of missingness in data is a science of its own and is of great importance for applying valid and unbiased analysis techniques to missing data. The distribution of missingness is defined by dependencies on either observed or missing values in a data set, and therefore requires a multivariate visualisation in any attempt to identify the missing data mechanism (MDM). Multivariate categorical data sets containing missing entries can be separated into observed and unobserved (or missing) subsets by creating, in the indicator matrix, an additional category level (CL) for each variable with missing responses. Subset multiple correspondence analysis (sMCA) can then be applied to the recoded indicator matrix to obtain separate biplots for the observed and missing subsets. The sMCA biplot of the missing categories enables exploration of the missing values, which could expose non-response patterns. Partitioning around medoids (pam) clustering is used to determine whether sufficient clustering structure can be identified in the sMCA biplot of missing responses. In the simulation study, data sets with different sample sizes are generated from three distributions. Artificial missingness is created by deleting values according to the missing at random (MAR) and missing completely at random (MCAR) MDMs, with different percentages of missing values. The influence of the underlying distribution on the outcome of the clustering techniques is presented. The insight obtained from the simulation results provides guidelines for identifying the MDM in real data applications.
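
The recoding and deletion steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' simulation design: the toy variables, the deletion rates, the uniform sampling, the "(missing)" label and all function names are assumptions made for illustration, and the sMCA and pam steps themselves are not implemented here. The sketch deletes values under MCAR and MAR, builds an indicator matrix with an extra category level per variable that has missing responses, and splits the columns into observed and missing subsets.

```python
import random

MISSING = None                 # marker for a deleted response (assumption)
MISSING_LEVEL = "(missing)"    # label of the extra category level (assumption)


def delete_mcar(rows, var, rate, rng):
    """MCAR: every response on `var` is deleted with the same probability."""
    out = [list(r) for r in rows]
    for r in out:
        if rng.random() < rate:
            r[var] = MISSING
    return out


def delete_mar(rows, var, driver, rates, rng):
    """MAR: the deletion probability on `var` depends only on the observed
    category of another variable `driver` (`rates`: category -> probability).
    `driver` is assumed to be fully observed."""
    out = [list(r) for r in rows]
    for r in out:
        if rng.random() < rates[r[driver]]:
            r[var] = MISSING
    return out


def indicator_matrix(rows, levels):
    """One 0/1 column per category level, plus an extra missing category
    level for each variable with at least one missing response."""
    columns = []
    for j, cats in enumerate(levels):
        columns += [(j, c) for c in cats]
        if any(r[j] is MISSING for r in rows):
            columns.append((j, MISSING_LEVEL))
    matrix = []
    for r in rows:
        coded = [MISSING_LEVEL if v is MISSING else v for v in r]
        matrix.append([1 if coded[j] == c else 0 for j, c in columns])
    return columns, matrix


def split_subsets(columns, matrix):
    """Split the indicator columns into the observed and missing subsets."""
    miss = [k for k, (_, c) in enumerate(columns) if c == MISSING_LEVEL]
    obs = [k for k, (_, c) in enumerate(columns) if c != MISSING_LEVEL]
    observed = [[row[k] for k in obs] for row in matrix]
    missing = [[row[k] for k in miss] for row in matrix]
    return observed, missing


# Toy categorical data: three variables, sampled uniformly (assumption).
rng = random.Random(1)
levels = [["a", "b", "c"], ["low", "high"], ["yes", "no"]]
rows = [[rng.choice(cats) for cats in levels] for _ in range(200)]

# MCAR on variable 0; MAR on variable 0 driven by variable 1.
mcar_rows = delete_mcar(rows, var=0, rate=0.2, rng=rng)
mar_rows = delete_mar(rows, var=0, driver=1,
                      rates={"low": 0.05, "high": 0.40}, rng=rng)

columns, Z = indicator_matrix(mar_rows, levels)
observed, missing = split_subsets(columns, Z)
print(len(columns), "indicator columns,", len(missing[0]), "missing-level columns")
```

In a full analysis, the missing subset of the recoded indicator matrix would be passed to an sMCA routine and the resulting biplot coordinates to a pam implementation. Both steps are left out here because the abstract does not specify the software or settings used.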

Metadata
Title
A Simulation Study for the Identification of Missing Data Mechanisms Using Visualisation
Written by
Johané Nienkemper-Swanepoel
Niël Le Roux
Sugnet Gardner-Lubbe
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-60104-1_23