Published in: International Journal of Data Science and Analytics 1/2018

19.01.2018 | Regular Paper

Fast causal inference with non-random missingness by test-wise deletion

Authors: Eric V. Strobl, Shyam Visweswaran, Peter L. Spirtes

Abstract

Many real datasets contain values missing not at random (MNAR). In this scenario, investigators often perform list-wise deletion, or delete samples with any missing values, before applying causal discovery algorithms. List-wise deletion is a sound and general strategy when paired with algorithms such as FCI and RFCI, but the deletion procedure also eliminates otherwise good samples that contain only a few missing values. In this report, we show that we can more efficiently utilize the observed values with test-wise deletion while still maintaining algorithmic soundness. Here, test-wise deletion refers to the process of list-wise deleting samples only among the variables required for each conditional independence (CI) test used in constraint-based searches. Test-wise deletion therefore often saves more samples than list-wise deletion for each CI test, especially when we have a sparse underlying graph. Our theoretical results show that test-wise deletion is sound under the justifiable assumption that none of the missingness mechanisms causally affect each other in the underlying causal graph. We also find that FCI and RFCI with test-wise deletion outperform their list-wise deletion and imputation counterparts on average when MNAR holds in both synthetic and real data.

Footnotes
1. We will only consider distributions which admit densities in this report.
2. Recall that justifying MAR or MCAR in real datasets also requires inductive arguments.
3. The CPMAG is also known as a partial ancestral graph (PAG). However, we will use the term CPMAG in order to mimic the use of the term CPDAG.
4. FCI has 10 orientation rules in total, two of which are R1 and R4.
5. We also set \(\mathbb{E}(N)=3\) and obtained identical results as reported in Appendix 11.4.
6. This MCAR interpretation implies that \(\cup_{i=1}^q M_i \perp\!\!\!\perp_d \varvec{O}\), so the interpretation is similar to the MCAR interpretation introduced in [10], where we have \(\cup_{i=1}^q M_i \perp\!\!\!\perp_d (\{\varvec{O} \cup \varvec{L} \cup \varvec{S}\} \setminus \cup_{i=1}^q M_i)\).
References
4. Daniel, R.M., Kenward, M.G., Cousens, S.N., De Stavola, B.L.: Using causal diagrams to guide analysis in missing data problems. Stoch. Models 21(3), 243–256 (2012)
5. Doove, L., Van Buuren, S., Dusseldorp, E.: Recursive partitioning for missing data imputation in the presence of interaction effects. Comput. Stat. Data Anal. 72(C), 92–104 (2014)
8. Little, R.J.A.: Missing data adjustments in large surveys. J. Bus. Econ. Stat. 6, 287–296 (1988)
10. Mohan, K., Pearl, J., Tian, J.: Graphical models for inference with missing data. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 26, pp. 1277–1285. Curran Associates, Inc., New York (2013)
12. Schafer, J.: Analysis of Incomplete Multivariate Data. Chapman and Hall, London (1997)
14. Shpitser, I., Mohan, K., Pearl, J.: Missing data as a causal and probabilistic problem. In: Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, UAI 2015, 12–16 July 2015, Amsterdam, The Netherlands, pp. 802–811 (2015)
15. Sokolova, E., Groot, P., Claassen, T., von Rhein, D., Buitelaar, J., Heskes, T.: Causal discovery from medical data: dealing with missing values and a mixture of discrete and continuous data. In: Artificial Intelligence in Medicine: Proceedings of 15th Conference on Artificial Intelligence in Medicine, AIME 2015, Pavia, Italy, 17–20 June 2015, pp. 177–181 (2015). https://doi.org/10.1007/978-3-319-19551-3_23
17. Spirtes, P.: An anytime algorithm for causal inference. In: In the Presence of Latent Variables and Selection Bias in Computation, Causation and Discovery, pp. 121–128. MIT Press, Cambridge (2001)
18. Spirtes, P., Glymour, C., Scheines, R.: Causation, Prediction, and Search, 2nd edn. MIT Press, Cambridge (2000)
19. Spirtes, P., Meek, C., Richardson, T.: An algorithm for causal inference in the presence of latent variables and selection bias. Computation, Causation, and Discovery, pp. 211–252. AAAI Press, Menlo Park, CA (1999)
20. Spirtes, P., Richardson, T.: A polynomial time algorithm for determining DAG equivalence in the presence of latent variables and selection bias. In: Proceedings of the 6th International Workshop on Artificial Intelligence and Statistics, Fort Lauderdale, pp. 489–500 (1996)
22. Tillman, R.E., Danks, D., Glymour, C.: Integrating locally learned causal structures with overlapping variables. In: Advances in Neural Information Processing Systems 21, Proceedings of the Twenty-Second Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, 8–11 Dec 2008, pp. 1665–1672 (2008)
23. Tillman, R.E., Eberhardt, F.: Learning causal structure from multiple datasets with similar variable sets. Behaviormetrika 41(1), 41–64 (2014)
26. van Buuren, S.: Flexible Imputation of Missing Data (Chapman and Hall, CRC Interdisciplinary Statistics), 1st edn. Chapman and Hall, London (2012)
27. van Buuren, S., Brand, J.P.L., Groothuis-Oudshoorn, K.C., Rubin, D.B.: Fully conditional specification in multivariate imputation. J. Stat. Comput. Simul. (in press) (2005)
Metadata
Title: Fast causal inference with non-random missingness by test-wise deletion
Authors: Eric V. Strobl, Shyam Visweswaran, Peter L. Spirtes
Publication date: 19.01.2018
Publisher: Springer International Publishing
Published in: International Journal of Data Science and Analytics, Issue 1/2018
Print ISSN: 2364-415X
Electronic ISSN: 2364-4168
DOI: https://doi.org/10.1007/s41060-017-0094-6
