Published in: International Journal of Data Science and Analytics 1/2018

19-01-2018 | Regular Paper

Fast causal inference with non-random missingness by test-wise deletion

Authors: Eric V. Strobl, Shyam Visweswaran, Peter L. Spirtes

Abstract

Many real datasets contain values missing not at random (MNAR). In this scenario, investigators often perform list-wise deletion, or delete samples with any missing values, before applying causal discovery algorithms. List-wise deletion is a sound and general strategy when paired with algorithms such as FCI and RFCI, but the deletion procedure also eliminates otherwise good samples that contain only a few missing values. In this report, we show that we can more efficiently utilize the observed values with test-wise deletion while still maintaining algorithmic soundness. Here, test-wise deletion refers to the process of list-wise deleting samples only among the variables required for each conditional independence (CI) test used in constraint-based searches. Test-wise deletion therefore often saves more samples than list-wise deletion for each CI test, especially when we have a sparse underlying graph. Our theoretical results show that test-wise deletion is sound under the justifiable assumption that none of the missingness mechanisms causally affect each other in the underlying causal graph. We also find that FCI and RFCI with test-wise deletion outperform their list-wise deletion and imputation counterparts on average when MNAR holds in both synthetic and real data.


Footnotes
1. We will only consider distributions which admit densities in this report.
2. Recall that justifying MAR or MCAR in real datasets also requires inductive arguments.
3. The CPMAG is also known as a partial ancestral graph (PAG). However, we will use the term CPMAG in order to mimic the use of the term CPDAG.
4. FCI has 10 orientation rules in total, two of which are R1 and R4.
5. We also set \(\mathbb{E}(N)=3\) and obtained identical results as reported in Appendix 11.4.
6. This MCAR interpretation implies that \(\cup_{i=1}^q M_i \perp\!\!\!\perp_d \boldsymbol{O}\), so the interpretation is similar to the MCAR interpretation introduced in [10], where we have \(\cup_{i=1}^q M_i \perp\!\!\!\perp_d (\{\boldsymbol{O} \cup \boldsymbol{L} \cup \boldsymbol{S}\} \setminus \cup_{i=1}^q M_i)\).
Literature
4. Daniel, R.M., Kenward, M.G., Cousens, S.N., De Stavola, B.L.: Using causal diagrams to guide analysis in missing data problems. Stoch. Models 21(3), 243–256 (2012)
5. Doove, L., Van Buuren, S., Dusseldorp, E.: Recursive partitioning for missing data imputation in the presence of interaction effects. Comput. Stat. Data Anal. 72(C), 92–104 (2014)
8. Little, R.J.A.: Missing data adjustments in large surveys. J. Bus. Econ. Stat. 6, 287–296 (1988)
10. Mohan, K., Pearl, J., Tian, J.: Graphical models for inference with missing data. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 26, pp. 1277–1285. Curran Associates, Inc., New York (2013)
11.
12.
14. Shpitser, I., Mohan, K., Pearl, J.: Missing data as a causal and probabilistic problem. In: Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, UAI 2015, 12–16 July 2015, Amsterdam, The Netherlands, pp. 802–811 (2015)
15. Sokolova, E., Groot, P., Claassen, T., von Rhein, D., Buitelaar, J., Heskes, T.: Causal discovery from medical data: dealing with missing values and a mixture of discrete and continuous data. In: Artificial Intelligence in Medicine: Proceedings of 15th Conference on Artificial Intelligence in Medicine, AIME 2015, Pavia, Italy, 17–20 June 2015, pp. 177–181 (2015). https://doi.org/10.1007/978-3-319-19551-3_23
17. Spirtes, P.: An anytime algorithm for causal inference. In: In the Presence of Latent Variables and Selection Bias in Computation, Causation and Discovery, pp. 121–128. MIT Press, Cambridge (2001)
18. Spirtes, P., Glymour, C., Scheines, R.: Causation, Prediction, and Search, 2nd edn. MIT Press, Cambridge (2000)
19. Spirtes, P., Meek, C., Richardson, T.: An algorithm for causal inference in the presence of latent variables and selection bias. Computation, Causation, and Discovery, pp. 211–252. AAAI Press, Menlo Park, CA (1999)
20. Spirtes, P., Richardson, T.: A polynomial time algorithm for determining DAG equivalence in the presence of latent variables and selection bias. In: Proceedings of the 6th International Workshop on Artificial Intelligence and Statistics, Fort Lauderdale, pp. 489–500 (1996)
22. Tillman, R.E., Danks, D., Glymour, C.: Integrating locally learned causal structures with overlapping variables. In: Advances in Neural Information Processing Systems 21, Proceedings of the Twenty-Second Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, 8–11 Dec 2008, pp. 1665–1672 (2008)
23. Tillman, R.E., Eberhardt, F.: Learning causal structure from multiple datasets with similar variable sets. Behaviormetrika 41(1), 41–64 (2014)
26. van Buuren, S.: Flexible Imputation of Missing Data (Chapman and Hall, CRC Interdisciplinary Statistics), 1st edn. Chapman and Hall, London (2012)
27. van Buuren, S., Brand, J.P.L., Groothuis-Oudshoorn, K.C., Rubin, D.B.: Fully conditional specification in multivariate imputation. J. Stat. Comput. Simul. (in press) (2005)
Metadata
Title: Fast causal inference with non-random missingness by test-wise deletion
Authors: Eric V. Strobl, Shyam Visweswaran, Peter L. Spirtes
Publication date: 19-01-2018
Publisher: Springer International Publishing
Published in: International Journal of Data Science and Analytics, Issue 1/2018
Print ISSN: 2364-415X
Electronic ISSN: 2364-4168
DOI: https://doi.org/10.1007/s41060-017-0094-6
