Causality for Machine Learning

Published: 04 March 2022
References

  1. J. Aldrich. 1989. Autonomy. Oxf. Econ. Pap. 41, 15–34.
  2. I. Asimov. 1951. Foundation. Gnome Press, New York.
  3. E. Bareinboim and J. Pearl. 2014. Transportability from multiple environments with limited experiments: Completeness results. In Advances in Neural Information Processing Systems 27, 280–288.
  4. E. Bareinboim, A. Forney, and J. Pearl. 2015. Bandits with unobserved confounders: A causal approach. In Advances in Neural Information Processing Systems 28, 1342–1350.
  5. S. Bauer, B. Schölkopf, and J. Peters. 2016. The arrow of time in multivariate time series. In Proceedings of the 33rd International Conference on Machine Learning, Vol. 48. JMLR Workshop and Conference Proceedings, 2043–2051.
  6. Y. Bengio, A. Courville, and P. Vincent. 2012. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 8, 1798–1828.
  7. E. Bengio, V. Thomas, J. Pineau, D. Precup, and Y. Bengio. 2017. Independently controllable features. arXiv:1703.07718.
  8. Y. Bengio, T. Deleu, N. Rahaman, R. Ke, S. Lachapelle, O. Bilaniuk, A. Goyal, and C. Pal. 2019. A meta-transfer objective for learning to disentangle causal mechanisms. arXiv:1901.10912.
  9. B. Benneke, I. Wong, C. Piaulet, H. A. Knutson, I. J. M. Crossfield, J. Lothringer, C. V. Morley, P. Gao, T. P. Greene, C. Dressing, D. Dragomir, A. W. Howard, P. R. McCullough, E. M.-R. Kempton, J. J. Fortney, and J. Fraine. 2019. Water vapor on the habitable-zone exoplanet K2-18b. arXiv:1909.04642.
  10. M. Besserve, N. Shajarisales, B. Schölkopf, and D. Janzing. 2018a. Group invariance principles for causal generative models. In Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS). 557–565.
  11. M. Besserve, R. Sun, and B. Schölkopf. 2018b. Counterfactuals uncover the modular structure of deep generative models. arXiv:1812.03253.
  12. P. Blöbaum, T. Washio, and S. Shimizu. 2016. Error asymmetry in causal and anticausal regression. Behaviormetrika 2017. arXiv:1610.03263.
  13. A. Blum and T. Mitchell. 1998. Combining labeled and unlabeled data with co-training. In Proceedings of the 11th Annual Conference on Computational Learning Theory. ACM, New York, 92–100.
  14. D. Bohm. 1957. Causality and Chance in Modern Physics. Routledge & Kegan Paul, London.
  15. B. Bonet and H. Geffner. 2019. Learning first-order symbolic representations for planning from the structure of the state space. arXiv:1909.05546.
  16. L. Bottou, J. Peters, J. Quiñonero-Candela, D. X. Charles, D. M. Chickering, E. Portugaly, D. Ray, P. Simard, and E. Snelson. 2013. Counterfactual reasoning and learning systems: The example of computational advertising. J. Mach. Learn. Res. 14, 3207–3260.
  17. E. Brynjolfsson, A. Collis, W. E. Diewert, F. Eggers, and K. J. Fox. 2019. GDP-B: Accounting for the value of new and free goods in the digital economy. Working Paper 25695, National Bureau of Economic Research.
  18. K. Budhathoki and J. Vreeken. 2016. Causal inference by compression. In IEEE 16th International Conference on Data Mining.
  19. L. Buesing, T. Weber, Y. Zwols, S. Racaniere, A. Guez, J.-B. Lespiau, and N. Heess. 2018. Woulda, coulda, shoulda: Counterfactually-guided policy search. arXiv:1811.06272.
  20. K. Chalupka, P. Perona, and F. Eberhardt. 2015. Multi-level cause–effect systems. arXiv:1512.07942.
  21. K. Chalupka, P. Perona, and F. Eberhardt. 2018. Fast conditional independence test for vector variables with large sample sizes. arXiv:1804.02747.
  22. O. Chapelle and B. Schölkopf. 2002. Incorporating invariances in nonlinear SVMs. In T. G. Dietterich, S. Becker, and Z. Ghahramani (Eds.), Advances in Neural Information Processing Systems 14. MIT Press, Cambridge, MA, 609–616.
  23. O. Chapelle, B. Schölkopf, and A. Zien (Eds.). 2006. Semi-Supervised Learning. MIT Press, Cambridge, MA. http://www.kyb.tuebingen.mpg.de/ssl-book/.
  24. Y. Chen and A. Cheung. 2018. The transparent self under big data profiling: Privacy and Chinese legislation on the social credit system. J. Comp. Law 12, 2, 356–378.
  25. N. Chentanez, A. G. Barto, and S. P. Singh. 2005. Intrinsically motivated reinforcement learning. In Advances in Neural Information Processing Systems 17. MIT Press, 1281–1288.
  26. X. Dai. 2018. Toward a reputation state: The social credit system project of China. https://ssrn.com/abstract=3193577.
  27. P. Daniušis, D. Janzing, J. M. Mooij, J. Zscheischler, B. Steudel, K. Zhang, and B. Schölkopf. 2010. Inferring deterministic causal relations. In Proceedings of the 26th Annual Conference on Uncertainty in Artificial Intelligence (UAI). 143–150.
  28. I. Dasgupta, J. Wang, S. Chiappa, J. Mitrovic, P. Ortega, D. Raposo, E. Hughes, P. Battaglia, M. Botvinick, and Z. Kurth-Nelson. 2019. Causal reasoning from meta-reinforcement learning. arXiv:1901.08162.
  29. A. P. Dawid. 1979. Conditional independence in statistical theory. J. R. Stat. Soc. B 41, 1, 1–31.
  30. L. Devroye, L. Györfi, and G. Lugosi. 1996. A Probabilistic Theory of Pattern Recognition, Vol. 31: Applications of Mathematics. Springer, New York.
  31. D. Foreman-Mackey, B. T. Montet, D. W. Hogg, T. D. Morton, D. Wang, and B. Schölkopf. 2015. A systematic search for transiting planets in the K2 data. Astrophys. J. 806, 2. http://stacks.iop.org/0004-637X/806/i=2/a=215.
  32. R. Frisch, T. Haavelmo, T. Koopmans, and J. Tinbergen. 1948. Autonomy of Economic Relations. Universitets Socialøkonomiske Institutt, Oslo, Norway.
  33. K. Fukumizu, A. Gretton, X. Sun, and B. Schölkopf. 2008. Kernel measures of conditional dependence. In Advances in Neural Information Processing Systems 20. 489–496.
  34. D. Geiger and J. Pearl. 1990. Logical and algorithmic properties of independence and their application to Bayesian networks. Ann. Math. Artif. Intell. 2, 165–178.
  35. M. Gong, K. Zhang, T. Liu, D. Tao, C. Glymour, and B. Schölkopf. 2016. Domain adaptation with conditional transferable components. In Proceedings of the 33rd International Conference on Machine Learning. 2839–2848.
  36. M. Gong, K. Zhang, B. Schölkopf, C. Glymour, and D. Tao. 2017. Causal discovery from temporally aggregated time series. In Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence (UAI). ID 269.
  37. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems 27. Curran Associates, Inc., 2672–2680.
  38. O. Gottesman, F. Johansson, J. Meier, J. Dent, D. Lee, S. Srinivasan, L. Zhang, Y. Ding, D. Wihl, X. Peng, J. Yao, I. Lage, C. Mosch, L.-W. H. Lehman, M. Komorowski, A. Faisal, L. A. Celi, D. Sontag, and F. Doshi-Velez. 2018. Evaluating reinforcement learning algorithms in observational health settings. arXiv:1805.12298.
  39. O. Goudet, D. Kalainathan, P. Caillou, I. Guyon, D. Lopez-Paz, and M. Sebag. 2017. Causal generative neural networks. arXiv:1711.08936.
  40. A. Goyal, A. Lamb, J. Hoffmann, S. Sodhani, S. Levine, Y. Bengio, and B. Schölkopf. 2019. Recurrent independent mechanisms. arXiv:1909.10893.
  41. A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf. 2005a. Measuring statistical dependence with Hilbert–Schmidt norms. In Algorithmic Learning Theory. Springer-Verlag, 63–78.
  42. A. Gretton, R. Herbrich, A. Smola, O. Bousquet, and B. Schölkopf. 2005b. Kernel methods for measuring independence. J. Mach. Learn. Res. 6, 2075–2129.
  43. R. Guo, L. Cheng, J. Li, P. R. Hahn, and H. Liu. 2018. A survey of learning causality with data: Problems and methods. arXiv:1809.09337.
  44. I. Guyon, C. Aliferis, and A. Elisseeff. 2007. Causal feature selection. In Computational Methods of Feature Selection. Chapman and Hall/CRC, Boca Raton, FL, 75–97.
  45. I. Guyon, D. Janzing, and B. Schölkopf. 2010. Causality: Objectives and assessment. In I. Guyon, D. Janzing, and B. Schölkopf (Eds.), JMLR Workshop and Conference Proceedings. Vol. 6. MIT Press, Cambridge, MA, 1–42.
  46. T. Haavelmo. 1944. The probability approach in econometrics. Econometrica 12, (supplement), S1–S115.
  47. C. Heinze-Deml and N. Meinshausen. 2017. Conditional variance penalties and domain shift robustness. arXiv:1710.11469.
  48. C. Heinze-Deml, J. Peters, and N. Meinshausen. 2017. Invariant causal prediction for nonlinear models. arXiv:1706.08576.
  49. J. Henrich. 2016. The Secret of our Success. Princeton University Press, Princeton, NJ.
  50. K. D. Hoover. 2008. Causality in economics and econometrics. In S. N. Durlauf and L. E. Blume (Eds.), The New Palgrave Dictionary of Economics (2nd. ed.). Palgrave Macmillan, Basingstoke, UK.
  51. P. O. Hoyer, D. Janzing, J. M. Mooij, J. Peters, and B. Schölkopf. 2009. Nonlinear causal discovery with additive noise models. In Advances in Neural Information Processing Systems 21 (NIPS). 689–696.
  52. B. Huang, K. Zhang, J. Zhang, R. Sanchez-Romero, C. Glymour, and B. Schölkopf. 2017. Behind distribution shift: Mining driving forces of changes and causal arrows. In IEEE 17th International Conference on Data Mining (ICDM 2017). 913–918.
  53. D. Janzing. 2019. Causal regularization. In Advances in Neural Information Processing Systems 33.
  54. D. Janzing and B. Schölkopf. 2010. Causal inference using the algorithmic Markov condition. IEEE Trans. Inf. Theory 56, 10, 5168–5194.
  55. D. Janzing and B. Schölkopf. 2015. Semi-supervised interpolation in an anticausal learning scenario. J. Mach. Learn. Res. 16, 1923–1948.
  56. D. Janzing and B. Schölkopf. 2018. Detecting non-causal artifacts in multivariate linear regression models. In Proceedings of the 35th International Conference on Machine Learning (ICML). 2250–2258.
  57. D. Janzing, J. Peters, J. M. Mooij, and B. Schölkopf. 2009. Identifying confounders using additive noise models. In Proceedings of the 25th Annual Conference on Uncertainty in Artificial Intelligence (UAI). 249–257.
  58. D. Janzing, P. Hoyer, and B. Schölkopf. 2010. Telling cause from effect based on high-dimensional observations. In J. Fürnkranz and T. Joachims (Eds.), Proceedings of the 27th International Conference on Machine Learning. 479–486.
  59. D. Janzing, J. M. Mooij, K. Zhang, J. Lemeire, J. Zscheischler, P. Daniušis, B. Steudel, and B. Schölkopf. 2012. Information-geometric approach to inferring causal directions. Artif. Intell. 182–183, 1–31.
  60. D. Janzing, R. Chaves, and B. Schölkopf. 2016. Algorithmic independence of initial condition and dynamical law in thermodynamics and causal inference. New J. Phys. 18, 093052, 1–13.
  61. M. Khajehnejad, B. Tabibian, B. Schölkopf, A. Singla, and M. Gomez-Rodriguez. 2019. Optimal decision making under strategic behavior. arXiv:1905.09239.
  62. N. Kilbertus, M. Rojas Carulla, G. Parascandolo, M. Hardt, D. Janzing, and B. Schölkopf. 2017. Avoiding discrimination through causal reasoning. In Advances in Neural Information Processing Systems 30. 656–666.
  63. N. Kilbertus, G. Parascandolo, and B. Schölkopf. 2018. Generalization in anti-causal learning. arXiv:1812.00524.
  64. D. P. Kingma and M. Welling. 2013. Auto-encoding variational Bayes. arXiv:1312.6114.
  65. F. Klein. 1872. Vergleichende Betrachtungen über neuere geometrische Forschungen. Verlag von Andreas Deichert, Erlangen.
  66. S. Kpotufe, E. Sgouritsa, D. Janzing, and B. Schölkopf. 2014. Consistency of causal inference under the additive noise model. In Proceedings of the 31st International Conference on Machine Learning. 478–486.
  67. M. J. Kusner, J. Loftus, C. Russell, and R. Silva. 2017. Counterfactual fairness. In Advances in Neural Information Processing Systems 30. Curran Associates, Inc., 4066–4076.
  68. S. Lange, T. Gabel, and M. Riedmiller. 2012. Batch reinforcement learning. In M. Wiering and M. van Otterlo (Eds.), Reinforcement Learning: State-of-the-Art. Springer, Berlin, 45–73.
  69. S. L. Lauritzen. 1996. Graphical Models. Oxford University Press, New York.
  70. Y. LeCun, Y. Bengio, and G. Hinton. 2015. Deep learning. Nature 521, 7553, 436–444.
  71. Y. Li, M. Gong, X. Tian, T. Liu, and D. Tao. 2018a. Domain generalization via conditional invariant representation. arXiv:1807.08479.
  72. Y. Li, X. Tian, M. Gong, Y. Liu, T. Liu, K. Zhang, and D. Tao. 2018b. Deep domain generalization via conditional invariant adversarial networks. In The European Conference on Computer Vision (ECCV).
  73. Z. C. Lipton, Y.-X. Wang, and A. Smola. 2018. Detecting and correcting for label shift with black box predictors. arXiv:1802.03916.
  74. F. Locatello, S. Bauer, M. Lucic, G. Rätsch, S. Gelly, B. Schölkopf, and O. Bachem. 2018a. Challenging common assumptions in the unsupervised learning of disentangled representations. In Proceedings of the 36th International Conference on Machine Learning.
  75. F. Locatello, D. Vincent, I. Tolstikhin, G. Rätsch, S. Gelly, and B. Schölkopf. 2018b. Competitive training of mixtures of independent deep generative models. arXiv:1804.11130.
  76. D. Lopez-Paz, K. Muandet, B. Schölkopf, and I. Tolstikhin. 2015. Towards a learning theory of cause–effect inference. In Proceedings of the 32nd International Conference on Machine Learning. 1452–1461.
  77. D. Lopez-Paz, R. Nishihara, S. Chintala, B. Schölkopf, and L. Bottou. 2017. Discovering causal signals in images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 58–66.
  78. K. Lorenz. 1973. Die Rückseite des Spiegels. R. Piper & Co. Verlag, Munich.
  79. C. Lu, B. Schölkopf, and J. M. Hernández-Lobato. 2018. Deconfounding reinforcement learning in observational settings. arXiv:1812.10576.
  80. S. MacLane. 1971. Categories for the Working Mathematician. Vol. 5. Graduate Texts in Mathematics. Springer-Verlag, New York.
  81. S. Magliacane, T. van Ommen, T. Claassen, S. Bongers, P. Versteeg, and J. M. Mooij. 2018. Domain adaptation by using causal inference to predict invariant conditional distributions. In Proceedings of the NeurIPS. arXiv:1707.06422.
  82. E. Medina. 2011. Cybernetic Revolutionaries: Technology and Politics in Allende’s Chile. The MIT Press, Cambridge, MA.
  83. V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540, 529–533.
  84. B. T. Montet, T. D. Morton, D. Foreman-Mackey, J. A. Johnson, D. W. Hogg, B. P. Bowler, D. W. Latham, A. Bieryla, and A. W. Mann. 2015. Stellar and planetary properties of K2 Campaign 1 candidates and validation of 17 planets, including a planet receiving Earth-like insolation. Astrophys. J. 809, 1, 25.
  85. J. M. Mooij, D. Janzing, J. Peters, and B. Schölkopf. 2009. Regression by dependence minimization and its application to causal inference. In Proceedings of the 26th International Conference on Machine Learning (ICML). 745–752.
  86. J. M. Mooij, D. Janzing, T. Heskes, and B. Schölkopf. 2011. On causal discovery with cyclic additive noise models. In Advances in Neural Information Processing Systems 24 (NIPS).
  87. J. M. Mooij, D. Janzing, and B. Schölkopf. 2013. From ordinary differential equations to structural causal models: The deterministic case. In Proceedings of the 29th Annual Conference on Uncertainty in Artificial Intelligence (UAI). 440–448.
  88. J. M. Mooij, J. Peters, D. Janzing, J. Zscheischler, and B. Schölkopf. 2016. Distinguishing cause from effect using observational data: Methods and benchmarks. J. Mach. Learn. Res. 17, 32, 1–102.
  89. G. Parascandolo, N. Kilbertus, M. Rojas-Carulla, and B. Schölkopf. 2018. Learning independent causal mechanisms. In Proceedings of the 35th International Conference on Machine Learning (ICML). 4033–4041.
  90. J. Pearl. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers Inc., San Francisco, CA.
  91. J. Pearl. 2009a. Causality: Models, Reasoning, and Inference (2nd. ed.). Cambridge University Press, New York.
  92. J. Pearl. 2009b. Giving computers free will. Forbes.
  93. J. Pearl and E. Bareinboim. 2015. External validity: From do-calculus to transportability across populations. Stat. Sci. 2014, 29, 4, 579–595. arXiv:1503.01603.
  94. J. Peters, J. M. Mooij, D. Janzing, and B. Schölkopf. 2011. Identifiability of causal graphs using functional models. In Proceedings of the 27th Annual Conference on Uncertainty in Artificial Intelligence (UAI). 589–598.
  95. J. Peters, J. M. Mooij, D. Janzing, and B. Schölkopf. 2014. Causal discovery with continuous additive noise models. J. Mach. Learn. Res. 15, 2009–2053.
  96. J. Peters, P. Bühlmann, and N. Meinshausen. 2016. Causal inference using invariant prediction: Identification and confidence intervals. J. R. Stat. Soc. Series B Stat. Methodol. 78, 5, 947–1012.
  97. J. Peters, D. Janzing, and B. Schölkopf. 2017. Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press, Cambridge, MA.
  98. N. Pfister, S. Bauer, and J. Peters. 2018a. Identifying causal structure in large-scale kinetic systems. arXiv:1810.11776.
  99. N. Pfister, P. Bühlmann, B. Schölkopf, and J. Peters. 2018b. Kernel-based tests for joint independence. J. R. Stat. Soc. Series B Stat. Methodol. 80, 1, 5–31.
  100. S. Rabanser, S. Günnemann, and Z. C. Lipton. 2018. Failing loudly: An empirical study of methods for detecting dataset shift. arXiv:1810.11953.
  101. H. Reichenbach. 1956. The Direction of Time. University of California Press, Berkeley, CA.
  102. M. Rojas-Carulla, B. Schölkopf, R. Turner, and J. Peters. 2018. Invariant models for causal transfer learning. J. Mach. Learn. Res. 19, 36, 1–34.
  103. P. K. Rubenstein, S. Weichwald, S. Bongers, J. M. Mooij, D. Janzing, M. Grosse-Wentrup, and B. Schölkopf. 2017. Causal consistency of structural equation models. In Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence.
  104. P. K. Rubenstein, S. Bongers, B. Schölkopf, and J. M. Mooij. 2018. From deterministic ODEs to dynamic structural causal models. In Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence (UAI).
  105. B. Schölkopf. 2015. Artificial intelligence: Learning to see and act. Nature 518, 7540, 486–487.
  106. B. Schölkopf. 2017. Causal learning. In Invited Talk, 34th International Conference on Machine Learning (ICML). https://vimeo.com/238274659.
  107. B. Schölkopf and A. J. Smola. 2002. Learning with Kernels. MIT Press, Cambridge, MA.
  108. B. Schölkopf, D. Janzing, J. Peters, and K. Zhang. 2011. Robust learning via cause–effect models. https://arxiv.org/abs/1112.2738.
  109. B. Schölkopf, D. Janzing, J. Peters, E. Sgouritsa, K. Zhang, and J. Mooij. 2012. On causal and anticausal learning. In J. Langford and J. Pineau (Eds.), Proceedings of the 29th International Conference on Machine Learning (ICML). Omnipress, New York, 1255–1262. http://icml.cc/2012/papers/625.pdf.
  110. B. Schölkopf, D. Hogg, D. Wang, D. Foreman-Mackey, D. Janzing, C.-J. Simon-Gabriel, and J. Peters. 2016a. Modeling confounding by half-sibling regression. Proc. Natl. Acad. Sci. U. S. A. 113, 27, 7391–7398.
  111. B. Schölkopf, D. Janzing, and D. Lopez-Paz. 2016b. Causal and statistical learning. Oberwolfach Rep. 13, 3, 1896–1899.
  112. L. Schott, J. Rauber, M. Bethge, and W. Brendel. 2019. Towards the first adversarially robust neural network model on MNIST. In International Conference on Learning Representations. https://openreview.net/forum?id=S1EHOsC9tX.
  113. R. D. Shah and J. Peters. 2018. The hardness of conditional independence testing and the generalised covariance measure. Ann. Statist. 48, 3, 1514–1538. arXiv:1804.07203.
  114. N. Shajarisales, D. Janzing, B. Schölkopf, and M. Besserve. 2015. Telling cause from effect in deterministic linear dynamical systems. In Proceedings of the 32nd International Conference on Machine Learning (ICML). 285–294.
  115. C. E. Shannon. 1959. Coding theorems for a discrete source with a fidelity criterion. In IRE International Convention Records. Vol. 7, 142–163.
  116. S. Shimizu, P. O. Hoyer, A. Hyvärinen, and A. J. Kerminen. 2006. A linear non-Gaussian acyclic model for causal discovery. J. Mach. Learn. Res. 7, 2003–2030.
  117. V. Smil. 2017. Energy and Civilization: A History. MIT Press, Cambridge, MA.
  118. P. Spirtes, C. Glymour, and R. Scheines. 2000. Causation, Prediction, and Search (2nd. ed.). MIT Press, Cambridge, MA.
  119. W. Spohn. 1978. Grundlagen der Entscheidungstheorie. Scriptor-Verlag.
  120. I. Steinwart and A. Christmann. 2008. Support Vector Machines. Springer, New York.
  121. B. Steudel, D. Janzing, and B. Schölkopf. 2010. Causal Markov condition for submodular information measures. In Proceedings of the 23rd Annual Conference on Learning Theory (COLT). 464–476.
  122. A. Subbaswamy, P. Schulam, and S. Saria. 2018. Preventing failures due to dataset shift: Learning predictive models that transport. arXiv:1812.04597.
  123. X. Sun, D. Janzing, and B. Schölkopf. 2006. Causal inference by choosing graphs with most plausible Markov kernels. In Proceedings of the 9th International Symposium on Artificial Intelligence and Mathematics.
  124. R. Suter, Đ. Miladinović, B. Schölkopf, and S. Bauer. 2018. Robustly disentangled causal mechanisms: Validating deep representations for interventional robustness. arXiv:1811.00007. Proceedings ICML.
  125. C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. 2013. Intriguing properties of neural networks. arXiv:1312.6199.
  126. A. Tsiaras, I. Waldmann, G. Tinetti, J. Tennyson, and S. N. Yurchenko. 2019. Water vapour in the atmosphere of the habitable-zone eight-earth-mass planet K2-18b. Nat. Astron. 3, 1–6.
  127. V. Vapnik. 1995. The Nature of Statistical Learning Theory. Springer, New York.
  128. V. N. Vapnik. 1998. Statistical Learning Theory. Wiley, New York.
  129. J. von Kügelgen, A. Mey, M. Loog, and B. Schölkopf. 2019. Semi-supervised learning, causality and the conditional cluster assumption. https://arxiv.org/abs/1905.12081.
  130. H. Wang, Z. He, Z. C. Lipton, and E. P. Xing. 2019. Learning robust representations by projecting superficial statistics out. arXiv:1903.06256.
  131. S. Weichwald, B. Schölkopf, T. Ball, and M. Grosse-Wentrup. 2014. Causal and anti-causal learning in pattern recognition for neuroimaging. In 4th International Workshop on Pattern Recognition in Neuroimaging (PRNI). IEEE.
  132. W. K. Wootters and W. H. Zurek. 1982. A single quantum cannot be cloned. Nature 299, 5886, 802–803.
  133. J. Zhang and E. Bareinboim. 2018. Fairness in decision-making – The causal explanation formula. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans, LA, 2037–2045.
  134. J. Zhang and E. Bareinboim. 2019. Near-optimal reinforcement learning in dynamic treatment regimes. In Advances in Neural Information Processing Systems 33.
  135. K. Zhang and A. Hyvärinen. 2009. On the identifiability of the post-nonlinear causal model. In Proceedings of the 25th Annual Conference on Uncertainty in Artificial Intelligence (UAI). 647–655.
  136. K. Zhang, J. Peters, D. Janzing, and B. Schölkopf. 2011. Kernel-based conditional independence test and application in causal discovery. In Proceedings of the 27th Annual Conference on Uncertainty in Artificial Intelligence (UAI). 804–813.
  137. K. Zhang, B. Schölkopf, K. Muandet, and Z. Wang. 2013. Domain adaptation under target and conditional shift. In Proceedings of the 30th International Conference on Machine Learning (ICML). 819–827.
  138. K. Zhang, M. Gong, and B. Schölkopf. 2015. Multi-source domain adaptation: A causal view. In Proceedings of the 29th AAAI Conference on Artificial Intelligence. 3150–3157.
  139. K. Zhang, B. Huang, J. Zhang, C. Glymour, and B. Schölkopf. 2017. Causal discovery from nonstationary/heterogeneous data: Skeleton estimation and orientation determination. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI 2017). 1347–1353.

Published in

ACM Books
Probabilistic and Causal Inference: The Works of Judea Pearl
February 2022
946 pages
ISBN: 9781450395861
DOI: 10.1145/3501714

Publisher

Association for Computing Machinery, New York, NY, United States