Abstract
Machine learning models have achieved notable success in a myriad of applications. However, most of these models are black boxes, and it is unclear how they arrive at their decisions. This opacity makes the models unreliable and untrustworthy. To provide insights into the decision-making processes of these models, a variety of traditional interpretable models have been proposed. Moreover, to generate more human-friendly explanations, recent work on interpretability tries to answer questions related to causality, such as "Why does this model make such decisions?" or "Was it a specific feature that caused the decision made by the model?" In this work, models that aim to answer causal questions are referred to as causal interpretable models. Existing surveys have covered the concepts and methodologies of traditional interpretability. Here, we present a comprehensive survey of causal interpretable models from the perspectives of problems and methods. In addition, this survey provides in-depth insights into the existing evaluation metrics for measuring interpretability, helping practitioners understand the scenarios for which each metric is suitable.
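To make the contrast with traditional interpretability concrete, the sketch below illustrates one common form of causal question, the counterfactual: what is the smallest change to an input that would flip the model's decision? It uses a Wachter-style counterfactual objective over a hypothetical logistic-regression model; the weights, input, and hyperparameters are invented purely for illustration and are not drawn from the survey.

```python
# A minimal, hypothetical sketch of a counterfactual explanation search
# ("What is the smallest change to this input that would flip the model's
# decision?"). The model weights, input, and hyperparameters are invented
# for illustration only.
import numpy as np

# Hypothetical trained logistic-regression model over 4 features.
w = np.array([1.5, -2.0, 0.5, 0.0])
b = -0.25

def predict_proba(x):
    """Probability of the positive class under the linear model."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def counterfactual(x, target=0.9, lam=0.05, lr=0.1, steps=2000):
    """Gradient search for x' near x whose prediction approaches `target`.

    Minimizes a Wachter-style objective:
        (f(x') - target)^2 + lam * ||x' - x||^2
    """
    x_cf = x.copy()
    for _ in range(steps):
        p = predict_proba(x_cf)
        # Gradient of the squared prediction loss plus the distance penalty.
        grad = 2.0 * (p - target) * p * (1.0 - p) * w + 2.0 * lam * (x_cf - x)
        x_cf = x_cf - lr * grad
    return x_cf

x = np.array([0.2, 1.0, -0.3, 0.8])   # an instance classified as negative
x_cf = counterfactual(x)

print("original prediction:      ", predict_proba(x))
print("counterfactual prediction:", predict_proba(x_cf))
# Features requiring large changes are, in this counterfactual sense,
# the ones that "caused" the original decision.
print("feature changes:          ", x_cf - x)
```

Here the squared-L2 penalty keeps the counterfactual close to the original input; sparser penalties (e.g., L1) are often preferred in practice so that only a few features need to change.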