
22-03-2024

MCCE: Monte Carlo sampling of valid and realistic counterfactual explanations for tabular data

Authors: Annabelle Redelmeier, Martin Jullum, Kjersti Aas, Anders Løland

Published in: Data Mining and Knowledge Discovery

Abstract

We introduce MCCE: Monte Carlo sampling of valid and realistic Counterfactual Explanations for tabular data, a novel counterfactual explanation method that generates on-manifold, actionable and valid counterfactuals by modeling the joint distribution of the mutable features given the immutable features and the decision. Unlike other on-manifold methods that tend to rely on variational autoencoders and have strict prediction model and data requirements, MCCE handles any type of prediction model and categorical features with more than two levels. MCCE first models the joint distribution of the features and the decision with an autoregressive generative model where the conditionals are estimated using decision trees. Then, it samples a large set of observations from this model, and finally, it removes the samples that do not obey certain criteria. We compare MCCE with a range of state-of-the-art on-manifold counterfactual methods using four well-known data sets and show that MCCE outperforms these methods on all common performance metrics and speed. In particular, including the decision in the modeling process improves the efficiency of the method substantially.
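The three steps summarized in the abstract (fit tree-based autoregressive conditionals, sample a large candidate set, filter) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the fitted binary classifier `clf`, purely numeric features, at least one immutable feature, scikit-learn decision trees for the conditionals, the leaf-resampling scheme, and the 0.5 probability cutoff are all assumptions introduced for the sketch.

```python
# Rough sketch of the three MCCE steps from the abstract (model, sample, filter).
# Assumptions (not from the paper): numeric features, a fitted binary classifier `clf`
# with predict_proba, at least one immutable feature, and scikit-learn decision trees.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor


def fit_conditionals(X, immutable, mutable, seed=0):
    """Step 1: fit one tree per mutable feature, conditioning on the immutable
    features and the mutable features modeled before it (autoregressive factorization).
    The full method also conditions on the decision, which is what makes it efficient."""
    trees, leaf_values, cond_cols = {}, {}, list(immutable)
    for col in mutable:
        tree = DecisionTreeRegressor(min_samples_leaf=20, random_state=seed)
        tree.fit(X[cond_cols], X[col])
        leaves = tree.apply(X[cond_cols])
        # Empirical distribution of the feature inside each leaf, used for resampling.
        leaf_values[col] = {leaf: X[col].to_numpy()[leaves == leaf]
                            for leaf in np.unique(leaves)}
        trees[col] = (tree, list(cond_cols))
        cond_cols.append(col)
    return trees, leaf_values


def sample_candidates(x, trees, leaf_values, mutable, n=10_000, seed=0):
    """Step 2: draw n candidate rows, keeping x's immutable values fixed and
    resampling each mutable feature from the leaf its conditioning values fall into."""
    rng = np.random.default_rng(seed)
    samples = pd.DataFrame([x.to_dict()] * n)
    for col in mutable:
        tree, cond_cols = trees[col]
        leaves = tree.apply(samples[cond_cols])
        samples[col] = [rng.choice(leaf_values[col][leaf]) for leaf in leaves]
    return samples


def keep_valid(samples, clf, feature_cols, cutoff=0.5):
    """Step 3: remove candidates that do not obey the criteria; only validity
    (predicted probability above the cutoff) is checked in this sketch."""
    return samples[clf.predict_proba(samples[feature_cols])[:, 1] > cutoff]
```

A final selection step (not shown) would typically return the kept candidate closest to the factual instance x as the counterfactual.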


Footnotes
1. We use ‘counterfactual explanation’ or ‘CE’ to refer to the literature or explanation type and ‘counterfactual’ or ‘example’ to refer to the instance produced.
 
2. A decision is derived from a prediction, \(f(\varvec{x})\), using a pre-defined cutoff value or interval c characterizing the desired decision. For example, if \(f(\varvec{x}) = 0.39\) and \(c = (0.5, 1]\), then since \(f(\varvec{x}) \notin c\), we give instance \(\varvec{x}\) a decision of 0 and say \(\varvec{x}\) has received an undesirable decision (see the code sketch after these footnotes).
 
3. In Borisov et al. (2023) the FICO dataset is referred to as the HELOC dataset.
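As referenced in footnote 2, the decision rule amounts to a simple interval check. A minimal sketch, where the function name and the default interval \(c = (0.5, 1]\) are taken from the footnote's example and are not fixed by the method:

```python
def decision(prediction, lower=0.5, upper=1.0):
    """Map a model prediction f(x) to a decision: 1 if it falls in the desired
    interval c = (lower, upper], otherwise 0 (an undesirable decision)."""
    return int(lower < prediction <= upper)

decision(0.39)  # -> 0, matching the footnote's example
```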
 
Literature
Antorán J, Bhatt U, Adel T et al (2021) Getting a clue: a method for explaining uncertainty estimates. In: International conference on learning representations
Borisov V, Sessler K, Leemann T et al (2023) Language models are realistic tabular data generators. In: Proceedings of ICLR 2023
Breiman L, Friedman J, Olshen R et al (1984) Classification and regression trees. Chapman and Hall, Boca Raton
Brughmans D, Leyman P, Martens D (2023) NICE: an algorithm for nearest instance counterfactual explanations. Data Min Knowl Discov, pp 1–39
Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 785–794
Chi CM, Vossler P, Fan Y et al (2022) Asymptotic properties of high-dimensional random forests. Ann Stat 50(6):3415–3438
Dandl S, Molnar C, Binder M et al (2020) Multi-objective counterfactual explanations. In: International conference on parallel problem solving from nature, Springer, pp 448–469
Dhurandhar A, Chen PY, Luss R et al (2018) Explanations based on the missing: towards contrastive explanations with pertinent negatives. In: Proceedings of the 32nd international conference on neural information processing systems, pp 590–601
Downs M, Chu JL, Yacoby Y et al (2020) CRUDS: counterfactual recourse using disentangled subspaces. In: ICML workshop on human interpretability in machine learning
Drechsler J, Reiter JP (2011) An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput Stat Data Anal 55(12):3232–3243
Dwork C (2006) Differential privacy. In: 33rd international colloquium on automata, languages and programming, ICALP 2006, Venice, Italy, July 10–14, 2006, Proceedings, Part II, Springer, pp 1–12
Germain M, Gregor K, Murray I et al (2015) MADE: masked autoencoder for distribution estimation. In: International conference on machine learning, PMLR, pp 881–889
Goethals S, Sörensen K, Martens D (2022) The privacy issue of counterfactual explanations: explanation linkage attacks. arXiv preprint arXiv:2210.12051
Gomez O, Holter S, Yuan J et al (2020) ViCE: visual counterfactual explanations for machine learning models. In: Proceedings of the 25th international conference on intelligent user interfaces, Association for Computing Machinery, New York, NY, USA, IUI '20, pp 531–535
Guidotti R (2022) Counterfactual explanations and how to find them: literature review and benchmarking. Data Min Knowl Discov, pp 1–55
Géron A (2019) Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow, 2nd edn. O'Reilly Media Inc, Sebastopol
Hastie T, Tibshirani R, Friedman JH et al (2009) The elements of statistical learning: data mining, inference, and prediction, vol 2. Springer, Cham
Joshi S, Koyejo O, Vijitbenjaronk W et al (2019) Towards realistic individual recourse and actionable explanations in black-box decision making systems. Safe Machine Learning workshop at ICLR
Karimi AH, Barthe G, Balle B et al (2020) Model-agnostic counterfactual explanations for consequential decisions. In: International conference on artificial intelligence and statistics, PMLR, pp 895–905
Karimi AH, Barthe G, Schölkopf B et al (2022) A survey of algorithmic recourse: contrastive explanations and consequential recommendations. ACM Comput Surv 55(5):1–29
Keane MT, Smyth B (2020) Good counterfactuals and where to find them: a case-based technique for generating counterfactuals for explainable AI (XAI). In: 28th international conference on case-based reasoning research and development, ICCBR 2020, Salamanca, Spain, June 8–12, 2020, Proceedings, Springer, pp 163–178
Laugel T, Lesot MJ, Marsala C et al (2018) Comparison-based inverse classification for interpretability in machine learning. In: International conference on information processing and management of uncertainty in knowledge-based systems, Springer, pp 100–111
Mahiou S, Xu K, Ganev G (2022) DPART: differentially private autoregressive tabular, a general framework for synthetic data generation. arXiv preprint arXiv:2207.05810
Mothilal RK, Sharma A, Tan C (2020) Explaining machine learning classifiers through diverse counterfactual explanations. In: Proceedings of the 2020 conference on fairness, accountability, and transparency, pp 607–617
Nowok B, Raab GM, Dibben C et al (2016) synthpop: bespoke creation of synthetic data in R. J Stat Softw 74(11):1–26
Pawelczyk M, Broelemann K, Kasneci G (2020) Learning model-agnostic counterfactual explanations for tabular data. Proc Web Conf 2020:3126–3132
Pawelczyk M, Bielawski S, Van den Heuvel J et al (2021) CARLA: a Python library to benchmark algorithmic recourse and counterfactual explanation algorithms. arXiv preprint arXiv:2108.00783
Pawelczyk M, Lakkaraju H, Neel S (2023) On the privacy risks of algorithmic recourse. In: International conference on artificial intelligence and statistics, PMLR, pp 9680–9696
Poyiadzi R, Sokol K, Santos-Rodriguez R et al (2020) FACE: feasible and actionable counterfactual explanations. In: Proceedings of the AAAI/ACM conference on AI, ethics, and society, pp 344–350
Rasouli P, Chieh Yu I (2022) CARE: coherent actionable recourse based on sound counterfactual explanations. Int J Data Sci Anal, pp 1–26
Reiter JP (2005) Using CART to generate partially synthetic public use microdata. J Off Stat 21(3):441
Sklar M (1959) Fonctions de répartition à n dimensions et leurs marges. Publ Inst Stat Univ Paris 8:229–231
Stepin I, Alonso JM, Catala A et al (2021) A survey of contrastive and counterfactual explanation generation methods for explainable artificial intelligence. IEEE Access 9:11974–12001
Tolomei G, Silvestri F, Haines A et al (2017) Interpretable predictions of tree-based ensembles via actionable feature tweaking. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, Association for Computing Machinery, New York, NY, USA, KDD '17, pp 465–474
Ustun B, Spangher A, Liu Y (2019) Actionable recourse in linear classification. In: Proceedings of the conference on fairness, accountability, and transparency, pp 10–19
Wachter S, Mittelstadt B, Russell C (2017) Counterfactual explanations without opening the black box: automated decisions and the GDPR. Harv JL Tech 31:841
Xu L, Skoularidou M, Cuesta-Infante A et al (2019) Modeling tabular data using conditional GAN. Adv Neural Inf Process Syst 32
Metadata
Title
MCCE: Monte Carlo sampling of valid and realistic counterfactual explanations for tabular data
Authors
Annabelle Redelmeier
Martin Jullum
Kjersti Aas
Anders Løland
Publication date
22-03-2024
Publisher
Springer US
Published in
Data Mining and Knowledge Discovery
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-024-01017-y
