
22-03-2024

MCCE: Monte Carlo sampling of valid and realistic counterfactual explanations for tabular data

Authors: Annabelle Redelmeier, Martin Jullum, Kjersti Aas, Anders Løland

Published in: Data Mining and Knowledge Discovery

Abstract

We introduce MCCE: Monte Carlo sampling of valid and realistic Counterfactual Explanations for tabular data, a novel counterfactual explanation method that generates on-manifold, actionable and valid counterfactuals by modeling the joint distribution of the mutable features given the immutable features and the decision. Unlike other on-manifold methods that tend to rely on variational autoencoders and have strict prediction model and data requirements, MCCE handles any type of prediction model and categorical features with more than two levels. MCCE first models the joint distribution of the features and the decision with an autoregressive generative model where the conditionals are estimated using decision trees. Then, it samples a large set of observations from this model, and finally, it removes the samples that do not obey certain criteria. We compare MCCE with a range of state-of-the-art on-manifold counterfactual methods using four well-known data sets and show that MCCE outperforms these methods on all common performance metrics and speed. In particular, including the decision in the modeling process improves the efficiency of the method substantially.
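The three steps summarized in the abstract (fit tree-based autoregressive conditionals, sample a large candidate set, filter) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the fitted binary classifier `clf`, purely numeric features, at least one immutable feature, scikit-learn decision trees for the conditionals, the leaf-resampling scheme, and the 0.5 probability cutoff are all assumptions introduced for the sketch.

```python
# Rough sketch of the three MCCE steps from the abstract (model, sample, filter).
# Assumptions (not from the paper): numeric features, a fitted binary classifier `clf`
# with predict_proba, at least one immutable feature, and scikit-learn decision trees.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor


def fit_conditionals(X, immutable, mutable, seed=0):
    """Step 1: fit one tree per mutable feature, conditioning on the immutable
    features and the mutable features modeled before it (autoregressive factorization).
    The full method also conditions on the decision, which is what makes it efficient."""
    trees, leaf_values, cond_cols = {}, {}, list(immutable)
    for col in mutable:
        tree = DecisionTreeRegressor(min_samples_leaf=20, random_state=seed)
        tree.fit(X[cond_cols], X[col])
        leaves = tree.apply(X[cond_cols])
        # Empirical distribution of the feature inside each leaf, used for resampling.
        leaf_values[col] = {leaf: X[col].to_numpy()[leaves == leaf]
                            for leaf in np.unique(leaves)}
        trees[col] = (tree, list(cond_cols))
        cond_cols.append(col)
    return trees, leaf_values


def sample_candidates(x, trees, leaf_values, mutable, n=10_000, seed=0):
    """Step 2: draw n candidate rows, keeping x's immutable values fixed and
    resampling each mutable feature from the leaf its conditioning values fall into."""
    rng = np.random.default_rng(seed)
    samples = pd.DataFrame([x.to_dict()] * n)
    for col in mutable:
        tree, cond_cols = trees[col]
        leaves = tree.apply(samples[cond_cols])
        samples[col] = [rng.choice(leaf_values[col][leaf]) for leaf in leaves]
    return samples


def keep_valid(samples, clf, feature_cols, cutoff=0.5):
    """Step 3: remove candidates that do not obey the criteria; only validity
    (predicted probability above the cutoff) is checked in this sketch."""
    return samples[clf.predict_proba(samples[feature_cols])[:, 1] > cutoff]
```

A final selection step (not shown) would typically return the kept candidate closest to the factual instance x as the counterfactual.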


Footnotes
1. We use ‘counterfactual explanation’ or ‘CE’ to refer to the literature or explanation type and ‘counterfactual’ or ‘example’ to refer to the instance produced.
 
2. A decision is derived from a prediction, \(f(\varvec{x})\), using a pre-defined cutoff value or interval c characterizing the desired decision. For example, if \(f(\varvec{x}) = 0.39\) and \(c = (0.5, 1]\), then since \(f(\varvec{x}) \notin c\), we give instance \(\varvec{x}\) a decision of 0 and say \(\varvec{x}\) has received an undesirable decision (see the code sketch after these footnotes).
 
3. In Borisov et al. (2023) the FICO dataset is referred to as the HELOC dataset.
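As referenced in footnote 2, the decision rule amounts to a simple interval check. A minimal sketch, where the function name and the default interval \(c = (0.5, 1]\) are taken from the footnote's example and are not fixed by the method:

```python
def decision(prediction, lower=0.5, upper=1.0):
    """Map a model prediction f(x) to a decision: 1 if it falls in the desired
    interval c = (lower, upper], otherwise 0 (an undesirable decision)."""
    return int(lower < prediction <= upper)

decision(0.39)  # -> 0, matching the footnote's example
```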
 
Literature
Antorán J, Bhatt U, Adel T et al (2021) Getting a clue: a method for explaining uncertainty estimates. In: International conference on learning representations
Borisov V, Sessler K, Leemann T et al (2023) Language models are realistic tabular data generators. In: Proceedings of ICLR 2023
Breiman L, Friedman J, Olshen R et al (1984) Classification and regression trees. Chapman and Hall, Boca Raton
Brughmans D, Leyman P, Martens D (2023) NICE: an algorithm for nearest instance counterfactual explanations. Data Min Knowl Discov, pp 1–39
Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 785–794
Chi CM, Vossler P, Fan Y et al (2022) Asymptotic properties of high-dimensional random forests. Ann Stat 50(6):3415–3438
Dandl S, Molnar C, Binder M et al (2020) Multi-objective counterfactual explanations. In: International conference on parallel problem solving from nature, Springer, pp 448–469
Dhurandhar A, Chen PY, Luss R et al (2018) Explanations based on the missing: towards contrastive explanations with pertinent negatives. In: Proceedings of the 32nd international conference on neural information processing systems, pp 590–601
Downs M, Chu JL, Yacoby Y et al (2020) CRUDS: counterfactual recourse using disentangled subspaces. In: ICML workshop on human interpretability in machine learning
Drechsler J, Reiter JP (2011) An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput Stat Data Anal 55(12):3232–3243
Dwork C (2006) Differential privacy. In: 33rd international colloquium on automata, languages and programming, ICALP 2006, Venice, Italy, July 10–14, 2006, Proceedings, Part II, Springer, pp 1–12
Germain M, Gregor K, Murray I et al (2015) MADE: masked autoencoder for distribution estimation. In: International conference on machine learning, PMLR, pp 881–889
Goethals S, Sörensen K, Martens D (2022) The privacy issue of counterfactual explanations: explanation linkage attacks. arXiv preprint arXiv:2210.12051
Gomez O, Holter S, Yuan J et al (2020) ViCE: visual counterfactual explanations for machine learning models. In: Proceedings of the 25th international conference on intelligent user interfaces, Association for Computing Machinery, New York, NY, USA, IUI '20, pp 531–535
Guidotti R (2022) Counterfactual explanations and how to find them: literature review and benchmarking. Data Min Knowl Discov, pp 1–55
Géron A (2019) Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow, 2nd edn. O'Reilly Media Inc, Sebastopol
Hastie T, Tibshirani R, Friedman JH et al (2009) The elements of statistical learning: data mining, inference, and prediction, vol 2. Springer, Cham
Joshi S, Koyejo O, Vijitbenjaronk W et al (2019) Towards realistic individual recourse and actionable explanations in black-box decision making systems. Safe Machine Learning workshop at ICLR
Karimi AH, Barthe G, Balle B et al (2020) Model-agnostic counterfactual explanations for consequential decisions. In: International conference on artificial intelligence and statistics, PMLR, pp 895–905
Karimi AH, Barthe G, Schölkopf B et al (2022) A survey of algorithmic recourse: contrastive explanations and consequential recommendations. ACM Comput Surv 55(5):1–29
Keane MT, Smyth B (2020) Good counterfactuals and where to find them: a case-based technique for generating counterfactuals for explainable AI (XAI). In: 28th international conference on case-based reasoning research and development, ICCBR 2020, Salamanca, Spain, June 8–12, 2020, Proceedings, Springer, pp 163–178
Laugel T, Lesot MJ, Marsala C et al (2018) Comparison-based inverse classification for interpretability in machine learning. In: International conference on information processing and management of uncertainty in knowledge-based systems, Springer, pp 100–111
Mahiou S, Xu K, Ganev G (2022) DPART: differentially private autoregressive tabular, a general framework for synthetic data generation. arXiv preprint arXiv:2207.05810
Mothilal RK, Sharma A, Tan C (2020) Explaining machine learning classifiers through diverse counterfactual explanations. In: Proceedings of the 2020 conference on fairness, accountability, and transparency, pp 607–617
Nowok B, Raab GM, Dibben C et al (2016) synthpop: bespoke creation of synthetic data in R. J Stat Softw 74(11):1–26
Pawelczyk M, Broelemann K, Kasneci G (2020) Learning model-agnostic counterfactual explanations for tabular data. Proc Web Conf 2020:3126–3132
Pawelczyk M, Bielawski S, Van den Heuvel J et al (2021) CARLA: a Python library to benchmark algorithmic recourse and counterfactual explanation algorithms. arXiv preprint arXiv:2108.00783
Pawelczyk M, Lakkaraju H, Neel S (2023) On the privacy risks of algorithmic recourse. In: International conference on artificial intelligence and statistics, PMLR, pp 9680–9696
Poyiadzi R, Sokol K, Santos-Rodriguez R et al (2020) FACE: feasible and actionable counterfactual explanations. In: Proceedings of the AAAI/ACM conference on AI, ethics, and society, pp 344–350
Rasouli P, Chieh Yu I (2022) CARE: coherent actionable recourse based on sound counterfactual explanations. Int J Data Sci Anal, pp 1–26
Reiter JP (2005) Using CART to generate partially synthetic public use microdata. J Off Stat 21(3):441
Sklar M (1959) Fonctions de répartition à n dimensions et leurs marges. Publ Inst Stat Univ Paris 8:229–231
Stepin I, Alonso JM, Catala A et al (2021) A survey of contrastive and counterfactual explanation generation methods for explainable artificial intelligence. IEEE Access 9:11974–12001
Tolomei G, Silvestri F, Haines A et al (2017) Interpretable predictions of tree-based ensembles via actionable feature tweaking. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, Association for Computing Machinery, New York, NY, USA, KDD '17, pp 465–474
Ustun B, Spangher A, Liu Y (2019) Actionable recourse in linear classification. In: Proceedings of the conference on fairness, accountability, and transparency, pp 10–19
Wachter S, Mittelstadt B, Russell C (2017) Counterfactual explanations without opening the black box: automated decisions and the GDPR. Harv JL Tech 31:841
Xu L, Skoularidou M, Cuesta-Infante A et al (2019) Modeling tabular data using conditional GAN. Adv Neural Inf Process Syst 32
Metadata
Title
MCCE: Monte Carlo sampling of valid and realistic counterfactual explanations for tabular data
Authors
Annabelle Redelmeier
Martin Jullum
Kjersti Aas
Anders Løland
Publication date
22-03-2024
Publisher
Springer US
Published in
Data Mining and Knowledge Discovery
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-024-01017-y
