DOI: 10.1145/3375627.3375833
Research article · Public Access

"How do I fool you?": Manipulating User Trust via Misleading Black Box Explanations

Published: 7 February 2020

ABSTRACT

As machine learning black boxes are increasingly deployed in critical domains such as healthcare and criminal justice, there has been a growing emphasis on developing techniques for explaining them in a human-interpretable manner. Recent work has raised the concern that even a high-fidelity explanation of a black box ML model may not accurately reflect the model's biases. As a consequence, explanations have the potential to mislead human users into trusting a problematic black box. In this work, we rigorously explore the notion of misleading explanations and how they influence user trust in black box models. Specifically, we propose a novel theoretical framework for understanding and generating misleading explanations, and carry out a user study with domain experts to demonstrate how these explanations can be used to mislead users. Our work is the first to empirically establish how user trust in black box models can be manipulated via misleading explanations.
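
The abstract does not spell out how a misleading explanation is constructed, so the sketch below is only a rough intuition and not the authors' framework: a surrogate "explanation" model is fit to mimic a biased black box while being shown only innocuous proxy features, so it achieves high fidelity yet never surfaces the sensitive feature the black box actually relies on. The data-generating process and all variable names (`sensitive`, `proxy`, `other`) are hypothetical.

```python
# Illustrative sketch only (not the paper's method): a high-fidelity surrogate
# explanation that hides a black box's reliance on a sensitive feature.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: an innocuous proxy feature is correlated with the
# sensitive attribute; `other` is an unrelated feature.
sensitive = rng.integers(0, 2, size=n)
proxy = sensitive + 0.5 * rng.normal(size=n)
other = rng.normal(size=n)

# "Black box": its decisions depend directly on the sensitive feature.
black_box_pred = (2.0 * sensitive + 0.5 * other
                  + rng.normal(scale=0.5, size=n) > 1.0).astype(int)

# Surrogate explanation trained to mimic the black box, but given only the
# proxy and the unrelated feature, never the sensitive one.
X_shown = np.column_stack([proxy, other])
surrogate = LogisticRegression().fit(X_shown, black_box_pred)

# Fidelity = agreement rate between surrogate and black box predictions.
fidelity = surrogate.score(X_shown, black_box_pred)
print(f"Surrogate fidelity to the black box: {fidelity:.2f}")
print("Surrogate coefficients (proxy, other):", surrogate.coef_.round(2))
# The surrogate matches the black box on most inputs, yet its explanation
# attributes decisions to `proxy` and `other`, masking the dependence on
# `sensitive` -- a user inspecting only the surrogate could be misled.
```

Because the proxy is strongly correlated with the sensitive attribute, the surrogate reaches high fidelity; this is one plausible way an explanation can look trustworthy while concealing problematic behavior.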


Published in

AIES '20: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society
February 2020, 439 pages
ISBN: 978-1-4503-7110-0
DOI: 10.1145/3375627
Copyright © 2020 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 61 of 162 submissions, 38%
