ABSTRACT
Machine learning (ML) models are now routinely deployed in domains ranging from criminal justice to healthcare. With this newfound ubiquity, ML has moved beyond academia and grown into an engineering discipline. To that end, interpretability tools have been designed to help data scientists and machine learning practitioners better understand how ML models work. However, there has been little evaluation of the extent to which these tools achieve this goal. We study data scientists' use of two existing interpretability tools, the InterpretML implementation of GAMs and the SHAP Python package. We conduct a contextual inquiry (N=11) and a survey (N=197) of data scientists to observe how they use interpretability tools to uncover common issues that arise when building and evaluating ML models. Our results indicate that data scientists over-trust and misuse interpretability tools. Furthermore, few of our participants were able to accurately describe the visualizations output by these tools. We highlight qualitative themes for data scientists' mental models of interpretability tools. We conclude with implications for researchers and tool designers, and contextualize our findings in the social science literature.
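The SHAP package studied here attributes a model's prediction to its input features using Shapley values from cooperative game theory. As a sketch of that underlying idea (not SHAP's actual API), the snippet below computes exact Shapley values for a hypothetical two-feature model by enumerating feature subsets; the feature names and scores are illustrative, and the real package uses efficient approximations instead of brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    # Exact Shapley values: each feature's marginal contribution to the
    # model output, averaged over all subsets of the other features.
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(set(subset) | {f}) - value(set(subset)))
        phi[f] = total
    return phi

def toy_model(present):
    # Hypothetical additive score with one interaction term.
    score = 2.0 * ("age" in present) + 1.0 * ("income" in present)
    if {"age", "income"} <= present:
        score += 1.0  # interaction credit is split between the two features
    return score

print(shapley_values(["age", "income"], toy_model))
# The attributions sum to the model's output on the full feature set
# (the "efficiency" property that SHAP visualizations rely on).
```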
Supplemental Material
The supplemental material is a zip file containing eight PDFs:

1. Interview protocol for pilot interviews with data scientists.
2. Tutorial for Generalized Additive Models (GAMs) used for our contextual inquiry.
3. Tutorial for SHAP used for our contextual inquiry.
4. Questions about the dataset and model asked during our contextual inquiry.
5. Survey protocol.
6. Introduction to the dataset and model, and tutorial for GAMs, used for our survey.
7. Introduction to the dataset and model, and tutorial for SHAP, used for our survey.
8. A figure showing the percentage of participants with low, neutral, and high deployment scores for the model used in their condition.
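To give a sense of what the GAM tutorials cover: a GAM predicts through a sum of per-feature shape functions, so each feature's effect can be plotted on its own axis, which is the source of the model class's interpretability. The sketch below uses hypothetical, hand-written shapes and a logistic link; the feature names and coefficients are illustrative, and this is not InterpretML's API (which fits the shapes from data).

```python
import math

# Hypothetical fitted shape functions for a toy risk model; in a real GAM
# each shape is learned from data and can be inspected independently.
shape_functions = {
    "age": lambda a: 0.03 * (a - 40),
    "bmi": lambda b: 0.1 * max(b - 25.0, 0.0),
}
intercept = -1.0

def predict_proba(x):
    # Additive structure: logit = b0 + f_age(age) + f_bmi(bmi),
    # pushed through a logistic link to yield a probability.
    logit = intercept + sum(f(x[name]) for name, f in shape_functions.items())
    return 1.0 / (1.0 + math.exp(-logit))

print(round(predict_proba({"age": 50, "bmi": 30}), 3))
```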
Index Terms
- Interpreting Interpretability: Understanding Data Scientists' Use of Interpretability Tools for Machine Learning