Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision-Making

Authors:
Carrie J. Cai, Emily Reif, Narayan Hegde, Jason Hipp, Been Kim, Daniel Smilkov, Martin Wattenberg, Fernanda Viegas, Greg S. Corrado, Martin C. Stumpe, Michael Terry

Affiliation (all authors): Google Brain, Mountain View, CA, USA

CHI '19: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, May 2019, Paper No.: 4, Pages 1–14. https://doi.org/10.1145/3290605.3300234

Published: 02 May 2019

ABSTRACT

Machine learning (ML) is increasingly being used in image retrieval systems for medical decision making. One application of ML is to retrieve visually similar medical images from past patients (e.g., tissue from biopsies) to reference when making a medical decision with a new patient. However, no algorithm can perfectly capture an expert's ideal notion of similarity for every case: an image that is algorithmically determined to be similar may not be medically relevant to a doctor's specific diagnostic needs. In this paper, we identified the needs of pathologists when searching for similar images retrieved using a deep learning algorithm, and developed tools that empower users to cope with the search algorithm on the fly, communicating what types of similarity are most important at different moments in time. In two evaluations with pathologists, we found that these tools increased the diagnostic utility of images found and increased user trust in the algorithm. The tools were preferred over a traditional interface, without a loss in diagnostic accuracy. We also observed that users adopted new strategies when using refinement tools, re-purposing them to test and understand the underlying algorithm and to disambiguate ML errors from their own errors. Taken together, these findings inform future human-ML collaborative systems for expert decision-making.

Index Terms

1. Human-centered computing
  1. Human computer interaction (HCI)

Published in
CHI '19: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems
May 2019
9077 pages
ISBN: 9781450359702
DOI: 10.1145/3290605
General Chairs: Stephen Brewster (University of Glasgow, Scotland, UK), Geraldine Fitzpatrick (TU Wien, Austria)
Program Chairs: Anna Cox (University College London, UK), Vassilis Kostakos (University of Melbourne, Australia)
Copyright © 2019 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Badges
- Honorable Mention
Author Tags
clinical health
human-ai interaction
machine learning
Qualifiers
- research-article
Acceptance Rates
CHI '19 Paper Acceptance Rate: 703 of 2,958 submissions, 24%
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%