DOI: 10.1145/3290605.3300234
research-article
Open Access
Honorable Mention

Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision-Making

Published: 02 May 2019

ABSTRACT

Machine learning (ML) is increasingly being used in image retrieval systems for medical decision-making. One application of ML is to retrieve visually similar medical images from past patients (e.g., tissue from biopsies) for reference when making a medical decision for a new patient. However, no algorithm can perfectly capture an expert's ideal notion of similarity for every case: an image that is algorithmically determined to be similar may not be medically relevant to a doctor's specific diagnostic needs. In this paper, we identified the needs of pathologists when searching for similar images retrieved using a deep learning algorithm, and developed tools that empower users to cope with the search algorithm on the fly by communicating what types of similarity are most important at different moments. In two evaluations with pathologists, we found that these tools increased the diagnostic utility of the images found and increased user trust in the algorithm. The tools were preferred over a traditional interface, without a loss in diagnostic accuracy. We also observed that users adopted new strategies when using the refinement tools, repurposing them to test and understand the underlying algorithm and to disambiguate ML errors from their own errors. Taken together, these findings inform future human-ML collaborative systems for expert decision-making.

    • Published in

      CHI '19: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems
      May 2019
      9077 pages
      ISBN:9781450359702
      DOI:10.1145/3290605

      Copyright © 2019 Owner/Author

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article

      Acceptance Rates

      CHI '19 Paper Acceptance Rate: 703 of 2,958 submissions, 24%
      Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%
