DOI: 10.1145/3490099.3511119
research-article
Open Access

Investigating Explainability of Generative AI for Code through Scenario-based Design

Published: 22 March 2022

ABSTRACT

What does it mean for a generative AI model to be explainable? The emergent discipline of explainable AI (XAI) has made great strides in helping people understand discriminative models. Less attention has been paid to generative models, which produce artifacts rather than decisions as their output. Meanwhile, generative AI (GenAI) technologies are maturing and being applied to domains such as software engineering. Using scenario-based design and question-driven XAI design approaches, we explore users’ explainability needs for GenAI in three software engineering use cases: natural language to code, code translation, and code auto-completion. We conducted 9 workshops with 43 software engineers in which real examples from state-of-the-art generative AI models were used to elicit users’ explainability needs. Drawing from prior work, we also proposed 4 types of XAI features for GenAI for code and gathered additional design ideas from participants. Our work explores explainability needs for GenAI for code and demonstrates how human-centered approaches can drive the technical development of XAI in novel domains.


Published in

IUI '22: Proceedings of the 27th International Conference on Intelligent User Interfaces
March 2022, 888 pages
ISBN: 978-1-4503-9144-3
DOI: 10.1145/3490099

Copyright © 2022 Owner/Author. This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher: Association for Computing Machinery, New York, NY, United States

Publication History: Published 22 March 2022

          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

Overall Acceptance Rate: 746 of 2,811 submissions, 27%
