DOI: 10.1145/3490099.3511119
research-article
Open Access

Investigating Explainability of Generative AI for Code through Scenario-based Design

Published: 22 March 2022

ABSTRACT

What does it mean for a generative AI model to be explainable? The emergent discipline of explainable AI (XAI) has made great strides in helping people understand discriminative models. Less attention has been paid to generative models, which produce artifacts rather than decisions as their output. Meanwhile, generative AI (GenAI) technologies are maturing and being applied to domains such as software engineering. Using scenario-based design and question-driven XAI design approaches, we explore users’ explainability needs for GenAI in three software engineering use cases: natural language to code, code translation, and code auto-completion. We conducted 9 workshops with 43 software engineers in which real examples from state-of-the-art generative AI models were used to elicit users’ explainability needs. Drawing from prior work, we also proposed 4 types of XAI features for GenAI for code and gathered additional design ideas from participants. Our work explores explainability needs for GenAI for code and demonstrates how human-centered approaches can drive the technical development of XAI in novel domains.


Published in

IUI '22: Proceedings of the 27th International Conference on Intelligent User Interfaces
March 2022, 888 pages
ISBN: 978-1-4503-9144-3
DOI: 10.1145/3490099

Copyright © 2022 Owner/Author. This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher: Association for Computing Machinery, New York, NY, United States

Publication History: Published 22 March 2022

          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

Overall Acceptance Rate: 746 of 2,811 submissions, 27%
