Software Engineering for AI-Based Systems: A Survey

Authors:
Silverio Martínez-Fernández, Universitat Politècnica de Catalunya - BarcelonaTech, Barcelona, Spain (ORCID: 0000-0001-9928-133X)
Justus Bogner, University of Stuttgart, Institute of Software Engineering, Stuttgart, Germany (ORCID: 0000-0001-5788-0991)
Xavier Franch, Universitat Politècnica de Catalunya - BarcelonaTech, Barcelona, Spain (ORCID: 0000-0001-9733-8830)
Marc Oriol, Universitat Politècnica de Catalunya - BarcelonaTech, Barcelona, Spain (ORCID: 0000-0003-1928-7024)
Julien Siebert, Fraunhofer Institute for Experimental Software Engineering IESE, Kaiserslautern, Germany (ORCID: 0000-0002-7696-0046)
Adam Trendowicz, Fraunhofer Institute for Experimental Software Engineering IESE, Kaiserslautern, Germany
Anna Maria Vollmer, Fraunhofer Institute for Experimental Software Engineering IESE, Kaiserslautern, Germany (ORCID: 0000-0002-3563-8253)
Stefan Wagner, University of Stuttgart, Institute of Software Engineering, Stuttgart, Germany (ORCID: 0000-0002-5256-8429)

ACM Transactions on Software Engineering and Methodology, Volume 31, Issue 2, Article No. 37e, pp. 1–59. https://doi.org/10.1145/3487043

Published: 01 April 2022

Abstract

AI-based systems are software systems with functionalities enabled by at least one AI component (e.g., for image and speech recognition, or autonomous driving). AI-based systems are becoming pervasive in society due to advances in AI. However, there is limited synthesized knowledge on Software Engineering (SE) approaches for building, operating, and maintaining AI-based systems. To collect and analyze state-of-the-art knowledge about SE for AI-based systems, we conducted a systematic mapping study. We considered 248 studies published between January 2010 and March 2020. SE for AI-based systems is an emerging research area: more than two-thirds of the studies have been published since 2018. The most studied properties of AI-based systems are dependability and safety. We identified multiple SE approaches for AI-based systems, which we classified according to the SWEBOK knowledge areas. Studies related to software testing and software quality are very prevalent, while areas such as software maintenance seem neglected. Data-related issues are the most recurrent challenges. Our results are valuable for researchers, to quickly understand the state of the art and learn which topics need more research; for practitioners, to learn about the approaches and challenges that SE entails for AI-based systems; and for educators, to bridge the gap between SE and AI in their curricula.

REFERENCES

[1] Abdessalem Raja Ben, Nejati Shiva, Briand Lionel C., and Stifter Thomas. 2018. Testing vision-based control systems using learnable evolutionary algorithms. In Proceedings of the 40th International Conference on Software Engineering. ACM, New York, NY, USA, 1016–1026. DOI:Google ScholarDigital Library
[2] Adedjouma Morayo, Pedroza Gabriel, and Bannour Boutheina. 2018. Representative safety assessment of autonomous vehicle for public transportation. In 2018 IEEE 21st International Symposium on Real-Time Distributed Computing (ISORC). IEEE, 124–129. DOI:Google ScholarCross Ref
[3] Aggarwal Aniya, Lohia Pranay, Nagar Seema, Dey Kuntal, and Saha Diptikalyan. 2019. Black box fairness testing of machine learning models. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, 625–635. DOI:Google ScholarDigital Library
[4] Akkiraju Rama, Sinha Vibha, Xu Anbang, Mahmud Jalal, Gundecha Pritam, Liu Zhe, Liu Xiaotong, and Schumacher John. 2018. Characterizing machine learning process: A maturity framework. arXiv (2018).Google Scholar
[5] Alahdab Mohannad and Çalıklı Gül. 2019. Empirical analysis of hidden technical debt patterns in machine learning software. In Product-Focused Software Process Improvement. Springer International Publishing, 195–202. DOI:Google ScholarCross Ref
[6] Amershi Saleema, Begel Andrew, Bird Christian, DeLine Robert, Gall Harald, Kamar Ece, Nagappan Nachiappan, Nushi Besmira, and Zimmermann Thomas. 2019. Software engineering for machine learning: A case study. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 291–300. DOI:Google ScholarDigital Library
[7] Amodei Dario, Olah Chris, Steinhardt Jacob, Christiano Paul, Schulman John, and Mané Dan. 2016. Concrete problems in AI safety. arXiv 277, 2003 (2016), 1–29. arxiv:1606.06565 http://arxiv.org/abs/1606.06565Google Scholar
[8] Ampatzoglou Apostolos, Bibi Stamatia, Avgeriou Paris, Verbeek Marijn, and Chatzigeorgiou Alexander. 2019. Identifying, categorizing and mitigating threats to validity in software engineering secondary studies. Information and Software Technology 106 (2019), 201–230.Google ScholarCross Ref
[9] Aniculaesei Adina, Grieser Jörg, Rausch Andreas, Rehfeldt Karina, and Warnecke Tim. 2018. Towards a holistic software systems engineering approach for dependable autonomous systems. In Proceedings of the 1st International Workshop on Software Engineering for AI in Autonomous Systems. ACM, 23–30. DOI:Google ScholarDigital Library
[10] Aniculaesei Adina, Grieser Jorg, Rausch Andreas, Rehfeldt Karina, and Warnecke Tim. 2019. Graceful degradation of decision and control responsibility for autonomous systems based on dependability cages. 5th International Symposium on Future Active Safety Technology toward Zero Accidents (FAST-zero’19)September (2019), 1–6.Google Scholar
[11] Anthes Gary. 2017. Artificial intelligence poised to ride a new wave. Commun. ACM 60, 7 (2017), 19–21.Google ScholarDigital Library
[12] Arnold M., Bellamy R. K. E., Hind M., Houde S., Mehta S., Mojsilović A., Nair R., Ramamurthy K. Natesan, Reimer D., Olteanu A., Piorkowski D., Tsay J., and Varshney K. R.. 2018. FactSheets: Increasing trust in AI services through supplier’s declarations of conformity. arXiv (2018). arxiv:1808.07261.Google Scholar
[13] Arpteg Anders, Brinne Bjorn, Crnkovic-Friis Luka, and Bosch Jan. 2018. Software engineering challenges of deep learning. In 2018 44th Euromicro Conference on Software Engineering and Advanced Applications (SEAA). IEEE, 50–59. DOI:arxiv:1810.12034.Google ScholarCross Ref
[14] Bailis Peter, Olukotun Kunle, Ré Christopher, and Zaharia Matei. 2017. Infrastructure for usable machine learning: The Stanford DAWN project. arXiv (2017). arxiv:1705.07538.Google Scholar
[15] Banks Alec and Ashmore Rob. 2019. Requirements assurance in machine learning. CEUR Workshop Proceedings 2301 (2019).Google Scholar
[16] Bansal Somil and Tomlin Claire J.. 2018. Control and safety of autonomous vehicles with learning-enabled components. In Safe, Autonomous and Intelligent Vehicles. Springer International Publishing, 57–75. DOI:Google ScholarCross Ref
[17] Basili V., Caldiera G., and Rombach H. D.. 1994. The goal question metric approach. In Encyclopedia of Software Engineering, Vol. 2. John Wiley & Sons, 528–532.Google Scholar
[18] Baylor Denis, Breck Eric, Cheng Heng-Tze, Fiedel Noah, Foo Chuan Yu, Haque Zakaria, Haykal Salem, Ispir Mustafa, Jain Vihan, Koc Levent, Koo Chiu Yuen, Lew Lukasz, Mewald Clemens, Modi Akshay Naresh, Polyzotis Neoklis, Ramesh Sukriti, Roy Sudip, Whang Steven Euijong, Wicke Martin, Wilkiewicz Jarek, Zhang Xin, and Zinkevich Martin. 2017. TFX: A TensorFlow-based production-scale machine learning platform. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1387–1395. DOI:Google ScholarDigital Library
[19] Behutiye Woubshet, Karhapää Pertti, López Lidia, Burgués Xavier, Martínez-Fernández Silverio, Vollmer Anna Maria, Rodríguez Pilar, Franch Xavier, and Oivo Markku. 2020. Management of quality requirements in agile and rapid software development: A systematic mapping study. Information and Software Technology 123 (2020), 106225.Google ScholarCross Ref
[20] Belani Hrvoje, Vukovic Marin, and Car Zeljka. 2019. Requirements engineering challenges in building AI-based complex systems. In 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW). IEEE, 252–255. DOI:Google ScholarCross Ref
[21] Bernardi Lucas, Mavridis Themistoklis, and Estevez Pablo. 2019. 150 successful machine learning models. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 1743–1751. DOI:Google ScholarDigital Library
[22] Bolte Jan Aike, Bär Andreas, Lipinski Daniel, and Fingscheidt Tim. 2019. Towards corner case detection for autonomous driving. arXivIv (2019).Google Scholar
[23] Borg Markus, Englund Cristofer, Wnuk Krzysztof, Duran Boris, Levandowski Christoffer, Gao Shenjian, Tan Yanwen, Kaijser Henrik, Lönn Henrik, and Törnqvist Jonas. 2018. Safely entering the deep: A review of verification and validation for machine learning and a challenge elicitation in the automotive industry. arXiv preprint arXiv:1812.05389 (2018).Google Scholar
[24] Bosch Jan, Crnkovic Ivica, and Olsson Helena Holmström. 2020. Engineering AI systems: A research agenda. arXiv (2020). arxiv:2001.07522.Google Scholar
[25] Bourque Pierre and Richard E.. 2014. SWEBOK Version 3.0. IEEE, ISBN-10: 0-7695-5166-1 (2014).Google Scholar
[26] Bozic Josip and Wotawa Franz. 2018. Security testing for chatbots. In Testing Software and Systems. Springer International Publishing, 33–38. DOI:Google ScholarCross Ref
[27] Braiek Houssem Ben and Khomh Foutse. 2020. On testing machine learning programs. Journal of Systems and Software 164 (2020), 110542.Google ScholarCross Ref
[28] Breck Eric, Cai Shanqing, Nielsen Eric, Salib Michael, and Sculley D.. 2017. The ML test score: A rubric for ML production readiness and technical debt reduction. In 2017 IEEE International Conference on Big Data (Big Data). IEEE, 1123–1132. DOI:Google ScholarCross Ref
[29] Breck Eric, Polyzotis Neoklis, Roy Sudip, Whang Steven Euijong, and Zinkevich Martin. 2019. Data validation for machine learning. SysML (2019), 1–14.Google Scholar
[30] Brereton Pearl, Kitchenham Barbara A., Budgen David, Turner Mark, and Khalil Mohamed. 2007. Lessons from applying the systematic literature review process within the software engineering domain. Journal of Systems and Software 80, 4 (Apr 2007), 571–583. DOI:Google ScholarDigital Library
[31] Bryson Joanna and Winfield Alan. 2017. Standardizing ethical design for artificial intelligence and autonomous systems. Computer 50, 5 (May 2017), 116–119. DOI:Google ScholarDigital Library
[32] Burton Simon, Gauerhof Lydia, and Heinzemann Christian. 2017. Making the case for safety of machine learning in highly automated driving. In Lecture Notes in Computer Science. Springer International Publishing, 5–16. DOI:Google ScholarCross Ref
[33] Byun Taejoon, Sharma Vaibhav, Vijayakumar Abhishek, Rayadurgam Sanjai, and Cofer Darren. 2019. Input prioritization for testing neural networks. In 2019 IEEE International Conference on Artificial Intelligence Testing (AITest). IEEE, 63–70. DOI:arxiv:1901.03768.Google ScholarCross Ref
[34] Cai Shanqing, Breck Eric, Nielsen Eric, Salib Michael, and Sculley D.. 2016. TensorFlow debugger: Debugging dataflow graphs for machine learning. In Proceedings of the Reliable Machine Learning in the Wild - NIPS 2016 Workshop (2016). https://storage.googleapis.com/pub-tools-public-publication-data/pdf/45789.pdf.Google Scholar
[35] Chakarov Aleksandar, Nori Aditya, Rajamani Sriram, Sen Shayak, and Vijaykeerthy Deepak. 2016. Debugging machine learning tasks. arXiv (2016), 1–29. arxiv:1603.07292 http://arxiv.org/abs/1603.07292.Google Scholar
[36] Chakravarty Anand. 2010. Stress testing an AI based web service: A case study. In 2010 Seventh International Conference on Information Technology: New Generations. IEEE, 1004–1008. DOI:Google ScholarDigital Library
[37] Chen Meng, Knapp Andreas, Pohl Martin, and Dietmayer Klaus. 2018. Taming functional deficiencies of automated driving systems: A methodology framework toward safety validation. In 2018 IEEE Intelligent Vehicles Symposium (IV). IEEE, 1918–1924. DOI:Google ScholarDigital Library
[38] Cheng Chih-Hong, Nührenberg Georg, Huang Chung-Hao, and Ruess Harald. 2018. Verification of binarized neural networks via inter-neuron factoring. In Lecture Notes in Computer Science. Springer International Publishing, 279–290. DOI:arxiv:arXiv:1710.03107v2.Google ScholarCross Ref
[39] Coates D. L. and Martin A.. 2019. An instrument to evaluate the maturity of bias governance capability in artificial intelligence projects. IBM Journal of Research and Development 63, 4/5 (Jul 2019), 7:1–7:15. DOI:Google ScholarCross Ref
[40] Colomo-Palacios Ricardo. 2019. Towards a Software Engineering Framework for the Design, Construction and Deployment of Machine Learning-Based Solutions in Digitalization Processes. 343–349.Google ScholarCross Ref
[41] Costal Dolors, Farré Carles, Franch Xavier, and Quer Carme. 2021. How tertiary studies perform quality assessment of secondary studies in software engineering. In 2021 Proceedings of 24th IberoAmerican Conference on Software Engineering (CIbSE 2021), ESELAW track.Google Scholar
[42] Crankshaw Daniel, Wang Xin, Zhou Giulio, Franklin Michael J., Gonzalez Joseph E., and Stoica Ion. 2017. Clipper: A low-latency online prediction serving system. Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2017 (2017), 613–627. arxiv:1612.03079.Google Scholar
[43] Cruzes Daniela S. and Dyba Tore. 2011. Recommended steps for thematic synthesis in software engineering. In 2011 International Symposium on Empirical Software Engineering and Measurement. IEEE, 275–284.Google ScholarDigital Library
[44] Nascimento Elizamary de Souza, Ahmed Iftekhar, Oliveira Edson, Palheta Marcio Piedade, Steinmacher Igor, and Conte Tayana. 2019. Understanding development process of machine learning systems: Challenges and solutions. In 2019 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). IEEE, 1–6. DOI:Google ScholarCross Ref
[45] Deak Ryan M. and Morra Jonathan H.. 2018. Aloha: A machine learning framework for engineers. Conference on Systems and Machine Learning (MLSys) (2018), 17–19. https://www.sysml.cc/doc/13.pdf.Google Scholar
[46] Deng Li. 2018. Artificial intelligence in the rising wave of deep learning: The historical path and future outlook [perspectives]. IEEE Signal Processing Magazine 35, 1 (2018), 180–177.Google ScholarCross Ref
[47] Desai Ankush, Ghosh Shromona, Seshia Sanjit A., Shankar Natarajan, and Tiwari Ashish. 2019. SOTER: A runtime assurance framework for programming safe robotics systems. In 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). IEEE, 138–150. DOI:arxiv:1808.07921.Google ScholarCross Ref
[48] Dreossi Tommaso, Fremont Daniel J., Ghosh Shromona, Kim Edward, Ravanbakhsh Hadi, Vazquez-Chanlatte Marcell, and Seshia Sanjit A.. 2019. VerifAI: A toolkit for the formal design and analysis of artificial intelligence-based systems. In Computer Aided Verification, Dillig Isil and Tasiran Serdar (Eds.). Springer International Publishing, Cham, 432–442.Google ScholarCross Ref
[49] Dreossi Tommaso, Jha Somesh, and Seshia Sanjit A.. 2018. Semantic adversarial deep learning. arXiv 2 (2018), 3–26.Google Scholar
[50] Du Xiaoning, Xie Xiaofei, Li Yi, Ma Lei, Liu Yang, and Zhao Jianjun. 2019. DeepStellar: Model-based quantitative analysis of stateful deep learning systems. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, 477–487. DOI:Google ScholarDigital Library
[51] Du Xiaoning, Xie Xiaofei, Li Yi, Ma Lei, Liu Yang, and Zhao Jianjun. 2019. A quantitative analysis framework for recurrent neural network. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 1062–1065. DOI:Google ScholarDigital Library
[52] Dwarakanath Anurag, Ahuja Manish, Sikand Samarth, Rao Raghotham M., Bose R. P. Jagadeesh Chandra, Dubash Neville, and Podder Sanjay. 2018. Identifying implementation bugs in machine learning based image classifiers using metamorphic testing. In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM, 118–128. DOI:arxiv:1808.05353.Google ScholarDigital Library
[53] Emam Khaled El. 1999. Benchmarking Kappa: Interrater agreement in software process assessments. Empir. Softw. Eng. 4, 2 (1999), 113–133.Google ScholarDigital Library
[54] Eniser Hasan Ferit, Gerasimou Simos, and Sen Alper. 2019. DeepFault: Fault localization for deep neural networks. In Fundamental Approaches to Software Engineering. Springer International Publishing, 171–191. DOI:arxiv:1902.05974.Google ScholarCross Ref
[55] Eykholt Kevin, Evtimov Ivan, Fernandes Earlence, Li Bo, Rahmati Amir, Xiao Chaowei, Prakash Atul, Kohno Tadayoshi, and Song Dawn. 2017. Robust physical-world attacks on deep learning models. arXiv (2017). arxiv:1707.08945 http://arxiv.org/abs/1707.08945.Google Scholar
[56] Feng Yang, Shi Qingkai, Gao Xinyu, Wan Jun, Fang Chunrong, and Chen Zhenyu. 2020. DeepGini: Prioritizing massive tests to enhance the robustness of deep neural networks. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM, 177–188. DOI:arxiv:1903.00661.Google ScholarDigital Library
[57] Feth Patrik, Schneider Daniel, and Adler Rasmus. 2017. A conceptual safety supervisor definition and evaluation framework for autonomous systems. In Lecture Notes in Computer Science. Springer International Publishing, 135–148. DOI:Google ScholarCross Ref
[58] Fiebrink Rebecca, Cook Perry R., and Trueman Dan. 2011. Human model evaluation in interactive supervised learning. In Proceedings of the 2011 Annual Conference on Human Factors in Computing Systems - CHI'11. ACM Press, 147–156. DOI:Google ScholarDigital Library
[59] Flaounas Ilias. 2017. Beyond the technical challenges for deploying machine learning solutions in a software company. arXiv (2017). arxiv:1708.02363.Google Scholar
[60] Foidl Harald, Felderer Michael, and Biffl Stefan. 2019. Technical debt in data-intensive software systems. In 2019 45th Euromicro Conference on Software Engineering and Advanced Applications (SEAA). IEEE, 338–341. DOI:arxiv:1905.13455.Google ScholarCross Ref
[61] Franco-Bedoya Oscar, Ameller David, Costal Dolors, and Franch Xavier. 2017. Open source software ecosystems: A systematic mapping. Information and Software Technology 91 (2017), 160–185.Google ScholarDigital Library
[62] Fremont Daniel J., Kim Edward, Pant Yash Vardhan, Seshia Sanjit A., Acharya Atul, Bruso Xantha, Wells Paul, Lemke Steve, Lu Qiang, and Mehta Shalin. 2020. Formal scenario-based testing of autonomous vehicles: From simulation to the real world. In 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC). IEEE. DOI:arxiv:2003.07739.Google ScholarDigital Library
[63] Gambi Alessio, Mueller Marc, and Fraser Gordon. 2019. Automatically testing self-driving cars with search-based procedural content generation. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM, 273–283. DOI:Google ScholarDigital Library
[64] Gao Jerry, Tao Chuanqi, Jie Dou, and Lu Shengqiang. 2019. Invited paper: What is AI software testing? and why. In 2019 IEEE International Conference on Service-Oriented System Engineering (SOSE). IEEE, 27–36. DOI:Google ScholarCross Ref
[65] Garcia Alvaro Lopez, Lucas Jesus Marco De, Antonacci Marica, Castell Wolfgang Zu, David Mario, Hardt Marcus, Iglesias Lara Lloret, Molto Germen, Plociennik Marcin, Tran Viet, Alic Andy S., Caballer Miguel, Plasencia Isabel Campos, Costantini Alessandro, Dlugolinsky Stefan, Duma Doina Cristina, Donvito Giacinto, Gomes Jorge, Cacha Ignacio Heredia, Ito Keiichi, Kozlov Valentin Y., Nguyen Giang, Fernandez Pablo Orviz, Sustr Zdenek, and Wolniewicz Pawel. 2020. A cloud-based framework for machine learning workloads and applications. IEEE Access 8 (2020), 18681–18692. DOI:Google ScholarCross Ref
[66] Gauerhof Lydia, Munk Peter, and Burton Simon. 2018. Structuring validation targets of a machine learning function applied to automated driving. In Developments in Language Theory. Springer International Publishing, 45–58. DOI:Google ScholarCross Ref
[67] Gerasimou Simos, Eniser Hasan Ferit, Sen Alper, and Cakan Alper. 2020. Importance-driven deep learning system testing. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Companion Proceedings. ACM, 322–323. DOI:arxiv:2002.03433.Google ScholarDigital Library
[68] Gharib Mohamad, Lollini Paolo, Botta Marco, Amparore Elvio, Donatelli Susanna, and Bondavalli Andrea. 2018. On the safety of automotive systems incorporating machine learning based components: A position paper. In 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W). IEEE, 271–274. DOI:Google ScholarCross Ref
[69] Ghofrani Javad, Kozegar Ehsan, Bozorgmehr Arezoo, and Soorati Mohammad Divband. 2019. Reusability in artificial neural networks. In Proceedings of the 23rd International Systems and Software Product Line Conference Volume B - SPLC'19. ACM Press. DOI:Google ScholarDigital Library
[70] Ghosh Shromona, Ravanbakhsh Hadi, and Seshia Sanjit A.. 2019. Counterexample-guided synthesis of perception models and control. arXiv (2019). arxiv:1911.01523.Google Scholar
[71] Giray Görkem. 2021. A software engineering perspective on engineering machine learning systems: State of the art and challenges. Journal of Systems and Software 180 (2021), 111031. DOI:Google ScholarDigital Library
[72] Gopinath Divya, Katz Guy, Pasareanu Corina S., and Barrett Clark. 2017. DeepSafe: A data-driven approach for checking adversarial robustness in neural networks. arXiv (2017). arxiv:1710.00486.Google Scholar
[73] Guo Qianyu, Chen Sen, Xie Xiaofei, Ma Lei, Hu Qiang, Liu Hongtao, Liu Yang, Zhao Jianjun, and Li Xiaohong. 2019. An empirical study towards characterizing deep learning development and deployment across different frameworks and platforms. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 810–822. DOI:arxiv:1909.06727.Google ScholarDigital Library
[74] Hains Gaetan, Jakobsson Arvid, and Khmelevsky Youry. 2018. Towards formal methods and software engineering for deep learning: Security, safety and productivity for DL systems development. In 2018 Annual IEEE International Systems Conference (SysCon). IEEE, 1–5. DOI:Google ScholarCross Ref
[75] Haldar Malay, Abdool Mustafa, Ramanathan Prashant, Xu Tao, Yang Shulin, Duan Huizhong, Zhang Qing, Barrow-Williams Nick, Turnbull Bradley C., Collins Brendan M., and Legrand Thomas. 2018. Applying deep learning to Airbnb search. arXiv (2018), 1927–1935.Google Scholar
[76] Hartsell Charles, Mahadevan Nagabhushan, Ramakrishna Shreyas, Dubey Abhishek, Bapty Theodore, Johnson Taylor, Koutsoukos Xenofon, Sztipanovits Janos, and Karsai Gabor. 2019. Model-based design for CPS with learning-enabled components. In Proceedings of the Workshop on Design Automation for CPS and IoT - DESTION'19. ACM Press, 1–9. DOI:Google ScholarDigital Library
[77] Hauer Florian, Schmidt Tabea, Holzmuller Bernd, and Pretschner Alexander. 2019. Did we test all scenarios for automated and autonomous driving systems? In 2019 IEEE Intelligent Transportation Systems Conference (ITSC). IEEE, 2950–2955. DOI:Google ScholarDigital Library
[78] Henderson Peter, Sinha Koustuv, Angelard-Gontier Nicolas, Ke Nan Rosemary, Fried Genevieve, Lowe Ryan, and Pineau Joelle. 2017. Ethical challenges in data-driven dialogue systems. arXiv (2017), 123–129.Google Scholar
[79] Henriksson Jens, Borg Markus, and Englund Cristofer. 2018. Automotive safety and machine learning. In Proceedings of the 1st International Workshop on Software Engineering for AI in Autonomous Systems. ACM, 47–49. DOI:Google ScholarDigital Library
[80] Hill Charles, Bellamy Rachel, Erickson Thomas, and Burnett Margaret. 2016. Trials and tribulations of developers of intelligent systems: A field study. In 2016 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC). IEEE, 162–170. DOI:Google ScholarCross Ref
[81] Holstein Kenneth, Vaughan Jennifer Wortman, Daumé Hal, Dudík Miroslav, and Wallach Hanna. 2018. Improving fairness in machine learning systems: What do industry practitioners need? arXiv (2018), 1–16.Google Scholar
[82] Horkoff Jennifer. 2019. Non-functional requirements for machine learning: Challenges and new directions. In 2019 IEEE 27th International Requirements Engineering Conference (RE). IEEE, 386–391. DOI:Google ScholarCross Ref
[83] Huang Song. 2018. Challenges of testing machine learning applications. International Journal of Performability Engineering (2018), 1275–1282. DOI:Google ScholarCross Ref
[84] Huang Xiaowei, Kwiatkowska Marta, Wang Sen, and Wu Min. 2017. Safety verification of deep neural networks. In Computer Aided Verification. Springer International Publishing, 3–29. DOI:arxiv:1610.06940.Google ScholarCross Ref
[85] Hummer Waldemar, Muthusamy Vinod, Rausch Thomas, Dube Parijat, Maghraoui Kaoutar El, Murthi Anupama, and Oum Punleuk. 2019. ModelOps: Cloud-based lifecycle management for reliable and trusted AI. In 2019 IEEE International Conference on Cloud Engineering (IC2E). IEEE, 113–120. DOI:Google ScholarCross Ref
[86] Ingrand Felix. 2019. Recent trends in formal validation and verification of autonomous robots software. In 2019 Third IEEE International Conference on Robotic Computing (IRC). IEEE, 321–328. DOI:Google ScholarCross Ref
[87] Standardization International Organization For. 2011. ISO/IEC 25010 - Systems and Software Engineering - Systems and Software Quality Requirements and Evaluation (SQuaRE) - System and Software Quality Models. 25 pages. http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=35733.Google Scholar
[88] Ishikawa Fuyuki. 2018. Concepts in quality assessment for machine learning - from test data to arguments. In Conceptual Modeling. Springer International Publishing, 536–544. DOI:Google ScholarCross Ref
[89] Ishikawa Fuyuki and Matsuno Yutaka. 2018. Continuous argument engineering: Tackling uncertainty in machine learning based systems. In Developments in Language Theory. Springer International Publishing, 14–21. DOI:Google ScholarCross Ref
[90] Ishikawa Fuyuki and Yoshioka Nobukazu. 2019. How do engineers perceive difficulties in engineering of machine-learning systems? - Questionnaire survey. In 2019 IEEE/ACM Joint 7th International Workshop on Conducting Empirical Studies in Industry (CESI) and 6th International Workshop on Software Engineering Research and Industrial Practice (SER&IP). IEEE, 2–9. DOI:Google ScholarDigital Library
[91] Islam Md Johirul, Nguyen Giang, Pan Rangeet, and Rajan Hridesh. 2019. A comprehensive study on deep learning bug characteristics. arXiv (2019), 510–520.Google Scholar
[92] Islam Md Johirul, Nguyen Hoan Anh, Pan Rangeet, and Rajan Hridesh. 2019. What do developers ask about ML libraries? A large-scale study using stack overflow. arXivMl (2019). arxiv:1906.11940.Google Scholar
[93] Ivarsson Martin and Gorschek Tony. 2010. A method for evaluating rigor and industrial relevance of technology evaluations. Empirical Software Engineering 16, 3 (Oct 2010), 365–395. DOI:Google ScholarDigital Library
[94] Jenn Eric, Albore Alexandre, Mamalet Franck, Flandin Grégory, Gabreau Christophe, Delseny Hervé, Gauffriau Adrien, Bonnin Hugues, Alecu Lucian, Pirard Jérémy, Lefevre Baptiste, Gabriel Jean-Marc, Cappi Cyril, Gardès Laurent, Picard Sylvaine, Dulon Gilles, Beltran Brice, Bianic Jean-Christophe, Damour Mathieu, Delmas Kevin, and Pagetti Claire. 2020. Identifying challenges to the certification of machine learning for safety critical systems. In Proceedings of the 10th European Congress on Embedded Real Time Systems (ERTS). 10.Google Scholar
[95] Jentzsch Sophie F. and Hochgeschwender Nico. 2019. Don't forget your roots! Using provenance data for transparent and explainable development of machine learning models. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering Workshop (ASEW). IEEE, 37–40. DOI:Google ScholarCross Ref
[96] Ji Yujie, Zhang Xinyang, Ji Shouling, Luo Xiapu, and Wang Ting. 2018. Model-reuse attacks on deep learning systems. arXiv (2018), 349–363.Google Scholar
[97] Jia Minghua, Wang Xiaodong, Xu Yue, Cui Zhanqi, and Xie Ruilin. 2020. Testing machine learning classifiers based on compositional metamorphic relations. International Journal of Performability Engineering 16, 1 (2020), 67. DOI:Google ScholarCross Ref
[98] Juez Garazi, Amparan Estibaliz, Lattarulo Ray, Rastelli Joshue Perez, Ruiz Alejandra, and Espinoza Huascar. 2017. Safety assessment of automated vehicle functions by simulation-based fault injection. In 2017 IEEE International Conference on Vehicular Electronics and Safety (ICVES). IEEE, 214–219. DOI:Google ScholarDigital Library
[99] Kery Mary Beth, Radensky Marissa, Arya Mahima, John Bonnie E., and Myers Brad A. 2018. The story in the notebook. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, 1–11.
[100] Khalajzadeh Hourieh, Abdelrazek Mohamed, Grundy John, Hosking John, and He Qiang. 2018. A survey of current end-user data analytics tool support. In 2018 IEEE International Congress on Big Data (BigData Congress). IEEE, 41–48.
[101] Khomh Foutse, Adams Bram, Cheng Jinghui, Fokaefs Marios, and Antoniol Giuliano. 2018. Software engineering for machine-learning applications: The road ahead. IEEE Software 35, 5 (2018), 81–84.
[102] Kim Miryung, Zimmermann Thomas, DeLine Robert, and Begel Andrew. 2018. Data scientists in software teams: State of the art and challenges. IEEE Transactions on Software Engineering 44, 11 (Nov 2018), 1024–1038.
[103] Kitchenham Barbara. 2004. Procedures for performing systematic reviews. Keele, UK, Keele University 33, 2004 (2004), 1–26.
[104] Kitchenham Barbara and Charters Stuart. 2007. Guidelines for performing systematic literature reviews in software engineering. Keele University and University of Durham.
[105] Klueck Florian, Li Yihao, Nica Mihai, Tao Jianbo, and Wotawa Franz. 2018. Using ontologies for test suites generation for automated and autonomous driving functions. In 2018 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW). IEEE, 118–123.
[106] Koopman Philip and Wagner Michael. 2016. Challenges in autonomous vehicle testing and validation. SAE International Journal of Transportation Safety 4, 1 (Apr 2016), 15–24.
[107] Koopman Philip and Wagner Michael. 2018. Toward a framework for highly automated vehicle safety validation. In SAE Technical Paper Series. SAE International, 1–13.
[108] Koren Mark and Kochenderfer Mykel J. 2019. Efficient autonomy validation in simulation with adaptive stress testing. arXiv (2019), 4178–4183.
[109] Koseler Kaan, McGraw Kelsea, and Stephan Matthew. 2019. Realization of a machine learning domain specific modeling language: A baseball analytics case study. In Proceedings of the 7th International Conference on Model-Driven Engineering and Software Development. SciTePress - Science and Technology Publications, 13–24.
[110] Kostova Blagovesta, Gürses Seda, and Wegmann Alain. 2020. On the interplay between requirements, engineering, and artificial intelligence. CEUR Workshop Proceedings 2584 (2020).
[111] Kühl Niklas, Goutier Marc, Hirt Robin, and Satzger Gerhard. 2019. Machine learning in artificial intelligence: Towards a common understanding. In 52nd Hawaii International Conference on System Sciences, HICSS 2019, Grand Wailea, Maui, Hawaii, USA, January 8-11, 2019, Bui Tung (Ed.). ScholarSpace, 1–10. http://hdl.handle.net/10125/59960.
[112] Kuhrmann Marco, Fernández Daniel Méndez, and Daneva Maya. 2017. On the pragmatic design of literature studies in software engineering: An experience-based guideline. Empirical Software Engineering 22, 6 (Jan 2017), 2852–2891.
[113] Kulesza Todd, Burnett Margaret, Wong Weng-Keen, and Stumpf Simone. 2015. Principles of explanatory debugging to personalize interactive machine learning. In Proceedings of the 20th International Conference on Intelligent User Interfaces. ACM, 126–137.
[114] Kumar Abhishek, Braud Tristan, Tarkoma Sasu, and Hui Pan. 2020. Trustworthy AI in the age of pervasive computing and big data. arXiv (2020). arXiv:2002.05657.
[115] Kumeno Fumihiro. 2019. Software engineering challenges for machine learning applications: A literature review. Intelligent Decision Technologies 13, 4 (2019), 463–476.
[116] Kuwajima Hiroshi and Ishikawa Fuyuki. 2019. Adapting SQuaRE for quality assessment of artificial intelligence systems. In 2019 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW). IEEE, 13–18. arXiv:1908.02134.
[117] Kuwajima Hiroshi, Yasuoka Hirotoshi, and Nakae Toshihiro. 2019. Open problems in engineering machine learning systems and the quality model. arXiv (2019). arXiv:1904.00001v1.
[118] Kuwajima Hiroshi, Yasuoka Hirotoshi, and Nakae Toshihiro. 2020. Engineering problems in machine learning systems. Machine Learning 109, 5 (Apr 2020), 1103–1126. arXiv:1904.00001.
[119] Kästner Christian and Kang Eunsuk. 2020. Teaching software engineering for AI-enabled systems. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Software Engineering Education and Training. ACM, 45–48. arXiv:2001.06691.
[120] Lan Shuyue, Huang Chao, Wang Zhilu, Liang Hengyi, Su Wenhao, and Zhu Qi. 2018. Design automation for intelligent automotive systems. In 2018 IEEE International Test Conference (ITC). IEEE, 1–10.
[121] Leofante Francesco, Pulina Luca, and Tacchella Armando. 2016. Learning with safety requirements: State of the art and open questions. CEUR Workshop Proceedings 1745 (2016), 11–25.
[122] Leotta Maurizio, Olianas Dario, Ricca Filippo, and Noceti Nicoletta. 2019. How do implementation bugs affect the results of machine learning algorithms? In Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing. ACM, 1304–1313.
[123] Liu David C., Rogers Stephanie, Shiau Raymond, Kislyuk Dmitry, Ma Kevin C., Zhong Zhigang, Liu Jenny, and Jing Yushi. 2017. Related pins at Pinterest. In Proceedings of the 26th International Conference on World Wide Web Companion - WWW'17 Companion. ACM Press, 583–592.
[124] Lorenzoni Giuliano, Alencar Paulo, Nascimento Nathalia, and Cowan Donald. 2021. Machine learning model development from a software engineering perspective: A systematic literature review. arXiv preprint arXiv:2102.07574 (2021).
[125] Lwakatare Lucy Ellen, Raj Aiswarya, Bosch Jan, Olsson Helena Holmström, and Crnkovic Ivica. 2019. A taxonomy of software engineering challenges for machine learning systems: An empirical investigation. In Lecture Notes in Business Information Processing. Springer International Publishing, 227–243.
[126] Lwakatare Lucy Ellen, Raj Aiswarya, Crnkovic Ivica, Bosch Jan, and Olsson Helena Holmström. 2020. Large-scale machine learning systems in real-world industrial settings: A review of challenges and solutions. Information and Software Technology 127 (2020), 106368.
[127] Ma Lei, Zhang Fuyuan, Xue Minhui, Li Bo, Liu Yang, Zhao Jianjun, and Wang Yadong. 2018. Combinatorial testing for deep learning systems. arXiv (2018), 614–618. arXiv:1806.07723.
[128] Ma Shiqing, Liu Yingqi, Lee Wen-Chuan, Zhang Xiangyu, and Grama Ananth. 2018. MODE: Automated neural network model debugging via state differential analysis and input selection. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, 175–186.
[129] Machida Fumio. 2019. N-version machine learning models for safety critical systems. In 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W). IEEE, 48–51.
[130] Machida Fumio. 2019. On the diversity of machine learning models for system reliability. In 2019 IEEE 24th Pacific Rim International Symposium on Dependable Computing (PRDC). IEEE, 276–285.
[131] Majumdar Rupak, Mathur Aman, Pirron Marcus, Stegner Laura, and Zufferey Damien. 2019. Paracosm: A language and tool for testing autonomous driving systems. arXiv (2019). arXiv:1902.01084.
[132] Mallozzi Piergiuseppe, Pelliccione Patrizio, and Menghi Claudio. 2018. Keeping intelligence under control. In Proceedings of the 1st International Workshop on Software Engineering for Cognitive Services. ACM, 37–40.
[133] Martínez-Fernández Silverio, Franch Xavier, Jedlitschka Andreas, Oriol Marc, and Trendowicz Adam. 2020. Research directions for developing and operating artificial intelligence models in trustworthy autonomous systems. arXiv (2020). arXiv:2003.05434.
[134] Masuda Satoshi, Ono Kohichi, Yasue Toshiaki, and Hosokawa Nobuhiro. 2018. A survey of software quality for machine learning applications. In 2018 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW). IEEE, 279–284.
[135] Mattos David Issa, Bosch Jan, and Olsson Helena Holmström. 2019. Leveraging business transformation with machine learning experiments. In Lecture Notes in Business Information Processing. Springer International Publishing, 183–191.
[136] McDermid John, Jia Yan, and Habli Ibrahim. 2019. Towards a framework for safety assurance of autonomous systems. CEUR Workshop Proceedings 2419 (2019).
[137] John Meenu Mary, Olsson Helena Holmström, and Bosch Jan. [n.d.]. Architecting AI deployment: A systematic review of state-of-the-art and state-of-practice literature.
[138] Menzies Tim. 2020. The five laws of SE for AI. IEEE Software 37, 1 (Jan 2020), 81–85.
[139] Molina Caroline Bianca Santos Tancredi, Almeida Jorge Rady de, Vismari Lucio F., Gonzalez Rodrigo Ignacio R., Naufal Jamil K., and Camargo Joao Batista. 2017. Assuring fully autonomous vehicles safety by design: The autonomous vehicle control (AVC) module strategy. In 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W). IEEE, 16–21.
[140] Moreb Mohammed, Mohammed Tareq Abed, Bayat Oguz, and Ata Oguz. 2020. Corrections to “A novel software engineering approach toward using machine learning for improving the efficiency of health systems”. IEEE Access 8 (2020), 136459–136459.
[141] Mourão Erica, Pimentel João Felipe, Murta Leonardo, Kalinowski Marcos, Mendes Emilia, and Wohlin Claes. 2020. On the performance of hybrid search strategies for systematic literature reviews in software engineering. Information and Software Technology 123 (Jul 2020), 106294.
[142] Munappy Aiswarya, Bosch Jan, Olsson Helena Holmström, Arpteg Anders, and Brinne Bjorn. 2019. Data management challenges for deep learning. In 2019 45th Euromicro Conference on Software Engineering and Advanced Applications (SEAA). IEEE, 140–147.
[143] Nakajima Shin. 2018. [Invited] Quality assurance of machine learning software. In 2018 IEEE 7th Global Conference on Consumer Electronics (GCCE). IEEE, 143–144.
[144] Nakajima Shin. 2019. Dataset diversity for metamorphic testing of machine learning software. In Structured Object-Oriented Formal Language and Method, Duan Zhenhua, Liu Shaoying, Tian Cong, and Nagoya Fumiko (Eds.). Springer International Publishing, Cham, 21–38.
[145] Nakajima Shin. 2019. Quality evaluation assurance levels for deep neural networks software. In 2019 International Conference on Technologies and Applications of Artificial Intelligence (TAAI). IEEE.
[146] Nalchigar Soroosh, Yu Eric, Obeidi Yazan, Carbajales Sebastian, Green John, and Chan Allen. 2019. Solution patterns for machine learning. In Advanced Information Systems Engineering. Springer International Publishing, 627–642.
[147] Nascimento Elizamary, Nguyen-Duc Anh, Sundbø Ingrid, and Conte Tayana. 2020. Software engineering for artificial intelligence and machine learning software: A systematic literature review. arXiv preprint arXiv:2011.03751 (2020).
[148] Naur Peter, Randell Brian, Bauer Friedrich Ludwig, and NATO Science Committee (Eds.). 1969. Software Engineering: Report on a Conference Sponsored by the NATO Science Committee, Garmisch, Germany, 7th to 11th October 1968. Scientific Affairs Division, NATO.
[149] Nishi Yasuharu, Masuda Satoshi, Ogawa Hideto, and Uetsuki Keiji. 2018. A test architecture for machine learning product. In 2018 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW). IEEE, 273–278.
[150] Nushi Besmira, Kamar Ece, and Horvitz Eric. 2018. Towards accountable AI: Hybrid human-machine analyses for characterizing system failure. HCOMP (2018), 126–135. arXiv:1809.07424.
[151] Odena Augustus and Goodfellow Ian. 2018. TensorFuzz: Debugging neural networks with coverage-guided fuzzing. arXiv (2018).
[152] Otero Carlos E. and Peter Adrian. 2015. Research directions for engineering big data analytics software. IEEE Intelligent Systems 30, 1 (Jan 2015), 13–19.
[153] Ozkaya Ipek. 2020. What is really different in engineering AI-enabled systems? IEEE Software 37, 4 (Jul 2020), 3–6.
[154] Partridge D. and Wilks Y. 1987. Does AI have a methodology which is different from software engineering? Artificial Intelligence Review 1, 2 (1987), 111–120.
[155] Patel Kayur. 2010. Lowering the barrier to applying machine learning. In Adjunct Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology - UIST'10. ACM Press, 355–358.
[156] Pedroza Gabriel and Adedjouma Morayo. 2019. Safe-by-design development method for artificial intelligent based systems. In Proceedings of the 31st International Conference on Software Engineering and Knowledge Engineering. KSI Research Inc. and Knowledge Systems Institute Graduate School, 391–397.
[157] Perkusich Mirko, Silva Lenardo Chaves e, Costa Alexandre, Ramos Felipe, Saraiva Renata, Freire Arthur, Dilorenzo Ednaldo, Dantas Emanuel, Santos Danilo, Gorgônio Kyller, Almeida Hyggo, and Perkusich Angelo. 2020. Intelligent software engineering in the context of agile software development: A systematic literature review. Information and Software Technology 119 (2020), 106241.
[158] Petersen Kai, Feldt Robert, Mujtaba Shahid, and Mattsson Michael. 2008. Systematic mapping studies in software engineering. In Proceedings of the 12th International Conference on Evaluation and Assessment in Software Engineering. BCS Learning & Development, 68–77.
[159] Petersen Kai, Vakkalanka Sairam, and Kuzniarz Ludwik. 2015. Guidelines for conducting systematic mapping studies in software engineering: An update. Information and Software Technology 64 (Aug 2015), 1–18.
[160] Pulina Luca and Tacchella Armando. 2010. An abstraction-refinement approach to verification of artificial neural networks. CEUR Workshop Proceedings 616 (2010), 243–257.
[161] Rahimi Mona, Guo Jin L. C., Kokaly Sahar, and Chechik Marsha. 2019. Toward requirements specification for machine-learned components. In 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW). IEEE, 241–244.
[162] Rahman Saidur, Rivera Emilio, Khomh Foutse, Guéhéneuc Yann-Gaël, and Lehnert Bernd. 2019. Machine learning software engineering in practice: An industrial case study. arXiv (2019), 1–21. arXiv:1906.07154.
[163] Raji Inioluwa Deborah, Smart Andrew, White Rebecca N., Mitchell Margaret, Gebru Timnit, Hutchinson Ben, Smith-Loud Jamila, Theron Daniel, and Barnes Parker. 2020. Closing the AI accountability gap. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. ACM, 33–44. arXiv:2001.00973.
[164] Ralph Paul, Ali Nauman bin, Baltes Sebastian, Bianculli Domenico, Diaz Jessica, Dittrich Yvonne, Ernst Neil, Felderer Michael, Feldt Robert, Filieri Antonio, França Breno Bernard Nicolau de, Furia Carlo Alberto, Gay Greg, Gold Nicolas, Graziotin Daniel, He Pinjia, Hoda Rashina, Juristo Natalia, Kitchenham Barbara, Lenarduzzi Valentina, Martínez Jorge, Melegati Jorge, Mendez Daniel, Menzies Tim, Molleri Jefferson, Pfahl Dietmar, Robbes Romain, Russo Daniel, Saarimäki Nyyti, Sarro Federica, Siegmund Janet, Spinellis Diomidis, Staron Miroslaw, Stol Klaas, Storey Margaret-Anne, Taibi Davide, Tamburri Damian, Torchiano Marco, Treude Christoph, Turhan Burak, Wang Xiaofeng, and Vegas Sira. 2021. Empirical standards for software engineering research. arXiv:2010.03525 [cs.SE].
[165] Ribeiro Marco Tulio, Singh Sameer, and Guestrin Carlos. 2016. “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1135–1144. arXiv:1602.04938.
[166] Riccio Vincenzo, Jahangirova Gunel, Stocco Andrea, Humbatova Nargiz, Weiss Michael, and Tonella Paolo. 2020. Testing machine learning based systems: A systematic mapping. Empirical Software Engineering 25, 6 (2020), 5193–5254.
[167] Rill R. A. and Lőrincz A. 2019. Cognitive modeling approach for dealing with challenges in cyber-physical systems. Studia Universitatis Babeș-Bolyai Informatica 64, 1 (Jun 2019), 51–66.
[168] Rubaiyat Abu Hasnat Mohammad, Qin Yongming, and Alemzadeh Homa. 2018. Experimental resilience assessment of an open-source driving agent. In 2018 IEEE 23rd Pacific Rim International Symposium on Dependable Computing (PRDC). IEEE, 54–63. arXiv:1807.06172.
[169] Russell Stuart J. and Norvig Peter. 2021. Artificial Intelligence: A Modern Approach (Fourth edition). Pearson, Hoboken.
[170] Salay Rick and Czarnecki Krzysztof. 2018. Using machine learning safely in automotive software: An assessment and adaption of software process requirements in ISO 26262. arXiv (2018). arXiv:1808.01614.
[171] Salay Rick and Czarnecki Krzysztof. 2019. Improving ML safety with partial specifications. In Lecture Notes in Computer Science. Springer International Publishing, 288–300.
[172] Salay Rick, Queiroz Rodrigo, and Czarnecki Krzysztof. 2017. An analysis of ISO 26262: Using machine learning safely in automotive software. arXiv (2017). arXiv:1709.02435.
[173] Santhanam P., Farchi Eitan, and Pankratius Victor. 2019. Engineering reliable deep learning systems. arXiv (2019), 1–8. arXiv:1910.12582.
[174] Sarathy Prakash, Baruah Sanjoy, Cook Stephen, and Wolf Marilyn. 2019. Realizing the promise of artificial intelligence for unmanned aircraft systems through behavior bounded assurance. In 2019 IEEE/AIAA 38th Digital Avionics Systems Conference (DASC). IEEE.
[175] Sato Naoto, Kuruma Hironobu, Kaneko Masanori, Nakagawa Yuichiroh, Ogawa Hideto, Hoang Thai Son, and Butler Michael. 2018. DeepSaucer: Unified environment for verifying deep neural networks. arXiv (2018). arXiv:1811.03752.
[176] Saunders William, Stuhlmüller Andreas, Sastry Girish, and Evans Owain. 2018. Trial without error: Towards safe reinforcement learning via human intervention. Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS 3 (2018), 2067–2069. arXiv:1707.05173.
[177] Schelter Sebastian, Biessmann Felix, Januschowski Tim, Salinas David, Seufert Stephan, and Szarvas Gyuri. 2018. On challenges in machine learning model management. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering (2018), 5–13. http://sites.computer.org/debull/A18dec/p5.pdf.
[178] Schleier-Smith Johann. 2015. An architecture for agile machine learning in real-time applications. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2059–2068.
[179] Sculley D., Holt Gary, Golovin Daniel, Davydov Eugene, Phillips Todd, Ebner Dietmar, Chaudhary Vinay, Young Michael, Crespo Jean François, and Dennison Dan. 2015. Hidden technical debt in machine learning systems. Advances in Neural Information Processing Systems 2015-Jan (2015), 2503–2511.
[180] Serban Alex, Blom Koen van der, Hoos Holger, and Visser Joost. 2020. Adoption and effects of software engineering best practices in machine learning. In Proceedings of the 14th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). 1–12.
[181] Serban Alex and Visser Joost. 2021. An empirical study of software architecture for machine learning. arXiv preprint arXiv:2105.12422 (2021).
[182] Serban Alexandru Constantin. 2019. Designing safety critical software systems to manage inherent uncertainty. In 2019 IEEE International Conference on Software Architecture Companion (ICSA-C). IEEE, 246–249.
[183] Seshia Sanjit A., Sadigh Dorsa, and Sastry S. Shankar. 2016. Towards verified artificial intelligence. arXiv (2016), 1–18. arXiv:1606.08514. http://arxiv.org/abs/1606.08514.
[184] Shafaei Sina, Kugele Stefan, Osman Mohd Hafeez, and Knoll Alois. 2018. Uncertainty in machine learning: A safety perspective on autonomous driving. In Developments in Language Theory. Springer International Publishing, 458–464.
[185] Shalev-Shwartz Shai, Shammah Shaked, and Shashua Amnon. 2017. On a formal model of safe and scalable self-driving cars. arXiv (2017), 1–37. arXiv:1708.06374.
[186] Sheh Raymond and Monteath Isaac. 2018. Defining explainable AI for requirements analysis. KI - Künstliche Intelligenz 32, 4 (Oct 2018), 261–266.
[187] Simard Patrice Y., Amershi Saleema, Chickering David M., Pelton Alicia Edelman, Ghorashi Soroush, Meek Christopher, Ramos Gonzalo, Suh Jina, Verwey Johan, Wang Mo, and Wernsing John. 2017. Machine teaching: A new paradigm for building machine learning systems. arXiv (2017). arXiv:1707.06742.
[188] Spieker Helge and Gotlieb Arnaud. 2019. Towards testing of deep learning systems with training set reduction. arXiv (2019). arXiv:1901.04169.
[189] Srisakaokul Siwakorn, Zhang Yuhao, Zhong Zexuan, Yang Wei, Xie Tao, and Li Bo. 2018. MulDef: Multi-model-based defense against adversarial examples for neural networks. arXiv (2018). arXiv:1809.00065.
[190] Sun Xiaobing, Zhou Tianchi, Li Gengjie, Hu Jiajun, Yang Hui, and Li Bin. 2017. An empirical study on real bugs for machine learning programs. In 2017 24th Asia-Pacific Software Engineering Conference (APSEC). IEEE, 348–357.
[191] Sun Youcheng, Huang Xiaowei, Kroening Daniel, Sharp James, Hill Matthew, and Ashmore Rob. 2019. DeepConcolic: Testing and debugging deep neural networks. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). IEEE, 111–114.
[192] Sun Youcheng, Huang Xiaowei, Kroening Daniel, Sharp James, Hill Matthew, and Ashmore Rob. 2019. Structural test coverage criteria for deep neural networks. ACM Transactions on Embedded Computing Systems 18, 5s (Oct 2019), 1–23.
[193] Sun Youcheng, Wu Min, Ruan Wenjie, Huang Xiaowei, Kwiatkowska Marta, and Kroening Daniel. 2018. Concolic testing for deep neural networks. arXiv (2018), 109–119.
[194] Sun Youcheng, Zhou Yifan, Maskell Simon, Sharp James, and Huang Xiaowei. 2020. Reliability validation of learning enabled vehicle tracking. In 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 9390–9396. arXiv:2002.02424.
[195] Thung Ferdian, Wang Shaowei, Lo David, and Jiang Lingxiao. 2012. An empirical study of bugs in machine learning systems. In 2012 IEEE 23rd International Symposium on Software Reliability Engineering. IEEE, 271–280.
[196] Tian Yuchi, Pei Kexin, Jana Suman, and Ray Baishakhi. 2018. DeepTest: Automated testing of deep-neural-network-driven autonomous cars. In Proceedings of the 40th International Conference on Software Engineering. ACM, 303–314.
[197] Tramèr Florian, Zhang Fan, Juels Ari, Reiter Michael K., and Ristenpart Thomas. 2016. Stealing machine learning models via prediction APIs. In 25th USENIX Security Symposium (USENIX Security 16). https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/tramer.
[198] Tuncali Cumhur Erkan, Fainekos Georgios, Ito Hisahiro, and Kapinski James. 2018. Sim-ATAV. In Proceedings of the 21st International Conference on Hybrid Systems: Computation and Control (Part of CPS Week). ACM, 283–284.
[199] Tuncali Cumhur Erkan, Fainekos Georgios, Prokhorov Danil, Ito Hisahiro, and Kapinski James. 2020. Requirements-driven test generation for autonomous vehicles with machine learning components. IEEE Transactions on Intelligent Vehicles 5, 2 (Jun 2020), 265–280.
[200] Udeshi Sakshi, Arora Pryanshu, and Chattopadhyay Sudipta. 2018. Automated directed fairness testing. arXiv (2018), 98–108.
[201] Weide Tom van der, Papadopoulos Dimitris, Smirnov Oleg, Zielinski Michal, and Kasteren Tim van. 2017. Versioning for end-to-end machine learning pipelines. In Proceedings of the 1st Workshop on Data Management for End-to-End Machine Learning. ACM.
[202] Varshney Kush R. 2016. Engineering safety in machine learning. In 2016 Information Theory and Applications Workshop (ITA). IEEE. arXiv:1601.04126.
[203] Varshney Kush R. and Alemzadeh Homa. 2017. On the safety of machine learning: Cyber-physical systems, decision sciences, and data products. Big Data 5, 3 (Sep 2017), 246–255. arXiv:1610.01256.
[204] Vasconcelos Marisa, Candello Heloisa, Pinhanez Claudio, and Santos Thiago dos. 2017. Bottester. In Proceedings of the XVI Brazilian Symposium on Human Factors in Computing Systems. ACM, 1–4.
[205] Vogelsang Andreas and Borg Markus. 2019. Requirements engineering for machine learning: Perspectives from data scientists. In 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW). IEEE, 245–251. arXiv:1908.04674.
[206] Wan Zhiyuan, Xia Xin, Lo David, and Murphy Gail C. 2020. How does machine learning change software development practices? IEEE Transactions on Software Engineering (2020), 1–15.
[207] Wang Jingyi, Sun Jun, Zhang Peixin, and Wang Xinyu. 2018. Detecting adversarial samples for deep neural networks through mutation testing. arXiv (2018), 1–10. arXiv:1805.05010.
[208] Wang Simin, Huang Liguo, Ge Jidong, Zhang Tengfei, Feng Haitao, Li Ming, Zhang He, and Ng Vincent. 2020. Synergy between machine/deep learning and software engineering: How far are we? arXiv preprint arXiv:2008.05515 (2020).
[209] Wang Shiqi, Pei Kexin, Whitehouse Justin, Yang Junfeng, and Jana Suman. 2018. Efficient formal safety analysis of neural networks. NeurIPS (2018).
[210] Wang Shiqi, Pei Kexin, Whitehouse Justin, Yang Junfeng, and Jana Suman. 2018. Formal security analysis of neural networks using symbolic intervals. arXiv (2018).
[211] Washizaki Hironori, Uchida Hiromu, Khomh Foutse, and Guéhéneuc Yann-Gaël. 2019. Studying software engineering patterns for designing machine learning systems. In 2019 10th International Workshop on Empirical Software Engineering in Practice (IWESEP). IEEE, 49–54.
[212] Wohlin Claes. 2014. Guidelines for snowballing in systematic literature studies and a replication in software engineering. In Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering - EASE'14. ACM Press, New York, NY, USA, Article 38.
[213] Wohlin Claes, Runeson Per, Neto Paulo Anselmo da Mota Silveira, Engström Emelie, Machado Ivan do Carmo, and Almeida Eduardo Santana de. 2013. On the reliability of mapping studies in software engineering. Journal of Systems and Software 86, 10 (Oct 2013), 2594–2610.
[214] Wolf Christine T. and Paine Drew. 2020. Sensemaking practices in the everyday work of AI/ML software engineering. In Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops. ACM, 86–92.
[215] Wolschke Christian, Kuhn Thomas, Rombach Dieter, and Liggesmeyer Peter. 2017. Observation based creation of minimal test suites for autonomous vehicles. In 2017 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW). IEEE, 294–301.
[216] Wong W. Eric, Mittas Nikolaos, Arvanitou Elvira Maria, and Li Yihao. 2021. A bibliometric assessment of software engineering themes, scholars and institutions (2013–2020). Journal of Systems and Software 180 (2021), 111029.
[217] Wu Weibin, Xu Hui, Zhong Sanqiang, Lyu Michael R., and King Irwin. 2019. Deep validation: Toward detecting real-world corner cases for deep neural networks. In 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). IEEE, 125–137.
[218] Xie Tao. 2018. Intelligent software engineering: Synergy between AI and software engineering. In Dependable Software Engineering. Theories, Tools, and Applications, Feng Xinyu, Müller-Olm Markus, and Yang Zijiang (Eds.). Springer International Publishing, Cham, 3–7.
[219] Xie Xiaofei, Ma Lei, Juefei-Xu Felix, Xue Minhui, Chen Hongxu, Liu Yang, Zhao Jianjun, Li Bo, Yin Jianxiong, and See Simon. 2019. DeepHunter: A coverage-guided fuzz testing framework for deep neural networks. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM, 158–168.
[220] Xie Xiaofei, Ma Lei, Wang Haijun, Li Yuekang, Liu Yang, and Li Xiaohong. 2019. DiffChaser: Detecting disagreements for deep neural networks. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, 5772–5778.
[221] Yaghoubi Shakiba and Fainekos Georgios. 2018. Gray-box adversarial testing for control systems with machine learning component. arXiv (2018), 179–184.
[222] Yang Qian. 2017. The role of design in creating machine-learning-enhanced user experience. AAAI Spring Symposium - Technical Report SS-17-01 (2017), 406–411.
[223] Yang Wei and Xie Tao. 2018. Telemade: A testing framework for learning-based malware detection systems. Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence (2018), 400–403.
[224] Yang Zhuolin, Zhao Zhikuan, Pei Hengzhi, Wang Boxin, Karlas Bojan, Liu Ji, Guo Heng, Li Bo, and Zhang Ce. 2020. End-to-end robustness for sensing-reasoning machine learning pipelines. arXiv (2020), 1–43. arXiv:2003.00120.
[225] Yokoyama Haruki. 2019. Machine learning system architectural pattern for improving operational stability. In 2019 IEEE International Conference on Software Architecture Companion (ICSA-C). IEEE, 267–274.
[226] Zhang Jie M., Harman Mark, Ma Lei, and Liu Yang. 2020. Machine learning testing: Survey, landscapes and horizons. IEEE Transactions on Software Engineering (2020).
[227] Zhang Tianyi, Gao Cuiyun, Ma Lei, Lyu Michael, and Kim Miryung. 2019. An empirical study of common challenges in developing deep learning applications. In 2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE). IEEE, 104–115.
[228] Zhang Xufan, Yang Yilin, Feng Yang, and Chen Zhenyu. 2019. Software engineering practice in the development of deep learning applications. arXiv (2019). arXiv:1910.03156.
[229] Zhang Yuhao, Chen Yifan, Cheung Shing-Chi, Xiong Yingfei, and Zhang Lu. 2018. An empirical study on TensorFlow program bugs. In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM, 129–140.
[230] Zhao Shuai, Talasila Manoop, Jacobson Guy, Borcea Cristian, Aftab Syed Anwar, and Murray John F. 2018. Packaging and sharing machine learning models via the Acumos AI open platform. arXiv (2018).
[231] Zhao Xinghan and Gao Xiangfei. 2018. An AI software test method based on scene deductive approach. In 2018 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C). IEEE, 14–20.
[232] Zheng Wujie, Wang Wenyu, Liu Dian, Zhang Changrong, Zeng Qinsong, Deng Yuetang, Yang Wei, He Pinjia, and Xie Tao. 2019. Testing untestable neural machine translation: An industrial case. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). IEEE, 314–315. arXiv:1807.02340.
[233] Zhou Husheng, Li Wei, Kong Zelun, Guo Junfeng, Zhang Yuqun, Yu Bei, Zhang Lingming, and Liu Cong. 2020. DeepBillboard: Systematic physical-world testing of autonomous driving systems. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering. ACM, 347–358. arXiv:1812.10812.

Index Terms

Software Engineering for AI-Based Systems: A Survey
1. Computing methodologies
  1. Machine learning
2. Software and its engineering
  1. Software creation and management


Published in
ACM Transactions on Software Engineering and Methodology Volume 31, Issue 2
April 2022
789 pages
ISSN:1049-331X
EISSN:1557-7392
DOI:10.1145/3492439
Editor:
Mauro Pezzè
USI Università della Svizzera italiana and SIT Schaffhausen Institute of Technology, Switzerland
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 April 2022
- Accepted: 1 August 2021
- Revised: 1 July 2021
- Received: 1 May 2021
Author Tags
Software engineering
artificial intelligence
AI-based systems
systematic mapping study
Qualifiers
- survey
- Refereed
Article Metrics
- Total Citations: 32
- Total Downloads: 7,582
- Downloads (last 12 months): 4,031
- Downloads (last 6 weeks): 546