Software Engineering for AI-Based Systems: A Survey

Published: 01 April 2022

Abstract

AI-based systems are software systems with functionalities enabled by at least one AI component (e.g., for image recognition, speech recognition, or autonomous driving). AI-based systems are becoming pervasive in society due to advances in AI. However, there is limited synthesized knowledge on Software Engineering (SE) approaches for building, operating, and maintaining AI-based systems. To collect and analyze state-of-the-art knowledge about SE for AI-based systems, we conducted a systematic mapping study. We considered 248 studies published between January 2010 and March 2020. SE for AI-based systems is an emerging research area, where more than two-thirds of the studies have been published since 2018. The most studied properties of AI-based systems are dependability and safety. We identified multiple SE approaches for AI-based systems, which we classified according to the SWEBOK areas. Studies related to software testing and software quality are very prevalent, while areas like software maintenance seem neglected. Data-related issues are the most recurrent challenges. Our results are valuable for researchers, to quickly understand the state of the art and learn which topics need more research; for practitioners, to learn about the approaches and challenges that SE entails for AI-based systems; and for educators, to bridge the gap between SE and AI in their curricula.

  112. [112] Kuhrmann Marco, Fernández Daniel Méndez, and Daneva Maya. 2017. On the pragmatic design of literature studies in software engineering: An experience-based guideline. Empirical Software Engineering 22, 6 (Jan 2017), 28522891. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  113. [113] Kulesza Todd, Burnett Margaret, Wong Weng-Keen, and Stumpf Simone. 2015. Principles of explanatory debugging to personalize interactive machine learning. In Proceedings of the 20th International Conference on Intelligent User Interfaces. ACM, 126137. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  114. [114] Kumar Abhishek, Braud Tristan, Tarkoma Sasu, and Hui Pan. 2020. Trustworthy AI in the age of pervasive computing and big data. arXiv (2020). arxiv:2002.05657.Google ScholarGoogle Scholar
  115. [115] Kumeno Fumihiro. 2019. Software engineering challenges for machine learning applications: A literature review. Intelligent Decision Technologies 13, 4 (2019), 463476.Google ScholarGoogle ScholarCross RefCross Ref
  116. [116] Kuwajima Hiroshi and Ishikawa Fuyuki. 2019. Adapting SQuaRE for quality assessment of artificial intelligence systems. In 2019 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW). IEEE, 1318. DOI:arxiv:1908.02134.Google ScholarGoogle ScholarCross RefCross Ref
  117. [117] Kuwajima Hiroshi, Yasuoka Hirotoshi, and Nakae Toshihiro. 2019. Open Problems in Engineering Machine Learning Systems and the Quality Model. arXiv (2019). arxiv:1904.00001v1.Google ScholarGoogle Scholar
  118. [118] Kuwajima Hiroshi, Yasuoka Hirotoshi, and Nakae Toshihiro. 2020. Engineering problems in machine learning systems. Machine Learning 109, 5 (Apr 2020), 11031126. DOI:arxiv:1904.00001.Google ScholarGoogle ScholarDigital LibraryDigital Library
  119. [119] Kästner Christian and Kang Eunsuk. 2020. Teaching software engineering for AI-enabled systems. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Software Engineering Education and Training. ACM, 4548. DOI:arxiv:2001.06691.Google ScholarGoogle ScholarDigital LibraryDigital Library
  120. [120] Lan Shuyue, Huang Chao, Wang Zhilu, Liang Hengyi, Su Wenhao, and Zhu Qi. 2018. Design automation for intelligent automotive systems. In 2018 IEEE International Test Conference (ITC). IEEE, 110. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  121. [121] Leofante Francesco, Pulina Luca, and Tacchella Armando. 2016. Learning with safety requirements: State of the art and open questions. CEUR Workshop Proceedings 1745 (2016), 1125.Google ScholarGoogle Scholar
  122. [122] Leotta Maurizio, Olianas Dario, Ricca Filippo, and Noceti Nicoletta. 2019. How do implementation bugs affect the results of machine learning algorithms? In Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing. ACM, 13041313. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  123. [123] Liu David C., Rogers Stephanie, Shiau Raymond, Kislyuk Dmitry, Ma Kevin C., Zhong Zhigang, Liu Jenny, and Jing Yushi. 2017. Related pins at Pinterest. In Proceedings of the 26th International Conference on World Wide Web Companion - WWW'17 Companion. ACM Press, 583592. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  124. [124] Lorenzoni Giuliano, Alencar Paulo, Nascimento Nathalia, and Cowan Donald. 2021. Machine learning model development from a software engineering perspective: A systematic literature review. arXiv preprint arXiv:2102.07574 (2021).Google ScholarGoogle Scholar
  125. [125] Lwakatare Lucy Ellen, Raj Aiswarya, Bosch Jan, Olsson Helena Holmström, and Crnkovic Ivica. 2019. A taxonomy of software engineering challenges for machine learning systems: An empirical investigation. In Lecture Notes in Business Information Processing. Springer International Publishing, 227243. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  126. [126] Lwakatare Lucy Ellen, Raj Aiswarya, Crnkovic Ivica, Bosch Jan, and Olsson Helena Holmström. 2020. Large-scale machine learning systems in real-world industrial settings: A review of challenges and solutions. Information and Software Technology 127 (2020), 106368.Google ScholarGoogle ScholarCross RefCross Ref
  127. [127] Ma Lei, Zhang Fuyuan, Xue Minhui, Li Bo, Liu Yang, Zhao Jianjun, and Wang Yadong. 2018. Combinatorial testing for deep learning systems. arXiv (2018), 614618. arxiv:1806.07723.Google ScholarGoogle Scholar
  128. [128] Ma Shiqing, Liu Yingqi, Lee Wen-Chuan, Zhang Xiangyu, and Grama Ananth. 2018. MODE: Automated neural network model debugging via state differential analysis and input selection. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, 175186. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  129. [129] Machida Fumio. 2019. N-version machine learning models for safety critical systems. In 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W). IEEE, 4851. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  130. [130] Machida Fumio. 2019. On the diversity of machine learning models for system reliability. In 2019 IEEE 24th Pacific Rim International Symposium on Dependable Computing (PRDC). IEEE, 276285. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  131. [131] Majumdar Rupak, Mathur Aman, Pirron Marcus, Stegner Laura, and Zufferey Damien. 2019. Paracosm: A language and tool for testing autonomous driving systems. arXiv (2019). arxiv:1902.01084.Google ScholarGoogle Scholar
  132. [132] Mallozzi Piergiuseppe, Pelliccione Patrizio, and Menghi Claudio. 2018. Keeping intelligence under control. In Proceedings of the 1st International Workshop on Software Engineering for Cognitive Services. ACM, 3740. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  133. [133] Martínez-Fernández Silverio, Franch Xavier, Jedlitschka Andreas, Oriol Marc, and Trendowicz Adam. 2020. Research directions for developing and operating artificial intelligence models in trustworthy autonomous systems. arXiv (2020). arxiv:2003.05434.Google ScholarGoogle Scholar
  134. [134] Masuda Satoshi, Ono Kohichi, Yasue Toshiaki, and Hosokawa Nobuhiro. 2018. A survey of software quality for machine learning applications. In 2018 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW). IEEE, 279284.Google ScholarGoogle ScholarCross RefCross Ref
  135. [135] Mattos David Issa, Bosch Jan, and Olsson Helena Holmström. 2019. Leveraging business transformation with machine learning experiments. In Lecture Notes in Business Information Processing. Springer International Publishing, 183191. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  136. [136] McDermid John, Jia Yan, and Habli Ibrahim. 2019. Towards a framework for safety assurance of autonomous systems. CEUR Workshop Proceedings 2419 (2019).Google ScholarGoogle Scholar
  137. [137] John Meenu Mary, Olsson Helena Holmström, and Bosch Jan. [n.d.]. Architecting AI deployment: A systematic review of state-of-the-art and state-of-practice literature.Google ScholarGoogle Scholar
  138. [138] Menzies Tim. 2020. The five laws of SE for AI. IEEE Software 37, 1 (Jan 2020), 8185. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  139. [139] Molina Caroline Bianca Santos Tancredi, Almeida Jorge Rady de, Vismari Lucio F., Gonzalez Rodrigo Ignacio R., Naufal Jamil K., and Camargo Joao Batista. 2017. Assuring fully autonomous vehicles safety by design: The autonomous vehicle control (AVC) module strategy. In 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W). IEEE, 1621. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  140. [140] Moreb Mohammed, Mohammed Tareq Abed, Bayat Oguz, and Ata Oguz. 2020. Corrections to “A novel software engineering approach toward using machine learning for improving the efficiency of health systems”. IEEE Access 8 (2020), 136459136459. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  141. [141] Mourão Erica, Pimentel João Felipe, Murta Leonardo, Kalinowski Marcos, Mendes Emilia, and Wohlin Claes. 2020. On the performance of hybrid search strategies for systematic literature reviews in software engineering. Information and Software Technology 123 (Jul 2020), 106294. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  142. [142] Munappy Aiswarya, Bosch Jan, Olsson Helena Holmstrom, Arpteg Anders, and Brinne Bjorn. 2019. Data management challenges for deep learning. In 2019 45th Euromicro Conference on Software Engineering and Advanced Applications (SEAA). IEEE, 140147. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  143. [143] Nakajima Shin. 2018. [Invited] Quality assurance of machine learning software. In 2018 IEEE 7th Global Conference on Consumer Electronics (GCCE). IEEE, 143144. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  144. [144] Nakajima Shin. 2019. Dataset diversity for metamorphic testing of machine learning software. In Structured Object-Oriented Formal Language and Method, Duan Zhenhua, Liu Shaoying, Tian Cong, and Nagoya Fumiko (Eds.). Springer International Publishing, Cham, 2138. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  145. [145] Nakajima Shin. 2019. Quality evaluation assurance levels for deep neural networks software. In 2019 International Conference on Technologies and Applications of Artificial Intelligence (TAAI). IEEE. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  146. [146] Nalchigar Soroosh, Yu Eric, Obeidi Yazan, Carbajales Sebastian, Green John, and Chan Allen. 2019. Solution patterns for machine learning. In Advanced Information Systems Engineering. Springer International Publishing, 627642. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  147. [147] Nascimento Elizamary, Nguyen-Duc Anh, Sundbø Ingrid, and Conte Tayana. 2020. Software engineering for artificial intelligence and machine learning software: A systematic literature review. arXiv preprint arXiv:2011.03751 (2020).Google ScholarGoogle Scholar
  148. [148] Naur Peter, Randell Brian, Bauer Friedrich Ludwig, and Committee. NATO Science (Eds.). 1969. Software Engineering: Report on a Conference Sponsored by the NATO Science Committee, Garmisch, Germany, 7th to 11th October 1968. Scientific Affairs Division, NATO.Google ScholarGoogle Scholar
  149. [149] Nishi Yasuharu, Masuda Satoshi, Ogawa Hideto, and Uetsuki Keiji. 2018. A test architecture for machine learning product. In 2018 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW). IEEE, 273278. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  150. [150] Nushi Besmira, Kamar Ece, and Horvitz Eric. 2018. Towards accountable AI: Hybrid human-machine analyses for characterizing system failure. arXivHcomp (2018), 126135. arxiv:1809.07424.Google ScholarGoogle Scholar
  151. [151] Odena Augustus and Goodfellow Ian. 2018. Tensorfuzz: Debugging neural networks with coverage-guided fuzzing. arXiv (2018).Google ScholarGoogle Scholar
  152. [152] Otero Carlos E. and Peter Adrian. 2015. Research directions for engineering big data analytics software. IEEE Intelligent Systems 30, 1 (Jan 2015), 1319. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  153. [153] Ozkaya Ipek. 2020. What is really different in engineering AI-enabled systems? IEEE Software 37, 4 (Jul 2020), 36. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  154. [154] Partridge D. and Wilks Y.. 1987. Does AI have a methodology which is different from software engineering? Artificial Intelligence Review 1, 2 (1987), 111120. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  155. [155] Patel Kayur. 2010. Lowering the barrier to applying machine learning. In Adjunct Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology - UIST'10. ACM Press, 355358. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  156. [156] Pedroza Gabriel and Morayo Adedjouma. 2019. Safe-by-design development method for artificial intelligent based systems. In Proceedings of the 31st International Conference on Software Engineering and Knowledge Engineering. KSI Research Inc. and Knowledge Systems Institute Graduate School, 391397. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  157. [157] Perkusich Mirko, Silva Lenardo Chaves e, Costa Alexandre, Ramos Felipe, Saraiva Renata, Freire Arthur, Dilorenzo Ednaldo, Dantas Emanuel, Santos Danilo, Gorgônio Kyller, Almeida Hyggo, and Perkusich Angelo. 2020. Intelligent software engineering in the context of agile software development: A systematic literature review. Information and Software Technology 119 (2020), 106241. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  158. [158] Petersen Kai, Feldt Robert, Mujtaba Shahid, and Mattsson Michael. 2008. Systematic mapping studies in software engineering. In Proceedings of the 12th International Conference on Evaluation and Assessment in Software Engineering. BCS Learning & Development, 6877. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  159. [159] Petersen Kai, Vakkalanka Sairam, and Kuzniarz Ludwik. 2015. Guidelines for conducting systematic mapping studies in software engineering: An update. Information and Software Technology 64 (Aug 2015), 118. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  160. [160] Pulina Luca and Tacchella Armando. 2010. An abstraction-refinement approach to verification of artificial neural networks. CEUR Workshop Proceedings 616 (2010), 243257.Google ScholarGoogle Scholar
  161. [161] Rahimi Mona, Guo Jin L. C., Kokaly Sahar, and Chechik Marsha. 2019. Toward requirements specification for machine-learned components. In 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW). IEEE, 241244. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  162. [162] Rahman Saidur, River Emilio, Khomh Foutse, Guhneuc Yann Gal, and Lehnert Bernd. 2019. Machine learning software engineering in practice: An industrial case study. arXiv (2019), 121. arxiv:1906.07154.Google ScholarGoogle Scholar
  163. [163] Raji Inioluwa Deborah, Smart Andrew, White Rebecca N., Mitchell Margaret, Gebru Timnit, Hutchinson Ben, Smith-Loud Jamila, Theron Daniel, and Barnes Parker. 2020. Closing the AI accountability gap. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. ACM, 3344. DOI:arxiv:2001.00973.Google ScholarGoogle ScholarDigital LibraryDigital Library
  164. [164] Ralph Paul, Ali Nauman bin, Baltes Sebastian, Bianculli Domenico, Diaz Jessica, Dittrich Yvonne, Ernst Neil, Felderer Michael, Feldt Robert, Filieri Antonio, França Breno Bernard Nicolau de, Furia Carlo Alberto, Gay Greg, Gold Nicolas, Graziotin Daniel, He Pinjia, Hoda Rashina, Juristo Natalia, Kitchenham Barbara, Lenarduzzi Valentina, Martínez Jorge, Melegati Jorge, Mendez Daniel, Menzies Tim, Molleri Jefferson, Pfahl Dietmar, Robbes Romain, Russo Daniel, Saarimäki Nyyti, Sarro Federica, Taibi Davide, Siegmund Janet, Spinellis Diomidis, Staron Miroslaw, Stol Klaas, Storey Margaret-Anne, Taibi Davide, Tamburri Damian, Torchiano Marco, Treude Christoph, Turhan Burak, Wang Xiaofeng, and Vegas Sira. 2021. Empirical Standards for Software Engineering Research. arxiv:2010.03525 [cs.SE].Google ScholarGoogle Scholar
  165. [165] Ribeiro Marco Tulio, Singh Sameer, and Guestrin Carlos. 2016. “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 11351144. DOI:arxiv:1602.04938.Google ScholarGoogle ScholarDigital LibraryDigital Library
  166. [166] Riccio Vincenzo, Jahangirova Gunel, Stocco Andrea, Humbatova Nargiz, Weiss Michael, and Tonella Paolo. 2020. Testing machine learning based systems: A systematic mapping. Empirical Software Engineering 25, 6 (2020), 51935254. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  167. [167] Rill R. A. and Lőrincz A.. 2019. Cognitive modeling approach for dealing with challenges in cyber-physical systems. Studia Universitatis Babe s-Bolyai Informatica 64, 1 (Jun 2019), 5166. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  168. [168] Rubaiyat Abu Hasnat Mohammad, Qin Yongming, and Alemzadeh Homa. 2018. Experimental resilience assessment of an open-source driving agent. In 2018 IEEE 23rd Pacific Rim International Symposium on Dependable Computing (PRDC). IEEE, 5463. DOI:arxiv:1807.06172.Google ScholarGoogle ScholarCross RefCross Ref
  169. [169] Russell Stuart J. and Norvig Peter. 2021. Artificial Intelligence: A Modern Approach (Fourth edition). Pearson, Hoboken.Google ScholarGoogle Scholar
  170. [170] Salay Rick and Czarnecki Krzysztof. 2018. Using machine learning safely in automotive software: An assessment and adaption of software process requirements in ISO 26262. arXiv (2018). arxiv:1808.01614.Google ScholarGoogle Scholar
  171. [171] Salay Rick and Czarnecki Krzysztof. 2019. Improving ML safety with partial specifications. In Lecture Notes in Computer Science. Springer International Publishing, 288300. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  172. [172] Salay Rick, Queiroz Rodrigo, and Czarnecki Krzysztof. 2017. An analysis of ISO 26262: Using machine learning safely in automotive software. arXiv (2017). arxiv:1709.02435.Google ScholarGoogle Scholar
  173. [173] Santhanam P., Farchi Eitan, and Pankratius Victor. 2019. Engineering reliable deep learning systems. arXiv 3 (2019), 18. arxiv:1910.12582.Google ScholarGoogle Scholar
  174. [174] Sarathy Prakash, Baruah Sanjoy, Cook Stephen, and Wolf Marilyn. 2019. Realizing the promise of artificial intelligence for unmanned aircraft systems through behavior bounded assurance. In 2019 IEEE/AIAA 38th Digital Avionics Systems Conference (DASC). IEEE. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  175. [175] Sato Naoto, Kuruma Hironobu, Kaneko Masanori, Nakagawa Yuichiroh, Ogawa Hideto, Hoang Thai Son, and Butler Michael. 2018. DeepSaucer: Unified environment for verifying deep neural networks. arXiv (2018). arxiv:1811.03752.Google ScholarGoogle Scholar
  176. [176] Saunders William, Stuhlmüller Andreas, Sastry Girish, and Evans Owain. 2018. Trial without error: Towards safe reinforcement learning via human intervention. Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS 3 (2018), 20672069. arxiv:1707.05173.Google ScholarGoogle Scholar
  177. [177] Schelter Sebastian, Biessmann Felix, Januschowski Tim, Salinas David, Seufert Stephan, and Szarvas Gyuri. 2018. On challenges in machine learning model management. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering (2018), 513. http://sites.computer.org/debull/A18dec/p5.pdf.Google ScholarGoogle Scholar
  178. [178] Schleier-Smith Johann. 2015. An architecture for agile machine learning in real-time applications. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 20592068. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  179. [179] Sculley D., Holt Gary, Golovin Daniel, Davydov Eugene, Phillips Todd, Ebner Dietmar, Chaudhary Vinay, Young Michael, Crespo Jean François, and Dennison Dan. 2015. Hidden technical debt in machine learning systems. Advances in Neural Information Processing Systems 2015-Jan (2015), 25032511.Google ScholarGoogle Scholar
  180. [180] Serban Alex, Blom Koen van der, Hoos Holger, and Visser Joost. 2020. Adoption and effects of software engineering best practices in machine learning. In Proceedings of the 14th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). 112.Google ScholarGoogle ScholarDigital LibraryDigital Library
  181. [181] Serban Alex and Visser Joost. 2021. An empirical study of software architecture for machine learning. arXiv preprint arXiv:2105.12422 (2021).Google ScholarGoogle Scholar
  182. [182] Serban Alexandru Constantin. 2019. Designing safety critical software systems to manage inherent uncertainty. In 2019 IEEE International Conference on Software Architecture Companion (ICSA-C). IEEE, 246249. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  183. [183] Seshia Sanjit A., Sadigh Dorsa, and Sastry S. Shankar. 2016. Towards verified artificial intelligence. arXiv (2016), 118. arxiv:1606.08514 http://arxiv.org/abs/1606.08514.Google ScholarGoogle Scholar
  184. [184] Shafaei Sina, Kugele Stefan, Osman Mohd Hafeez, and Knoll Alois. 2018. Uncertainty in machine learning: A safety perspective on autonomous driving. In Developments in Language Theory. Springer International Publishing, 458464. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  185. [185] Shalev-Shwartz Shai, Shammah Shaked, and Shashua Amnon. 2017. On a formal model of safe and scalable self-driving cars. arXiv (2017), 137. arxiv:1708.06374.Google ScholarGoogle Scholar
  186. [186] Sheh Raymond and Monteath Isaac. 2018. Defining explainable AI for requirements analysis. KI - Künstliche Intelligenz 32, 4 (Oct 2018), 261266. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  187. [187] Simard Patrice Y., Amershi Saleema, Chickering David M., Pelton Alicia Edelman, Ghorashi Soroush, Meek Christopher, Ramos Gonzalo, Suh Jina, Verwey Johan, Wang Mo, and Wernsing John. 2017. Machine teaching a new paradigm for building machine learning systems. arXiv (2017). arxiv:1707.06742.Google ScholarGoogle Scholar
  188. [188] Spieker Helge and Gotlieb Arnaud. 2019. Towards testing of deep learning systems with training set reduction. arXiv2 (2019). arxiv:1901.04169.Google ScholarGoogle Scholar
  189. [189] Srisakaokul Siwakorn, Zhang Yuhao, Zhong Zexuan, Yang Wei, Xie Tao, and Li Bo. 2018. MulDef: Multi-model-based defense against adversarial examples for neural networks. arXiv (2018). arxiv:1809.00065.Google ScholarGoogle Scholar
  190. [190] Sun Xiaobing, Zhou Tianchi, Li Gengjie, Hu Jiajun, Yang Hui, and Li Bin. 2017. An empirical study on real bugs for machine learning programs. In 2017 24th Asia-Pacific Software Engineering Conference (APSEC). IEEE, 348357. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  191. [191] Sun Youcheng, Huang Xiaowei, Kroening Daniel, Sharp James, Hill Matthew, and Ashmore Rob. 2019. DeepConcolic: Testing and debugging deep neural networks. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). IEEE, 111114. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  192. [192] Sun Youcheng, Huang Xiaowei, Kroening Daniel, Sharp James, Hill Matthew, and Ashmore Rob. 2019. Structural test coverage criteria for deep neural networks. ACM Transactions on Embedded Computing Systems 18, 5s (Oct 2019), 123. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  193. [193] Sun Youcheng, Wu Min, Ruan Wenjie, Huang Xiaowei, Kwiatkowska Marta, and Kroening Daniel. 2018. Concolic testing for deep neural networks. arXiv (2018), 109119.Google ScholarGoogle Scholar
  194. [194] Sun Youcheng, Zhou Yifan, Maskell Simon, Sharp James, and Huang Xiaowei. 2020. Reliability validation of learning enabled vehicle tracking. In 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 93909396. DOI:. arxiv:2002.02424Google ScholarGoogle ScholarCross RefCross Ref
  195. [195] Thung Ferdian, Wang Shaowei, Lo David, and Jiang Lingxiao. 2012. An empirical study of bugs in machine learning systems. In 2012 IEEE 23rd International Symposium on Software Reliability Engineering. IEEE, 271280. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  196. [196] Tian Yuchi, Pei Kexin, Jana Suman, and Ray Baishakhi. 2018. DeepTest: Automated testing of deep-neural-network-driven autonomous cars. In Proceedings of the 40th International Conference on Software Engineering. ACM, 303314. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  197. [197] Tramèr Florian, Zhang Fan, Juels Ari, Reiter Michael K., and Ristenpart Thomas. 2016. Stealing machine learning models via prediction APIs. In 25th USENIX Security Symposium (USENIX Security 16). https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/tramer.Google ScholarGoogle Scholar
  198. [198] Tuncali Cumhur Erkan, Fainekos Georgios, Ito Hisahiro, and Kapinski James. 2018. Sim-ATAV. In Proceedings of the 21st International Conference on Hybrid Systems: Computation and Control (Part of CPS Week). ACM, 283284. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  199. [199] Tuncali Cumhur Erkan, Fainekos Georgios, Prokhorov Danil, Ito Hisahiro, and Kapinski James. 2020. Requirements-driven test generation for autonomous vehicles with machine learning components. IEEE Transactions on Intelligent Vehicles 5, 2 (Jun 2020), 265280. DOI:Google ScholarGoogle Scholar
  200. [200] Udeshi Sakshi, Arora Pryanshu, and Chattopadhyay Sudipta. 2018. Automated directed fairness testing. arXiv (2018), 98108.Google ScholarGoogle Scholar
  201. [201] Weide Tom van der, Papadopoulos Dimitris, Smirnov Oleg, Zielinski Michal, and Kasteren Tim van. 2017. Versioning for end-to-end machine learning pipelines. In Proceedings of the 1st Workshop on Data Management for End-to-End Machine Learning. ACM. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  202. [202] Varshney Kush R.. 2016. Engineering safety in machine learning. In 2016 Information Theory and Applications Workshop (ITA). IEEE. DOI:arxiv:1601.04126.Google ScholarGoogle ScholarCross RefCross Ref
  203. [203] Varshney Kush R. and Alemzadeh Homa. 2017. On the safety of machine learning: Cyber-physical systems, decision sciences, and data products. Big Data 5, 3 (Sep 2017), 246255. DOI:arxiv:1610.01256.Google ScholarGoogle ScholarCross RefCross Ref
  204. [204] Vasconcelos Marisa, Candello Heloisa, Pinhanez Claudio, and Santos Thiago dos. 2017. Bottester. In Proceedings of the XVI Brazilian Symposium on Human Factors in Computing Systems. ACM, 14. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  205. [205] Vogelsang Andreas and Borg Markus. 2019. Requirements engineering for machine learning: Perspectives from data scientists. In 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW). IEEE, 245251. DOI:arxiv:1908.04674.Google ScholarGoogle ScholarCross RefCross Ref
  206. [206] Wan Zhiyuan, Xia Xin, Lo David, and Murphy Gail C.. 2020. How does machine learning change software development practices? IEEE Transactions on Software Engineering (2020), 115. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  207. [207] Wang Jingyi, Sun Jun, Zhang Peixin, and Wang Xinyu. 2018. Detecting adversarial samples for deep neural networks through mutation testing. arXiv (2018), 110. arxiv:1805.05010.Google ScholarGoogle Scholar
  208. [208] Wang Simin, Huang Liguo, Ge Jidong, Zhang Tengfei, Feng Haitao, Li Ming, Zhang He, and Ng Vincent. 2020. Synergy between machine/deep learning and software engineering: How far are we? arXiv preprint arXiv:2008.05515 (2020).Google ScholarGoogle Scholar
  209. [209] Wang Shiqi, Pei Kexin, Whitehouse Justin, Yang Junfeng, and Jana Suman. 2018. Efficient formal safety analysis of neural networks. arXivNeurIPS (2018).Google ScholarGoogle Scholar
  210. [210] Wang Shiqi, Pei Kexin, Whitehouse Justin, Yang Junfeng, and Jana Suman. 2018. Formal security analysis of neural networks using symbolic intervals. arXiv (2018).Google ScholarGoogle Scholar
  211. [211] Washizaki Hironori, Uchida Hiromu, Khomh Foutse, and Guéhéneuc Yann-Gaël. 2019. Studying software engineering patterns for designing machine learning systems. In 2019 10th International Workshop on Empirical Software Engineering in Practice (IWESEP). IEEE, 4954.Google ScholarGoogle ScholarCross RefCross Ref
  212. [212] Wohlin Claes. 2014. Guidelines for snowballing in systematic literature studies and a replication in software engineering. In Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering - EASE'14. ACM Press, New York, NY, USA, Article 38. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  213. [213] Wohlin Claes, Runeson Per, Neto Paulo Anselmo da Mota Silveira, Engström Emelie, Machado Ivan do Carmo, and Almeida Eduardo Santana de. 2013. On the reliability of mapping studies in software engineering. Journal of Systems and Software 86, 10 (Oct 2013), 25942610. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  214. [214] Wolf Christine T. and Paine Drew. 2020. Sensemaking practices in the everyday work of AI/ML software engineering. In Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops. ACM, 8692. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  215. [215] Wolschke Christian, Kuhn Thomas, Rombach Dieter, and Liggesmeyer Peter. 2017. Observation based creation of minimal test suites for autonomous vehicles. In 2017 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW). IEEE, 294301. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  216. [216] Wong W. Eric, Mittas Nikolaos, Arvanitou Elvira Maria, and Li Yihao. 2021. A bibliometric assessment of software engineering themes, scholars and institutions (2013–2020). Journal of Systems and Software 180 (2021), 111029. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  217. [217] Wu Weibin, Xu Hui, Zhong Sanqiang, Lyu Michael R., and King Irwin. 2019. Deep validation: Toward detecting real-world corner cases for deep neural networks. In 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). IEEE, 125137. DOI:Google ScholarGoogle ScholarCross RefCross Ref
  218. [218] Xie Tao. 2018. Intelligent software engineering: Synergy between AI and software engineering. In Dependable Software Engineering. Theories, Tools, and Applications, Feng Xinyu, Müller-Olm Markus, and Yang Zijiang (Eds.). Springer International Publishing, Cham, 3–7.
  219. [219] Xie Xiaofei, Ma Lei, Juefei-Xu Felix, Xue Minhui, Chen Hongxu, Liu Yang, Zhao Jianjun, Li Bo, Yin Jianxiong, and See Simon. 2019. DeepHunter: A coverage-guided fuzz testing framework for deep neural networks. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM, 158–168.
  220. [220] Xie Xiaofei, Ma Lei, Wang Haijun, Li Yuekang, Liu Yang, and Li Xiaohong. 2019. DiffChaser: Detecting disagreements for deep neural networks. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, 5772–5778.
  221. [221] Yaghoubi Shakiba and Fainekos Georgios. 2018. Gray-box adversarial testing for control systems with machine learning component. arXiv (2018), 179–184.
  222. [222] Yang Qian. 2017. The Role of Design in Creating Machine-Learning-enhanced User Experience. AAAI Spring Symposium - Technical Report SS-17-01 - (2017), 406–411.
  223. [223] Yang Wei and Xie Tao. 2018. Telemade: A testing framework for learning-based malware detection systems. Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence (2018), 400–403.
  224. [224] Yang Zhuolin, Zhao Zhikuan, Pei Hengzhi, Wang Boxin, Karlas Bojan, Liu Ji, Guo Heng, Li Bo, and Zhang Ce. 2020. End-to-end robustness for sensing-reasoning machine learning pipelines. arXiv (2020), 1–43. arxiv:2003.00120.
  225. [225] Yokoyama Haruki. 2019. Machine learning system architectural pattern for improving operational stability. In 2019 IEEE International Conference on Software Architecture Companion (ICSA-C). IEEE, 267–274.
  226. [226] Zhang Jie M., Harman Mark, Ma Lei, and Liu Yang. 2020. Machine learning testing: Survey, landscapes and horizons. IEEE Transactions on Software Engineering (2020).
  227. [227] Zhang Tianyi, Gao Cuiyun, Ma Lei, Lyu Michael, and Kim Miryung. 2019. An empirical study of common challenges in developing deep learning applications. In 2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE). IEEE, 104–115.
  228. [228] Zhang Xufan, Yang Yilin, Feng Yang, and Chen Zhenyu. 2019. Software engineering practice in the development of deep learning applications. arXiv (2019). arxiv:1910.03156.
  229. [229] Zhang Yuhao, Chen Yifan, Cheung Shing-Chi, Xiong Yingfei, and Zhang Lu. 2018. An empirical study on TensorFlow program bugs. In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM, 129–140.
  230. [230] Zhao Shuai, Talasila Manoop, Jacobson Guy, Borcea Cristian, Aftab Syed Anwar, and Murray John F. 2018. Packaging and sharing machine learning models via the Acumos AI open platform. arXiv (2018).
  231. [231] Zhao Xinghan and Gao Xiangfei. 2018. An AI software test method based on scene deductive approach. In 2018 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C). IEEE, 14–20.
  232. [232] Zheng Wujie, Wang Wenyu, Liu Dian, Zhang Changrong, Zeng Qinsong, Deng Yuetang, Yang Wei, He Pinjia, and Xie Tao. 2019. Testing untestable neural machine translation: An industrial case. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). IEEE, 314–315. arxiv:1807.02340.
  233. [233] Zhou Husheng, Li Wei, Kong Zelun, Guo Junfeng, Zhang Yuqun, Yu Bei, Zhang Lingming, and Liu Cong. 2020. DeepBillboard: Systematic physical-world testing of autonomous driving systems. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering. ACM, 347–358. arxiv:1812.10812.

      • Published in

        ACM Transactions on Software Engineering and Methodology  Volume 31, Issue 2
        April 2022
        789 pages
        ISSN:1049-331X
        EISSN:1557-7392
        DOI:10.1145/3492439
        • Editor: Mauro Pezzè

        Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 April 2022
        • Accepted: 1 August 2021
        • Revised: 1 July 2021
        • Received: 1 May 2021

        Qualifiers

        • survey
        • Refereed
