Abstract
AI-based systems are software systems with functionalities enabled by at least one AI component (e.g., for image and speech recognition, or autonomous driving). AI-based systems are becoming pervasive in society due to advances in AI. However, there is limited synthesized knowledge on Software Engineering (SE) approaches for building, operating, and maintaining AI-based systems. To collect and analyze state-of-the-art knowledge about SE for AI-based systems, we conducted a systematic mapping study. We considered 248 studies published between January 2010 and March 2020. SE for AI-based systems is an emerging research area: more than two-thirds of the studies have been published since 2018. The most studied properties of AI-based systems are dependability and safety. We identified multiple SE approaches for AI-based systems, which we classified according to the SWEBOK areas. Studies related to software testing and software quality are very prevalent, while areas such as software maintenance seem neglected. Data-related issues are the most recurrent challenges. Our results are valuable for researchers, to quickly understand the state of the art and learn which topics need more research; for practitioners, to learn about the approaches and challenges that SE entails for AI-based systems; and for educators, to bridge the gap between SE and AI in their curricula.
DOI: arxiv:1807.06172 .Google ScholarCross Ref - [169] . 2021. Artificial Intelligence: A Modern Approach (Fourth edition). Pearson, Hoboken.Google Scholar
- [170] . 2018. Using machine learning safely in automotive software: An assessment and adaption of software process requirements in ISO 26262. arXiv (2018).
arxiv:1808.01614 .Google Scholar - [171] . 2019. Improving ML safety with partial specifications. In Lecture Notes in Computer Science. Springer International Publishing, 288–300.
DOI: Google ScholarDigital Library - [172] . 2017. An analysis of ISO 26262: Using machine learning safely in automotive software. arXiv (2017).
arxiv:1709.02435 .Google Scholar - [173] . 2019. Engineering reliable deep learning systems. arXiv 3 (2019), 1–8.
arxiv:1910.12582 .Google Scholar - [174] . 2019. Realizing the promise of artificial intelligence for unmanned aircraft systems through behavior bounded assurance. In 2019 IEEE/AIAA 38th Digital Avionics Systems Conference (DASC). IEEE.
DOI: Google ScholarCross Ref - [175] . 2018. DeepSaucer: Unified environment for verifying deep neural networks. arXiv (2018).
arxiv:1811.03752 .Google Scholar - [176] . 2018. Trial without error: Towards safe reinforcement learning via human intervention. Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS 3 (2018), 2067–2069.
arxiv:1707.05173 .Google Scholar - [177] . 2018. On challenges in machine learning model management. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering (2018), 5–13. http://sites.computer.org/debull/A18dec/p5.pdf.Google Scholar
- [178] . 2015. An architecture for agile machine learning in real-time applications. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2059–2068.
DOI: Google ScholarDigital Library - [179] . 2015. Hidden technical debt in machine learning systems. Advances in Neural Information Processing Systems 2015-Jan (2015), 2503–2511.Google Scholar
- [180] . 2020. Adoption and effects of software engineering best practices in machine learning. In Proceedings of the 14th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). 1–12.Google ScholarDigital Library
- [181] . 2021. An empirical study of software architecture for machine learning. arXiv preprint arXiv:2105.12422 (2021).Google Scholar
- [182] . 2019. Designing safety critical software systems to manage inherent uncertainty. In 2019 IEEE International Conference on Software Architecture Companion (ICSA-C). IEEE, 246–249.
DOI: Google ScholarCross Ref - [183] . 2016. Towards verified artificial intelligence. arXiv (2016), 1–18.
arxiv:1606.08514 http://arxiv.org/abs/1606.08514.Google Scholar - [184] . 2018. Uncertainty in machine learning: A safety perspective on autonomous driving. In Developments in Language Theory. Springer International Publishing, 458–464.
DOI: Google ScholarCross Ref - [185] . 2017. On a formal model of safe and scalable self-driving cars. arXiv (2017), 1–37.
arxiv:1708.06374 .Google Scholar - [186] . 2018. Defining explainable AI for requirements analysis. KI - Künstliche Intelligenz 32, 4 (
Oct 2018), 261–266.DOI: Google ScholarCross Ref - [187] . 2017. Machine teaching a new paradigm for building machine learning systems. arXiv (2017).
arxiv:1707.06742 .Google Scholar - [188] . 2019. Towards testing of deep learning systems with training set reduction. arXiv2 (2019).
arxiv:1901.04169 .Google Scholar - [189] . 2018. MulDef: Multi-model-based defense against adversarial examples for neural networks. arXiv (2018).
arxiv:1809.00065 .Google Scholar - [190] . 2017. An empirical study on real bugs for machine learning programs. In 2017 24th Asia-Pacific Software Engineering Conference (APSEC). IEEE, 348–357.
DOI: Google ScholarCross Ref - [191] . 2019. DeepConcolic: Testing and debugging deep neural networks. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). IEEE, 111–114.
DOI: Google ScholarDigital Library - [192] . 2019. Structural test coverage criteria for deep neural networks. ACM Transactions on Embedded Computing Systems 18, 5s (
Oct 2019), 1–23.DOI: Google ScholarDigital Library - [193] . 2018. Concolic testing for deep neural networks. arXiv (2018), 109–119.Google Scholar
- [194] . 2020. Reliability validation of learning enabled vehicle tracking. In 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 9390–9396.
DOI: .arxiv:2002.02424 Google ScholarCross Ref - [195] . 2012. An empirical study of bugs in machine learning systems. In 2012 IEEE 23rd International Symposium on Software Reliability Engineering. IEEE, 271–280.
DOI: Google ScholarDigital Library - [196] . 2018. DeepTest: Automated testing of deep-neural-network-driven autonomous cars. In Proceedings of the 40th International Conference on Software Engineering. ACM, 303–314.
DOI: Google ScholarDigital Library - [197] . 2016. Stealing machine learning models via prediction APIs. In 25th USENIX Security Symposium (USENIX Security 16). https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/tramer.Google Scholar
- [198] . 2018. Sim-ATAV. In Proceedings of the 21st International Conference on Hybrid Systems: Computation and Control (Part of CPS Week). ACM, 283–284.
DOI: Google ScholarDigital Library - [199] . 2020. Requirements-driven test generation for autonomous vehicles with machine learning components. IEEE Transactions on Intelligent Vehicles 5, 2 (
Jun 2020), 265–280.DOI: Google Scholar - [200] . 2018. Automated directed fairness testing. arXiv (2018), 98–108.Google Scholar
- [201] . 2017. Versioning for end-to-end machine learning pipelines. In Proceedings of the 1st Workshop on Data Management for End-to-End Machine Learning. ACM.
DOI: Google ScholarDigital Library - [202] . 2016. Engineering safety in machine learning. In 2016 Information Theory and Applications Workshop (ITA). IEEE.
DOI: arxiv:1601.04126 .Google ScholarCross Ref - [203] . 2017. On the safety of machine learning: Cyber-physical systems, decision sciences, and data products. Big Data 5, 3 (
Sep 2017), 246–255.DOI: arxiv:1610.01256 .Google ScholarCross Ref - [204] . 2017. Bottester. In Proceedings of the XVI Brazilian Symposium on Human Factors in Computing Systems. ACM, 1–4.
DOI: Google ScholarDigital Library - [205] . 2019. Requirements engineering for machine learning: Perspectives from data scientists. In 2019 IEEE 27th International Requirements Engineering Conference Workshops (REW). IEEE, 245–251.
DOI: arxiv:1908.04674 .Google ScholarCross Ref - [206] . 2020. How does machine learning change software development practices? IEEE Transactions on Software Engineering (2020), 1–15.
DOI: Google ScholarCross Ref - [207] . 2018. Detecting adversarial samples for deep neural networks through mutation testing. arXiv (2018), 1–10.
arxiv:1805.05010 .Google Scholar - [208] . 2020. Synergy between machine/deep learning and software engineering: How far are we? arXiv preprint arXiv:2008.05515 (2020).Google Scholar
- [209] . 2018. Efficient formal safety analysis of neural networks. arXivNeurIPS (2018).Google Scholar
- [210] . 2018. Formal security analysis of neural networks using symbolic intervals. arXiv (2018).Google Scholar
- [211] . 2019. Studying software engineering patterns for designing machine learning systems. In 2019 10th International Workshop on Empirical Software Engineering in Practice (IWESEP). IEEE, 49–54.Google ScholarCross Ref
- [212] . 2014. Guidelines for snowballing in systematic literature studies and a replication in software engineering. In Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering - EASE'14. ACM Press, New York, NY, USA, Article
38 .DOI: Google ScholarDigital Library - [213] . 2013. On the reliability of mapping studies in software engineering. Journal of Systems and Software 86, 10 (
Oct 2013), 2594–2610.DOI: Google ScholarCross Ref - [214] . 2020. Sensemaking practices in the everyday work of AI/ML software engineering. In Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops. ACM, 86–92.
DOI: Google ScholarDigital Library - [215] . 2017. Observation based creation of minimal test suites for autonomous vehicles. In 2017 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW). IEEE, 294–301.
DOI: Google ScholarCross Ref - [216] . 2021. A bibliometric assessment of software engineering themes, scholars and institutions (2013–2020). Journal of Systems and Software 180 (2021), 111029.
DOI: Google ScholarDigital Library - [217] . 2019. Deep validation: Toward detecting real-world corner cases for deep neural networks. In 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). IEEE, 125–137.
DOI: Google ScholarCross Ref - [218] . 2018. Intelligent software engineering: Synergy between AI and software engineering. In Dependable Software Engineering. Theories, Tools, and Applications, , , and (Eds.). Springer International Publishing, Cham, 3–7.Google Scholar
- [219] . 2019. DeepHunter: A coverage-guided fuzz testing framework for deep neural networks. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM, 158–168.
DOI: Google ScholarDigital Library - [220] . 2019. DiffChaser: Detecting disagreements for deep neural networks. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, 5772–5778.
DOI: Google ScholarCross Ref - [221] . 2018. Gray-box adversarial testing for control systems with machine learning component. arXiv (2018), 179–184.Google Scholar
- [222] . 2017. The Role of Design in Creating Machine-Learning-enhanced User Experience. AAAI Spring Symposium - Technical Report SS-17-01 - (2017), 406–411.Google Scholar
- [223] . 2018. Telemade: A testing framework for learning-based malware detection systems. Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence (2018), 400–403.Google Scholar
- [224] . 2020. End-to-end robustness for sensing-reasoning machine learning pipelines. arXiv (2020), 1–43.
arxiv:2003.00120 .Google Scholar - [225] . 2019. Machine learning system architectural pattern for improving operational stability. In 2019 IEEE International Conference on Software Architecture Companion (ICSA-C). IEEE, 267–274.
DOI: Google ScholarCross Ref - [226] . 2020. Machine learning testing: Survey, landscapes and horizons. IEEE Transactions on Software Engineering (2020).Google ScholarDigital Library
- [227] . 2019. An empirical study of common challenges in developing deep learning applications. In 2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE). IEEE, 104–115.
DOI: Google ScholarCross Ref - [228] . 2019. Software engineering practice in the development of deep learning applications. arXiv (2019).
arxiv:1910.03156 .Google Scholar - [229] . 2018. An empirical study on TensorFlow program bugs. In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM, 129–140.
DOI: Google ScholarDigital Library - [230] . 2018. Packaging and sharing machine learning models via the Acumos AI open platform. arXiv (2018).Google Scholar
- [231] . 2018. An AI software test method based on scene deductive approach. In 2018 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C). IEEE, 14–20.
DOI: Google ScholarCross Ref - [232] . 2019. Testing untestable neural machine translation: An industrial case. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). IEEE, 314–315.
DOI: arxiv:1807.02340 .Google ScholarDigital Library - [233] . 2020. DeepBillboard: Systematic physical-world testing of autonomous driving systems. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering. ACM, 347–358.
DOI: arxiv:1812.10812 .Google ScholarDigital Library
Index Terms
- Software Engineering for AI-Based Systems: A Survey