Published in: Empirical Software Engineering 1/2024

01-02-2024

Bug characterization in machine learning-based systems

Authors: Mohammad Mehdi Morovati, Amin Nikanjam, Florian Tambon, Foutse Khomh, Zhen Ming (Jack) Jiang



Abstract

The rapid growth of Machine Learning (ML) applications in different domains, especially in safety-critical areas, increases the need for reliable ML components, i.e., software components that operate based on ML. Since corrective maintenance, i.e., identifying and resolving bugs in a system, is a key task in delivering reliable software components, it is necessary to investigate the usage of ML components from the software maintenance perspective. Understanding the characteristics of bugs and the maintenance challenges in ML-based systems can help developers of these systems identify where to focus maintenance and testing efforts, by giving insights into the most error-prone components, the most common bugs, and so on. In this paper, we investigate the characteristics of bugs in ML-based software systems and the differences between ML and non-ML bugs from the maintenance viewpoint. We extracted 447,948 GitHub repositories that used one of the three most popular ML frameworks, i.e., TensorFlow, Keras, and PyTorch. After multiple filtering steps, we selected the top 300 repositories with the highest number of closed issues. We manually investigated the extracted repositories to exclude non-ML-based systems. We then manually inspected 386 sampled issues reported in the identified ML-based systems to determine whether they affect ML components. Our analysis shows that nearly half of the real issues reported in ML-based systems are ML bugs, indicating that ML components are more error-prone than non-ML components. Next, we thoroughly examined 109 identified ML bugs to identify their root causes and symptoms and to calculate their required fixing time. The results also revealed that ML bugs have significantly different characteristics compared to non-ML bugs in terms of the complexity of bug-fixing (number of commits, changed files, and changed lines of code).
Based on our results, fixing ML bugs is more costly than fixing non-ML bugs, and ML components are more error-prone than non-ML components. Hence, paying significant attention to the reliability of ML components is crucial in ML-based systems. These results deepen the understanding of ML bugs, and we hope that our findings help shed light on opportunities for designing effective tools for testing and debugging ML-based systems.
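
The comparison of bug-fixing complexity between ML and non-ML bugs can be illustrated with a small sketch. The abstract does not name the statistical procedure used, so the choice of Cliff's delta (a non-parametric effect size) is an assumption here, and the function name `cliffs_delta` and all numbers below are hypothetical, not data from the study.

```python
from typing import Sequence


def cliffs_delta(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Return Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    if not xs or not ys:
        raise ValueError("both groups must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical changed-lines counts per bug fix (illustrative only,
# not taken from the paper).
ml_fix_loc = [40, 55, 120, 80, 65]
non_ml_fix_loc = [10, 25, 30, 60, 15]

delta = cliffs_delta(ml_fix_loc, non_ml_fix_loc)
print(f"Cliff's delta = {delta:.2f}")  # prints "Cliff's delta = 0.84"
```

A value near +1 means fixes in the first group almost always change more lines than fixes in the second group, a value near 0 means the two groups overlap heavily, and a value near -1 means the opposite ordering. The same function could be applied to the number of commits or changed files per fix.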


go back to reference Zhang JM, Harman M, Ma L, Liu Y (2022) Machine learning testing: survey, landscapes and horizons. IEEE Trans Softw Eng 48(01):1–36CrossRef Zhang JM, Harman M, Ma L, Liu Y (2022) Machine learning testing: survey, landscapes and horizons. IEEE Trans Softw Eng 48(01):1–36CrossRef
go back to reference Zhang Y, Chen Y, Cheung S-C, Xiong Y, Zhang L (2018) An empirical study on tensorflow program bugs. In: Proceedings of the 27th ACM SIGSOFT international symposium on software testing and analysis. 3 Park Avenue, New York 10016-5997, USA: IEEE, pp 129–140 Zhang Y, Chen Y, Cheung S-C, Xiong Y, Zhang L (2018) An empirical study on tensorflow program bugs. In: Proceedings of the 27th ACM SIGSOFT international symposium on software testing and analysis. 3 Park Avenue, New York 10016-5997, USA: IEEE, pp 129–140
go back to reference Zhang T, Gao C, Ma L, Lyu M, Kim M (2019) An empirical study of common challenges in developing deep learning applications. In: 2019 IEEE 30th International symposium on software reliability engineering (ISSRE). 3 Park Avenue, New York 10016–5997, USA: IEEE, pp 104–115 Zhang T, Gao C, Ma L, Lyu M, Kim M (2019) An empirical study of common challenges in developing deep learning applications. In: 2019 IEEE 30th International symposium on software reliability engineering (ISSRE). 3 Park Avenue, New York 10016–5997, USA: IEEE, pp 104–115
go back to reference Zhang X, Zhai J, Ma S, Shen C (2021) Autotrainer: an automatic dnn training problem detection and repair system. In: 2021 IEEE/ACM 43rd International conference on software engineering (ICSE). IEEE, pp 359–371 Zhang X, Zhai J, Ma S, Shen C (2021) Autotrainer: an automatic dnn training problem detection and repair system. In: 2021 IEEE/ACM 43rd International conference on software engineering (ICSE). IEEE, pp 359–371
go back to reference Zimmermann T, Nagappan N, Guo PJ, Murphy B (2012) Characterizing and predicting which bugs get reopened. In: 2012 34th International conference on software engineering (ICSE). IEEE, pp 1074–1083 Zimmermann T, Nagappan N, Guo PJ, Murphy B (2012) Characterizing and predicting which bugs get reopened. In: 2012 34th International conference on software engineering (ICSE). IEEE, pp 1074–1083
Metadata
Title
Bug characterization in machine learning-based systems
Authors
Mohammad Mehdi Morovati
Amin Nikanjam
Florian Tambon
Foutse Khomh
Zhen Ming (Jack) Jiang
Publication date
01-02-2024
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 1/2024
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-023-10400-0
