Published in: Empirical Software Engineering 1/2024

1 February 2024

Bug characterization in machine learning-based systems

Authors: Mohammad Mehdi Morovati, Amin Nikanjam, Florian Tambon, Foutse Khomh, Zhen Ming (Jack) Jiang

Abstract

The rapid growth of Machine Learning (ML) applications across domains, especially in safety-critical areas, increases the need for reliable ML components, i.e., software components that operate based on ML. Since corrective maintenance, i.e., identifying and resolving system bugs, is a key task in delivering reliable software components, it is necessary to investigate the usage of ML components from the software maintenance perspective. Understanding the characteristics of bugs and the maintenance challenges in ML-based systems can help developers of these systems decide where to focus maintenance and testing efforts, by giving insights into the most error-prone components, the most common bugs, etc. In this paper, we investigate the characteristics of bugs in ML-based software systems and the differences between ML and non-ML bugs from the maintenance viewpoint. We extracted 447,948 GitHub repositories that used one of the three most popular ML frameworks, i.e., TensorFlow, Keras, and PyTorch. After multiple filtering steps, we selected the top 300 repositories with the highest number of closed issues. We manually investigated the extracted repositories to exclude non-ML-based systems. We then manually inspected 386 sampled issues reported in the identified ML-based systems to determine whether they affect ML components. Our analysis shows that nearly half of the real issues reported in ML-based systems are ML bugs, indicating that ML components are more error-prone than non-ML components. Next, we thoroughly examined 109 identified ML bugs to identify their root causes and symptoms and to calculate their required fixing time. The results also revealed that ML bugs have significantly different characteristics compared to non-ML bugs in terms of the complexity of bug-fixing (number of commits, changed files, and changed lines of code). Based on our results, fixing ML bugs is more costly than fixing non-ML bugs, and ML components are more error-prone than non-ML components. Hence, paying significant attention to the reliability of ML components is crucial in ML-based systems. These results deepen the understanding of ML bugs, and we hope that our findings help shed light on opportunities for designing effective tools for testing and debugging ML-based systems.
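To make the comparison of bug-fixing complexity between ML and non-ML bugs more concrete, the following minimal sketch computes Cliff's delta, a non-parametric effect size, for one complexity metric (changed files per fix). The abstract does not state which statistic the study used, so the choice of Cliff's delta here is an assumption. The function name `cliffs_delta`, the variable names, and the sample values are hypothetical and are not data from the paper.

```python
from typing import Sequence


def cliffs_delta(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Return Cliff's delta for two samples.

    The value lies in [-1, 1]. It is the share of (x, y) pairs with x > y
    minus the share of pairs with x < y; ties contribute nothing.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical toy values for illustration only (NOT taken from the study):
# number of files changed by each bug-fixing change.
ml_changed_files = [3, 5, 8, 2, 7]
non_ml_changed_files = [1, 2, 2, 4, 1]

delta = cliffs_delta(ml_changed_files, non_ml_changed_files)
print(f"Cliff's delta (ML vs. non-ML changed files): {delta:.2f}")
```

A positive value means that fixes in the first group tend to touch more files than fixes in the second group. The same function could be applied to the other metrics named in the abstract (number of commits and changed lines of code).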

Zurück zum Zitat Zhang JM, Harman M, Ma L, Liu Y (2022) Machine learning testing: survey, landscapes and horizons. IEEE Trans Softw Eng 48(01):1–36CrossRef Zhang JM, Harman M, Ma L, Liu Y (2022) Machine learning testing: survey, landscapes and horizons. IEEE Trans Softw Eng 48(01):1–36CrossRef
Zurück zum Zitat Zhang Y, Chen Y, Cheung S-C, Xiong Y, Zhang L (2018) An empirical study on tensorflow program bugs. In: Proceedings of the 27th ACM SIGSOFT international symposium on software testing and analysis. 3 Park Avenue, New York 10016-5997, USA: IEEE, pp 129–140 Zhang Y, Chen Y, Cheung S-C, Xiong Y, Zhang L (2018) An empirical study on tensorflow program bugs. In: Proceedings of the 27th ACM SIGSOFT international symposium on software testing and analysis. 3 Park Avenue, New York 10016-5997, USA: IEEE, pp 129–140
Zurück zum Zitat Zhang T, Gao C, Ma L, Lyu M, Kim M (2019) An empirical study of common challenges in developing deep learning applications. In: 2019 IEEE 30th International symposium on software reliability engineering (ISSRE). 3 Park Avenue, New York 10016–5997, USA: IEEE, pp 104–115 Zhang T, Gao C, Ma L, Lyu M, Kim M (2019) An empirical study of common challenges in developing deep learning applications. In: 2019 IEEE 30th International symposium on software reliability engineering (ISSRE). 3 Park Avenue, New York 10016–5997, USA: IEEE, pp 104–115
Zurück zum Zitat Zhang X, Zhai J, Ma S, Shen C (2021) Autotrainer: an automatic dnn training problem detection and repair system. In: 2021 IEEE/ACM 43rd International conference on software engineering (ICSE). IEEE, pp 359–371 Zhang X, Zhai J, Ma S, Shen C (2021) Autotrainer: an automatic dnn training problem detection and repair system. In: 2021 IEEE/ACM 43rd International conference on software engineering (ICSE). IEEE, pp 359–371
Zurück zum Zitat Zimmermann T, Nagappan N, Guo PJ, Murphy B (2012) Characterizing and predicting which bugs get reopened. In: 2012 34th International conference on software engineering (ICSE). IEEE, pp 1074–1083 Zimmermann T, Nagappan N, Guo PJ, Murphy B (2012) Characterizing and predicting which bugs get reopened. In: 2012 34th International conference on software engineering (ICSE). IEEE, pp 1074–1083
Metadata
Title
Bug characterization in machine learning-based systems
Authors
Mohammad Mehdi Morovati
Amin Nikanjam
Florian Tambon
Foutse Khomh
Zhen Ming (Jack) Jiang
Publication date
1 February 2024
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 1/2024
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-023-10400-0
