A comprehensive study on deep learning bug characteristics

Authors:
Md Johirul Islam

Iowa State University, USA

Iowa State University, USA
View Profile

,
Giang Nguyen

Iowa State University, USA

Iowa State University, USA
View Profile

,
Rangeet Pan

Iowa State University, USA

Iowa State University, USA
View Profile

,
Hridesh Rajan

Iowa State University, USA

Iowa State University, USA
View Profile

ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software EngineeringAugust 2019Pages 510–520https://doi.org/10.1145/3338906.3338955

Published:12 August 2019Publication History

ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Pages 510–520

ABSTRACT

Deep learning has gained substantial popularity in recent years. Developers mainly rely on libraries and tools to add deep learning capabilities to their software. What kinds of bugs are frequently found in such software? What are the root causes of such bugs? What impacts do such bugs have? Which stages of deep learning pipeline are more bug prone? Are there any antipatterns? Understanding such characteristics of bugs in deep learning software has the potential to foster the development of better deep learning platforms, debugging mechanisms, development practices, and encourage the development of analysis and verification frameworks. Therefore, we study 2716 high-quality posts from Stack Overflow and 500 bug fix commits from Github about five popular deep learning libraries Caffe, Keras, Tensorflow, Theano, and Torch to understand the types of bugs, root causes of bugs, impacts of bugs, bug-prone stage of deep learning pipeline as well as whether there are some common antipatterns found in this buggy software. The key findings of our study include: data bug and logic bug are the most severe bug types in deep learning software appearing more than 48% of the times, major root causes of these bugs are Incorrect Model Parameter (IPS) and Structural Inefficiency (SI) showing up more than 43% of the times.We have also found that the bugs in the usage of deep learning libraries have some common antipatterns.

References

Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. TensorFlow: A System for Large-Scale Machine Learning.. In OSDI, Vol. 16. 265–283. Google ScholarDigital Library
Alexander Shvets. 2017. Software Development AntiPatterns. https:// sourcemaking.com/antipatterns/software-development-antipatterns.Google Scholar
Anton Barua, Stephen W Thomas, and Ahmed E Hassan. 2014. What are developers talking about? an analysis of topics and trends in stack overflow. Empirical Software Engineering 19, 3 (2014), 619–654. Google ScholarDigital Library
Gabriele Bavota, Mario Linares-Vasquez, Carlos Eduardo Bernal-Cardenas, Massimiliano Di Penta, Rocco Oliveto, and Denys Poshyvanyk. 2015. The impact of api change-and fault-proneness on the user ratings of android apps. IEEE Transactions on Software Engineering 41, 4 (2015), 384–407.Google ScholarDigital Library
Boris Beizer. 1984. Software system testing and quality assurance. Van Nostrand Reinhold Co. Google ScholarDigital Library
Sumon Biswas, Md Johirul Islam, Yijia Huang, and Hridesh Rajan. 2019. Boa Meets Python: A Boa Dataset of Data Science Software in Python Language. In MSR’19: 16th International Conference on Mining Software Repositories. Google ScholarDigital Library
François Chollet et al. 2015. Keras. https://github.com/fchollet/keras.Google Scholar
Ronan Collobert, Samy Bengio, and Johnny Mariéthoz. 2002. Torch: a modular machine learning software library. Technical Report. Idiap.Google Scholar
Danny Dig and Ralph Johnson. 2006. How do APIs evolve? A story of refactoring. Journal of software maintenance and evolution: Research and Practice 18, 2 (2006), 83–107. Google ScholarCross Ref
Robert Dyer, Hoan Anh Nguyen, Hridesh Rajan, and Tien N. Nguyen. 2013. Boa: A Language and Infrastructure for Analyzing Ultra-Large-Scale Software Repositories. In Proceedings of the 35th International Conference on Software Engineering (ICSE’13). 422–431. Google ScholarDigital Library
Robert Dyer, Hoan Anh Nguyen, Hridesh Rajan, and Tien N. Nguyen. 2015. Boa: an Enabling Language and Infrastructure for Ultra-large Scale MSR Studies. The Art and Science of Analyzing Software Data (2015), 593–621.Google Scholar
Robert Dyer, Hoan Anh Nguyen, Hridesh Rajan, and Tien N. Nguyen. 2015. Boa: Ultra-Large-Scale Software Repository and Source-Code Mining. ACM Trans. Softw. Eng. Methodol. 25, 1, Article 7 (2015), 7:1–7:34 pages. Google ScholarDigital Library
Yu Gao, Wensheng Dou, Feng Qin, Chushu Gao, Dong Wang, Jun Wei, Ruirui Huang, Li Zhou, and Yongming Wu. 2018. An empirical study on crash recovery bugs in large-scale distributed systems. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, 539–550. Google ScholarDigital Library
David Gómez and Alfonso Rojas. 2016. An empirical overview of the no free lunch theorem and its effect on real-world machine learning classification. Neural computation 28, 1 (2016), 216–228. Google ScholarDigital Library
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM international conference on Multimedia. ACM, 675–678. Google ScholarDigital Library
David Kavaler, Daryl Posnett, Clint Gibler, Hao Chen, Premkumar Devanbu, and Vladimir Filkov. 2013. Using and asking: Apis used in the android market and asked about in stackoverflow. In International Conference on Social Informatics. Springer, 405–418. Google ScholarDigital Library
Raula Gaikovina Kula, Ali Ouni, Daniel M German, and Katsuro Inoue. 2018. An empirical study on the impact of refactoring activities on evolving client-used APIs. Information and Software Technology 93 (2018), 186–199. Google ScholarDigital Library
Jun Li, Yingfei Xiong, Xuanzhe Liu, and Lu Zhang. 2013. How does web service API evolution affect clients?. In 2013 IEEE 20th International Conference on Web Services. IEEE, 300–307. Google ScholarDigital Library
Mario Linares-Vásquez, Gabriele Bavota, Massimiliano Di Penta, Rocco Oliveto, and Denys Poshyvanyk. 2014. How do api changes trigger stack overflow discussions? a study on the android sdk. In proceedings of the 22nd International Conference on Program Comprehension. ACM, 83–94. Google ScholarDigital Library
Shan Lu, Soyeon Park, Eunsoo Seo, and Yuanyuan Zhou. 2008. Learning from mistakes: a comprehensive study on real world concurrency bug characteristics. In ACM SIGARCH Computer Architecture News, Vol. 36. ACM, 329–339. Google ScholarDigital Library
Sarah Meldrum, Sherlock A Licorish, and Bastin Tony Roy Savarimuthu. 2017. Crowdsourced Knowledge on Stack Overflow: A Systematic Mapping Study. In Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering. ACM, 180–185. Google ScholarDigital Library
Seyyed Ehsan Salamati Taba, Foutse Khomh, Ying Zou, Ahmed E Hassan, and Meiyappan Nagappan. 2013. Predicting bugs using antipatterns. In 2013 IEEE International Conference on Software Maintenance. IEEE, 270–279. Google ScholarDigital Library
The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, et al. 2016. Theano: A Python framework for fast computation of mathematical expressions. arXiv preprint arXiv:1605.02688 (2016).Google Scholar
Ferdian Thung, Shaowei Wang, David Lo, and Lingxiao Jiang. 2012. An empirical study of bugs in machine learning systems. In 2012 IEEE 23rd International Symposium on Software Reliability Engineering. IEEE, 271–280. Google ScholarDigital Library
Anthony J Viera, Joanne M Garrett, et al. 2005. Understanding interobserver agreement: the kappa statistic. Fam med 37, 5 (2005), 360–363.Google Scholar
Yufeng Guo. 2017. The 7 Steps of Machine Learning. https://towardsdatascience. com/the-7-steps-of-machine-learning-2877d7e5548e.Google Scholar

Index Terms

A comprehensive study on deep learning bug characteristics
1. Computing methodologies
  1. Machine learning
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis

Recommendations

Repairing deep neural networks: fix patterns and challenges
ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering

Significant interest in applying Deep Neural Network (DNN) has fueled the need to support engineering of software that uses DNNs. Repairing software that uses DNNs is one such unmistakable SE need where automated tools could be beneficial; however, we ...
Read More
A comprehensive empirical study on bug characteristics of deep learning frameworks
Abstract Context:
Deep Learning (DL) frameworks enable developers to build DNN models without learning the underlying algorithms and models. While some of these DL-based software systems have been deployed in safety-critical areas, ...
Read More
Exploring the Impact of Code Clones on Deep Learning Software
Deep learning (DL) is a really active topic in recent years. Code cloning is a common code implementation that could negatively impact software maintenance. For DL software, developers rely heavily on frameworks to implement DL features. Meanwhile, to ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
August 2019
1264 pages
ISBN:9781450355728
DOI:10.1145/3338906
General Chairs:
Marlon Dumas
University of Tartu, Estonia
,
Dietmar Pfahl
University of Tartu, Estonia
,
Program Chairs:
Sven Apel
Saarland University, Germany
,
Alessandra Russo
Imperial College, UK
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 12 August 2019
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Bugs
Deep learning bugs
Deep learning software
Empirical Study of Bugs
Q&A forums
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate112of543submissions,21%
Upcoming Conference
FSE '24

Sponsor:

sigsoft

32nd ACM International Conference on the Foundations of Software Engineering

July 15 - 19, 2024

Ipojuca (Pernambuco) , Brazil
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 173
  Total Citations
  View Citations
- 3,143
  Total Downloads
- Downloads (Last 12 months)793
- Downloads (Last 6 weeks)96
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

A comprehensive study on deep learning bug characteristics

ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering

ABSTRACT

References

Cited By

Index Terms

Recommendations

Repairing deep neural networks: fix patterns and challenges

A comprehensive empirical study on bug characteristics of deep learning frameworks

Exploring the Impact of Code Clones on Deep Learning Software