ABSTRACT
Deep learning has gained substantial popularity in recent years. Developers mainly rely on libraries and tools to add deep learning capabilities to their software. What kinds of bugs are frequently found in such software? What are the root causes of such bugs? What impacts do such bugs have? Which stages of the deep learning pipeline are more bug-prone? Are there any antipatterns? Understanding such characteristics of bugs in deep learning software has the potential to foster the development of better deep learning platforms, debugging mechanisms, and development practices, and to encourage the development of analysis and verification frameworks. Therefore, we study 2716 high-quality posts from Stack Overflow and 500 bug-fix commits from GitHub about five popular deep learning libraries (Caffe, Keras, TensorFlow, Theano, and Torch) to understand the types of bugs, the root causes of bugs, the impacts of bugs, and the bug-prone stages of the deep learning pipeline, as well as whether there are common antipatterns in this buggy software. The key findings of our study include: data bugs and logic bugs are the most severe bug types in deep learning software, appearing more than 48% of the time; the major root causes of these bugs are Incorrect Model Parameter (IPS) and Structural Inefficiency (SI), showing up more than 43% of the time. We have also found that the bugs in the usage of deep learning libraries have some common antipatterns.
Index Terms
- A comprehensive study on deep learning bug characteristics