DOI: 10.1145/3338906.3338955
research-article
Open Access

A comprehensive study on deep learning bug characteristics

Published: 12 August 2019

ABSTRACT

Deep learning has gained substantial popularity in recent years. Developers mainly rely on libraries and tools to add deep learning capabilities to their software. What kinds of bugs are frequently found in such software? What are the root causes of such bugs? What impacts do such bugs have? Which stages of the deep learning pipeline are more bug prone? Are there any antipatterns? Understanding such characteristics of bugs in deep learning software has the potential to foster the development of better deep learning platforms, debugging mechanisms, and development practices, and to encourage the development of analysis and verification frameworks. Therefore, we study 2716 high-quality posts from Stack Overflow and 500 bug fix commits from GitHub about five popular deep learning libraries (Caffe, Keras, TensorFlow, Theano, and Torch) to understand the types of bugs, the root causes of bugs, the impacts of bugs, and the bug-prone stages of the deep learning pipeline, as well as whether there are common antipatterns in this buggy software. The key findings of our study include: data bugs and logic bugs are the most severe bug types in deep learning software, appearing more than 48% of the time; and the major root causes of these bugs are Incorrect Model Parameter (IPS) and Structural Inefficiency (SI), showing up more than 43% of the time. We have also found that the bugs in the usage of deep learning libraries have some common antipatterns.

        Published in
        ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
        August 2019
        1264 pages
        ISBN: 9781450355728
        DOI: 10.1145/3338906

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 112 of 543 submissions, 21%
