How many of all bugs do we find? A study of static bug detectors

Andrew Habib and Michael Pradel

Published: 3 September 2018
DOI: 10.1145/3238147.3238213

ABSTRACT

Static bug detectors are becoming increasingly popular and are widely used by professional software developers. While most work on bug detectors focuses on whether they find bugs at all, and on how many false positives they report in addition to legitimate warnings, the inverse question is often neglected: How many of all real-world bugs do static bug detectors find? This paper addresses this question by studying the results of applying three widely used static bug detectors to an extended version of the Defects4J dataset that consists of 15 Java projects with 594 known bugs. To decide which of these bugs the tools detect, we use a novel methodology that combines an automatic analysis of warnings and bugs with a manual validation of each candidate of a detected bug. The results of the study show that: (i) static bug detectors find a non-negligible amount of all bugs, (ii) different tools are mostly complementary to each other, and (iii) current bug detectors miss the large majority of the studied bugs. A detailed analysis of bugs missed by the static detectors shows that some bugs could have been found by variants of the existing detectors, while others are domain-specific problems that do not match any existing bug pattern. These findings help potential users of such tools to assess their utility, motivate and outline directions for future work on static bug detection, and provide a basis for future comparisons of static bug detection with other bug finding techniques, such as manual and automated testing.
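To make the matching step concrete, the sketch below illustrates, in Java (the language of the studied projects), how a warning reported by a static bug detector could be paired automatically with a known bug before manual validation: a warning becomes a candidate when it points at, or within a few lines of, code changed by the bug fix. This is a minimal illustration, not the authors' tooling; the class and record names, the five-line window, and the file-level comparison are assumptions, and the paper's exact matching criteria may differ.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of the automatic matching step: pair each
 * static-analysis warning with a known bug if the warning points close to
 * the code changed by the bug fix. Names and thresholds are assumptions.
 */
public class WarningBugMatcher {

    /** A warning reported by a static bug detector on the buggy version. */
    record Warning(String detector, String file, int line, String message) {}

    /** A known bug, represented by the lines its fix modified. */
    record KnownBug(String id, String file, Set<Integer> fixedLines) {}

    /** Assumed maximum distance (in lines) between a warning and a fixed line. */
    private static final int LINE_WINDOW = 5;

    /** Returns true if the warning is a candidate match for the bug. */
    static boolean isCandidate(Warning w, KnownBug b) {
        if (!w.file().equals(b.file())) {
            return false;
        }
        return b.fixedLines().stream()
                .anyMatch(l -> Math.abs(l - w.line()) <= LINE_WINDOW);
    }

    /** Collects all (warning, bug) candidate pairs for later manual validation. */
    static List<String> candidates(List<Warning> warnings, List<KnownBug> bugs) {
        return warnings.stream()
                .flatMap(w -> bugs.stream()
                        .filter(b -> isCandidate(w, b))
                        .map(b -> w.detector() + " warning at " + w.file() + ":"
                                + w.line() + " may flag bug " + b.id()))
                .toList();
    }
}
```

In such a setup the automatic step deliberately over-approximates, flagging any nearby warning as a candidate, so that the manual validation described in the abstract can decide whether a candidate truly corresponds to the bug.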


Published in

          ASE '18: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering
          September 2018
          955 pages
ISBN: 9781450359375
DOI: 10.1145/3238147

          Copyright © 2018 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 82 of 337 submissions, 24%
