ABSTRACT
Static bug detectors are becoming increasingly popular and are widely used by professional software developers. While most work on bug detectors focuses on whether they find bugs at all, and on how many false positives they report in addition to legitimate warnings, the inverse question is often neglected: How many of all real-world bugs do static bug detectors find? This paper addresses this question by studying the results of applying three widely used static bug detectors to an extended version of the Defects4J dataset that consists of 15 Java projects with 594 known bugs. To decide which of these bugs the tools detect, we use a novel methodology that combines an automatic analysis of warnings and bugs with a manual validation of each candidate for a detected bug. The results of the study show that: (i) static bug detectors find a non-negligible number of all bugs, (ii) different tools are mostly complementary to each other, and (iii) current bug detectors miss the large majority of the studied bugs. A detailed analysis of bugs missed by the static detectors shows that some bugs could have been found by variants of the existing detectors, while others are domain-specific problems that do not match any existing bug pattern. These findings help potential users of such tools to assess their utility, motivate and outline directions for future work on static bug detection, and provide a basis for future comparisons of static bug detection with other bug finding techniques, such as manual and automated testing.
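The automatic analysis mentioned above pairs each known bug with tool warnings that could plausibly report it, before a human validates each pair. As a rough illustration only, the sketch below matches a warning to a bug when the warning's location falls on or near a line changed by the bug's fix; all names, the line-window criterion, and the data layout are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of matching static-analysis warnings to known bugs
# by fix location. A warning becomes a *candidate* for a detected bug if
# it lies within `window` lines of a line touched by the bug's fix; each
# candidate would still require manual validation, as the study describes.
from dataclasses import dataclass

@dataclass(frozen=True)
class Warning:
    file: str   # file flagged by the static bug detector
    line: int   # line number of the warning

@dataclass(frozen=True)
class Bug:
    bug_id: str
    fixed_lines: dict  # file -> set of line numbers changed by the fix

def candidates(warnings, bugs, window=0):
    """Pair each bug with warnings near its fix location.

    `window` lets a warning within N lines of a fixed line count as a
    candidate, since a detector may flag a neighboring statement.
    """
    matches = []
    for bug in bugs:
        for w in warnings:
            lines = bug.fixed_lines.get(w.file, set())
            if any(abs(w.line - fixed) <= window for fixed in lines):
                matches.append((bug.bug_id, w))
    return matches

# Toy data: one warning sits one line away from a fixed line.
warnings = [Warning("Foo.java", 42), Warning("Bar.java", 7)]
bugs = [Bug("Lang-1", {"Foo.java": {41, 43}})]
print(candidates(warnings, bugs, window=1))
# → [('Lang-1', Warning(file='Foo.java', line=42))]
```

A purely location-based match like this over-approximates (a warning on a fixed line may describe an unrelated issue), which is exactly why the study follows it with manual validation of every candidate.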
How Many of All Bugs Do We Find? A Study of Static Bug Detectors. Andrew Habib and Michael Pradel. ASE '18, September 3–7, 2018, Montpellier, France.