skip to main content
10.1145/3338906.3338952acmconferencesArticle/Chapter ViewAbstractPublication PagesfseConference Proceedingsconference-collections
research-article
Open Access

Phoenix: automated data-driven synthesis of repairs for static analysis violations

Published:12 August 2019Publication History

ABSTRACT

Traditional automatic program repair (APR) tools rely on a test-suite as a repair specification. But test suites even when available are not of specification quality, limiting the performance and hence viability of test-suite based repair. On the other hand, static analysis-based bug finding tools are seeing increasing adoption in industry but still face challenges since the reported violations are viewed as not easily actionable. We propose a novel solution that solves both these challenges through a technique for automatically generating high-quality patches for static analysis violations by learning from examples. Our approach uses the static analyzer as an oracle and does not require a test suite. We realize our solution in a system, Phoenix, that implements a fully-automated pipeline that mines and cleans patches for static analysis violations from the wild, learns generalized executable repair strategies as programs in a novel Domain Specific Language (DSL), and then instantiates concrete repairs from them on new unseen violations. Using Phoenix we mine a corpus of 5,389 unique violations and patches from 517 Github projects. In a cross-validation study on this corpus Phoenix successfully produced 4,596 bug-fixes, with a recall of 85% and a precision of 54%. When applied to the latest revisions of a further5 Github projects, Phoenix produced 94 correct patches to previously unknown bugs, 19 of which have already been accepted and merged by the development teams. To the best of our knowledge this constitutes, by far the largest application of any automatic patch generation technology to large-scale real-world systems

References

  1. Pavel Avgustinov, Arthur I. Baars, Anders S. Henriksen, Greg Lavender, Galen Menzel, Oege de Moor, Max Schäfer, and Julian Tibble. 2015. Tracking Static Analysis Violations over Time to Capture Developer Characteristics. In Proceedings of the 37th International Conference on Software Engineering - Volume 1 (ICSE ’15). IEEE Press, Piscataway, NJ, USA, 437–447. http://dl.acm.org/citation.cfm? id=2818754.2818809 Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Nathaniel Ayewah, David Hovemeyer, J. David Morgenthaler, John Penix, and William Pugh. 2008. Using Static Analysis to Find Bugs. IEEE Softw. 25, 5 (Sept. 2008), 22–29. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. T. Barik, Y. Song, B. Johnson, and E. Murphy-Hill. 2016. From Quick Fixes to Slow Fixes: Reimagining Static Analysis Resolutions to Enable Design Space Exploration. In 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME). 211–221.Google ScholarGoogle Scholar
  4. Earl T. Barr, Mark Harman, Yue Jia, Alexandru Marginean, and Justyna Petke. 2015. Automated Software Transplantation. In Proceedings of the 2015 International Symposium on Software Testing and Analysis (ISSTA 2015). ACM, New York, NY, USA, 257–269. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Al Bessey, Ken Block, Ben Chelf, Andy Chou, Bryan Fulton, Seth Hallem, Charles Henri-Gros, Asya Kamsky, Scott McPeak, and Dawson Engler. 2010. A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World. Commun. ACM 53, 2 (Feb. 2010), 66–75. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Cristiano Calcagno, Dino Distefano, Jérémy Dubreil, Dominik Gabi, Pieter Hooimeijer, Martino Luca, Peter W. O’Hearn, Irene Papakonstantinou, Jim Purbrick, and Dulma Rodriguez. 2015. Moving Fast with Software Verification. In NASA Formal Methods - 7th International Symposium, NFM 2015, Pasadena, CA, USA, April 27-29, 2015, Proceedings. 3–11.Google ScholarGoogle Scholar
  7. V. Chvatal. 1979. A Greedy Heuristic for the Set-Covering Problem. Math. Oper. Res. 4, 3 (Aug. 1979), 233–235. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Xuan Bach D Le, Ferdian Thung, David Lo, and Claire Le Goues. 2018. Overfitting in semantics-based automated program repair. Empirical Software Engineering 23 (03 2018). Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. E. W. Dijkstra. 1959. A Note on Two Problems in Connexion with Graphs. Numer. Math. 1, 1 (Dec. 1959), 269–271. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, and Martin Monperrus. 2014. Fine-grained and Accurate Source Code Differencing. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering (ASE ’14). ACM, New York, NY, USA, 313–324. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Yu Feng, Ruben Martins, Jacob Van Geffen, Isil Dillig, and Swarat Chaudhuri. 2017. Component-based Synthesis of Table Consolidation and Transformation Tasks from Examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017). ACM, New York, NY, USA, 422–436. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. L. Gazzola, D. Micucci, and L. Mariani. 2018. Automatic Software Repair: A Survey. IEEE Transactions on Software Engineering (2018), 1–1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Google. 2017. Error Prone. https://errorprone.info/. (2017).Google ScholarGoogle Scholar
  14. Georgios Gousios. 2013. The GHTorrent dataset and tool suite. In Proceedings of the 10th Working Conference on Mining Software Repositories (MSR ’13). IEEE Press, Piscataway, NJ, USA, 233–236. http://dl.acm.org/citation.cfm?id=2487085. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. 2487132Google ScholarGoogle Scholar
  16. Sumit Gulwani. 2011. Automating String Processing in Spreadsheets Using Input-output Examples. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’11). ACM, New York, NY, USA, 317–330. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Jacob A. Harer, Onur Ozdemir, Tomo Lazovich, Christopher P. Reale, Rebecca L. Russell, Louis Y. Kim, and Peter Chin. 2018. Learning to Repair Software Vulnerabilities with Generative Adversarial Networks. In Proceedings of the 32Nd International Conference on Neural Information Processing Systems (NIPS’18). Curran Associates Inc., USA, 7944–7954. http://dl.acm.org/citation.cfm?id=3327757.3327890 Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman. 2006. Introduction to Automata Theory, Languages, and Computation (3rd Edition). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Eclipse IDE. 2018. Java Editor Quickfix. https://help.eclipse.org/neon/topic/org. eclipse.jdt.doc.user/reference/ref-java-editor-quickfix.htm. (2018).Google ScholarGoogle Scholar
  20. JetBrains. 2018. IntelliJ Quick Fixes. https://www.jetbrains.com/resharper/ features/quick_fixes.html. (2018).Google ScholarGoogle Scholar
  21. Jiajun Jiang, Yingfei Xiong, Hongyu Zhang, Qing Gao, and Xiangqun Chen. 2018. Shaping Program Repair Space with Existing Patches and Similar Code. In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2018). ACM, New York, NY, USA, 298–309. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. René Just, Darioush Jalali, and Michael D Ernst. 2014. Defects4J: A database of existing faults to enable controlled testing studies for Java programs. In Proceedings of the 2014 International Symposium on Software Testing and Analysis. ACM, 437–440. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Shalini Kaleeswaran, Varun Tulsian, Aditya Kanade, and Alessandro Orso. 2014. MintHint: Automated Synthesis of Repair Hints. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, New York, NY, USA, 266–276. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Yalin Ke, Kathryn T. Stolee, Claire Le Goues, and Yuriy Brun. 2015. Repairing Programs with Semantic Code Search (T). In Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE) (ASE ’15). IEEE Computer Society, Washington, DC, USA, 295–306.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Dongsun Kim, Jaechang Nam, Jaewoo Song, and Sunghun Kim. 2013. Automatic Patch Generation Learned from Human-written Patches. In Proceedings of the 2013 International Conference on Software Engineering (ICSE ’13). IEEE Press, Piscataway, NJ, USA, 802–811. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Vu Le and Sumit Gulwani. 2014. FlashExtract: A Framework for Data Extraction by Examples. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’14). ACM, New York, NY, USA, 542– 553. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. X. B. D. Le, D. Lo, and C. L. Goues. 2016. History Driven Program Repair. In 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), Vol. 1. IEEE Press, Piscataway, NJ, USA, 213–224.Google ScholarGoogle Scholar
  28. Claire Le Goues, Michael Dewey-Vogt, Stephanie Forrest, and Westley Weimer. 2012. A Systematic Study of Automated Program Repair: Fixing 55 out of 105 Bugs for $8 Each. In Proceedings of the 34th International Conference on Software Engineering (ICSE ’12). IEEE Press, Piscataway, NJ, USA, 3–13. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Alan Leung, John Sarracino, and Sorin Lerner. 2015. Interactive Parser Synthesis by Example. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’15). ACM, New York, NY, USA, 565–574. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Henry Lieberman. 2001. Your Wish is My Command: Programming by Example. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.Google ScholarGoogle Scholar
  31. K. Liu, D. Kim, T. F. Bissyande, S. Yoo, and Y. Le Traon. 2018. Mining Fix Patterns for FindBugs Violations. IEEE Transactions on Software Engineering (2018), 1–1.Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Fan Long, Peter Amidon, and Martin Rinard. 2017. Automatic Inference of Code Transforms for Patch Generation. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2017). ACM, New York, NY, USA, 727–739. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Fan Long and Martin Rinard. 2016. Automatic Patch Generation by Learning Correct Code. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’16). ACM, New York, NY, USA, 298–312. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury. 2015. DirectFix: Looking for Simple Program Repairs. In Proceedings of the 37th International Conference on Software Engineering - Volume 1 (ICSE ’15). IEEE Press, Piscataway, NJ, USA, 448–458. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury. 2016. Angelix: Scalable Multiline Program Patch Synthesis via Symbolic Analysis. In Proceedings of the 38th International Conference on Software Engineering (ICSE ’16). ACM, New York, NY, USA, 691–701. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Microsoft. 2018. Visual Studio - Common Quick Actions. https://docs.microsoft. com/en-us/visualstudio/ide/common-quick-actions. (2018).Google ScholarGoogle Scholar
  37. Martin Monperrus. 2018. Automatic Software Repair: A Bibliography. Comput. Surveys 51, 1, Article 17 (Jan. 2018), 24 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Eugene W. Myers. 1986. An O(ND) difference algorithm and its variations. Algorithmica 1, 1 (01 Nov 1986), 251–266.Google ScholarGoogle Scholar
  39. Hoang Duong Thien Nguyen, Dawei Qi, Abhik Roychoudhury, and Satish Chandra. 2013. SemFix: Program Repair via Semantic Analysis. In Proceedings of the 2013 International Conference on Software Engineering (ICSE ’13). IEEE Press, Piscataway, NJ, USA, 772–781. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Oleksandr Polozov and Sumit Gulwani. 2015. FlashMeta: A Framework for Inductive Program Synthesis. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2015). ACM, New York, NY, USA, 107–126. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Reudismam Rolim, Gustavo Soares, Loris D’Antoni, Oleksandr Polozov, Sumit Gulwani, Rohit Gheyi, Ryo Suzuki, and Björn Hartmann. 2017. Learning Syntactic Program Transformations from Examples. In Proceedings of the 39th International Conference on Software Engineering (ICSE ’17). IEEE Press, Piscataway, NJ, USA, 404–415. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Caitlin Sadowski, Edward Aftandilian, Alex Eagle, Liam Miller-Cushon, and Ciera Jaspan. 2018. Lessons from Building Static Analysis Tools at Google. Commun. ACM 61, 4 (March 2018), 58–66. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Ripon K. Saha, Yingjun Lyu, Hiroaki Yoshida, and Mukul R. Prasad. 2017. ELIXIR: Effective Object Oriented Program Repair. In Proceedings of the 32Nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2017). IEEE Press, Piscataway, NJ, USA, 648–659. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Andrew Scott, Johannes Bader, and Satish Chandra. 2019. Getafix: Learning to fix bugs automatically. CoRR abs/1902.06111 (2019). arXiv: 1902.06111 http: //arxiv.org/abs/1902.06111 ESEC/FSE ’19, August 26–30, 2019, Tallinn, Estonia Rohan Bavishi, Hiroaki Yoshida, and Mukul R. PrasadGoogle ScholarGoogle Scholar
  45. Edward K. Smith, Earl T. Barr, Claire Le Goues, and Yuriy Brun. 2015. Is the Cure Worse Than the Disease? Overfitting in Automated Program Repair. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015). ACM, New York, NY, USA, 532–543. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Synopsys. 2018. 2017 Coverity Scan Report. https://www.synopsys.com/content/ dam/synopsys/sig-assets/reports/SCAN-Report-2017.pdf. (2018).Google ScholarGoogle Scholar
  47. Shin Hwei Tan and Abhik Roychoudhury. 2015. Relifix: Automated Repair of Software Regressions. In Proceedings of the 37th International Conference on Software Engineering - Volume 1 (ICSE ’15). IEEE Press, Piscataway, NJ, USA, 471–482. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Rijnard van Tonder and Claire Le Goues. 2018. Static Automated Program Repair for Heap Properties. In Proceedings of the 40th International Conference on Software Engineering (ICSE ’18). ACM, New York, NY, USA, 151–162. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Chenglong Wang, Alvin Cheung, and Rastislav Bodik. 2017. Synthesizing Highly Expressive SQL Queries from Input-output Examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017). ACM, New York, NY, USA, 452–466. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. 3062365Google ScholarGoogle Scholar
  51. Ming Wen, Junjie Chen, Rongxin Wu, Dan Hao, and Shing-Chi Cheung. 2018. Context-aware Patch Generation for Better Automated Program Repair. In Proceedings of the 40th International Conference on Software Engineering (ICSE ’18). ACM, New York, NY, USA, 1–11. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Qi Xin and Steven P. Reiss. 2017. Leveraging Syntax-related Code for Automated Program Repair. In Proceedings of the 32Nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2017). IEEE Press, Piscataway, NJ, USA, 660–670. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Yingfei Xiong, Jie Wang, Runfa Yan, Jiachen Zhang, Shi Han, Gang Huang, and Lu Zhang. 2017. Precise Condition Synthesis for Program Repair. In Proceedings of the 39th International Conference on Software Engineering (ICSE ’17). IEEE Press, Piscataway, NJ, USA, 416–426. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. J. Xuan, M. Martinez, F. DeMarco, M. Clément, S. L. Marcote, T. Durieux, D. Le Berre, and M. Monperrus. 2017. Nopol: Automatic Repair of Conditional Statement Bugs in Java Programs. IEEE Transactions on Software Engineering 43, 1 (Jan 2017), 34–55. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Navid Yaghmazadeh, Xinyu Wang, and Isil Dillig. 2018. Automated Migration of Hierarchical Data to Relational Tables Using Programming-by-example. Proc. VLDB Endow. 11, 5 (Jan. 2018), 580–593. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Phoenix: automated data-driven synthesis of repairs for static analysis violations

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
            August 2019
            1264 pages
            ISBN:9781450355728
            DOI:10.1145/3338906

            Copyright © 2019 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 12 August 2019

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article

            Acceptance Rates

            Overall Acceptance Rate112of543submissions,21%

            Upcoming Conference

            FSE '24

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader