DOI: 10.1145/3338906.3338911

Empirical review of Java program repair tools: a large-scale experiment on 2,141 bugs and 23,551 repair attempts

Published: 12 August 2019
Related Artifact: RepairThemAll software https://doi.org/10.5281/zenodo.3334854

ABSTRACT

In the past decade, research on test-suite-based automatic program repair has grown significantly. Each year, new approaches and implementations are featured in major software engineering venues. However, most of those approaches are evaluated on a single benchmark of bugs, and they are rarely reproduced by other researchers. In this paper, we present a large-scale experiment using 11 Java test-suite-based repair tools and 2,141 bugs from 5 benchmarks. Our goal is to better understand the current state of automatic program repair tools on a large diversity of benchmarks. Our investigation is guided by the hypothesis that the repairability of repair tools might not generalize across different benchmarks. We found that the 11 tools 1) are able to generate patches for 21% of the bugs from the 5 benchmarks, and 2) perform better on Defects4J than on the other benchmarks, generating patches for 47% of the bugs from Defects4J compared to 10–30% of the bugs from the other benchmarks. Our experiment comprises 23,551 repair attempts, which we analyzed to find the causes of non-patch generation. We report these causes in this paper; they can help repair tool designers improve their approaches and tools.
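The headline figures above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes (an inference from the numbers, not something the abstract states) that each of the 11 tools was run once on each of the 2,141 bugs, and it converts the rounded 21% rate back into an approximate bug count. The helper names `expected_attempts` and `approx_bug_count` are hypothetical.

```python
# Cross-check of the abstract's headline numbers.
# ASSUMPTION: one repair attempt per (tool, bug) pair. This is inferred
# from the figures, not stated explicitly in the abstract.

NUM_TOOLS = 11
NUM_BUGS = 2_141
REPORTED_ATTEMPTS = 23_551


def expected_attempts(num_tools: int, num_bugs: int) -> int:
    """Attempts if every tool runs once on every bug (hypothetical helper)."""
    return num_tools * num_bugs


def approx_bug_count(rate_percent: float, num_bugs: int) -> int:
    """Approximate count behind a rounded percentage (hypothetical helper).

    The abstract reports rounded rates, so the result is only an estimate.
    """
    return round(num_bugs * rate_percent / 100)


if __name__ == "__main__":
    attempts = expected_attempts(NUM_TOOLS, NUM_BUGS)
    print(attempts, attempts == REPORTED_ATTEMPTS)  # 23551 True
    # Bugs patched by at least one tool, estimated from the rounded 21% rate.
    print(approx_bug_count(21, NUM_BUGS))  # 450
```

Because 21% is itself rounded, the true count could lie anywhere from about 439 to 460 bugs; the estimate is a sanity check, not a result reported by the authors.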


Published in

ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
August 2019
1264 pages
ISBN: 9781450355728
DOI: 10.1145/3338906

      Copyright © 2019 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States
