Empirical review of Java program repair tools: a large-scale experiment on 2,141 bugs and 23,551 repair attempts

Authors:
Thomas Durieux (University of Lisbon, Portugal / INESC-ID, Portugal)
Fernanda Madeiral (Federal University of Uberlândia, Brazil)
Matias Martinez (Polytechnic University of Hauts-de-France, France)
Rui Abreu (University of Lisbon, Portugal / INESC-ID, Portugal)

ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, August 2019, Pages 302–313. https://doi.org/10.1145/3338906.3338911

Published: 12 August 2019

Related Artifact: RepairThemAll (software, July 2019), https://doi.org/10.5281/zenodo.3334854

ABSTRACT

In the past decade, research on test-suite-based automatic program repair has grown significantly. Each year, new approaches and implementations are featured in major software engineering venues. However, most of those approaches are evaluated on a single benchmark of bugs, and those evaluations are rarely reproduced by other researchers. In this paper, we present a large-scale experiment using 11 Java test-suite-based repair tools and 2,141 bugs from 5 benchmarks. Our goal is to better understand the current state of automatic program repair tools across a wide diversity of benchmarks. Our investigation is guided by the hypothesis that the repairability of repair tools might not generalize across different benchmarks. We found that the 11 tools 1) are able to generate patches for 21% of the bugs from the 5 benchmarks, and 2) perform better on Defects4J than on the other benchmarks, generating patches for 47% of the bugs from Defects4J compared to 10-30% of the bugs from the other benchmarks. Our experiment comprises 23,551 repair attempts, which we used to find causes of non-patch generation. We report these causes in this paper to help repair tool designers improve their approaches and tools.
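
As a quick sanity check on the figures quoted in the abstract, the sketch below multiplies the number of tools by the number of bugs. It assumes one repair attempt per tool and bug pair, which the abstract does not state explicitly; the variable names are illustrative only.

```python
# Figures quoted in the abstract.
NUM_TOOLS = 11
NUM_BUGS = 2_141
REPORTED_ATTEMPTS = 23_551

# Assumption (not stated in the abstract): one repair attempt
# per (tool, bug) pair, i.e. every tool is run on every bug.
attempts_if_full_grid = NUM_TOOLS * NUM_BUGS

print(attempts_if_full_grid)                       # 23551
print(attempts_if_full_grid == REPORTED_ATTEMPTS)  # True
```

The product matches the 23,551 attempts reported, which is consistent with every tool having been run on every bug.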

Index Terms

Software and its engineering
  Software creation and management
    Software verification and validation
      Software defect analysis
        Software testing and debugging

Published in
ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
August 2019
1264 pages
ISBN: 9781450355728
DOI: 10.1145/3338906
General Chairs: Marlon Dumas (University of Tartu, Estonia), Dietmar Pfahl (University of Tartu, Estonia)
Program Chairs: Sven Apel (Saarland University, Germany), Alessandra Russo (Imperial College, UK)
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Automatic program repair
benchmark overfitting
patch generation
Qualifiers
- research-article