DOI: 10.1145/3377811.3380338

On the efficiency of test suite based program repair: A Systematic Assessment of 16 Automated Repair Systems for Java Programs

Published: 01 October 2020

ABSTRACT

Test-based automated program repair has been a prolific field of research in software engineering over the last decade. Many approaches have been proposed that leverage test suites as a weak, but affordable, approximation of program specifications. Although the literature regularly sets new records on the number of benchmark bugs that can be fixed, studies increasingly raise concerns about the limitations and biases of state-of-the-art approaches. For example, the correctness of generated patches has been questioned in a number of studies, while other researchers have pointed out that evaluation schemes may be misleading with respect to the processing of fault localization results. Nevertheless, there is little work addressing the efficiency of patch generation with regard to the practicality of program repair. In this paper, we fill this gap in the literature by providing an extensive review of the efficiency of test suite based program repair. Our objective is to assess the number of generated patch candidates, since this information is correlated to (1) the strategy to traverse the search space efficiently in order to select sensible repair attempts, (2) the strategy to minimize the test effort for identifying a plausible patch, and (3) the strategy to prioritize the generation of a correct patch. To that end, we perform a large-scale empirical study on the efficiency, in terms of the quantity of generated patch candidates, of 16 open-source repair tools for Java programs. The experiments are carefully conducted under the same fault localization configurations to limit biases.
Among other findings, we note that: (1) many irrelevant patch candidates are generated by changing wrong code locations; (2) however, if the search space is carefully triaged, fault localization noise has little impact on patch generation efficiency; (3) yet, current template-based repair systems, which are known to be most effective in fixing a large number of bugs, are actually least efficient, as they tend to generate mostly irrelevant patch candidates.

    • Published in

      ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering
      June 2020
      1640 pages
      ISBN: 9781450371216
      DOI: 10.1145/3377811

      Copyright © 2020 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate: 276 of 1,856 submissions, 15%
