DOI: 10.1145/3106237.3106274

Research article

Better test cases for better automated program repair

Published: 21 August 2017

ABSTRACT

Automated generate-and-validate program repair techniques (G&V techniques) suffer from generating many overfitted patches due to the inadequacy of existing test cases. Such overfitted patches are incorrect patches that make all given test cases pass yet fail to fix the bugs. In this work, we propose an overfitted patch detection framework named Opad (Overfitted PAtch Detection). Opad helps improve G&V techniques by enhancing existing test cases to filter out overfitted patches. To enhance test cases, Opad uses fuzz testing to generate new test cases, and employs two test oracles (crash and memory-safety) to strengthen the validity checking of automatically-generated patches. Opad also uses a novel metric (named O-measure) to decide whether automatically-generated patches overfit.

Evaluated on 45 bugs from 7 large systems (the same benchmark used by GenProg and SPR), Opad filters out 75.2% (321/427) of the overfitted patches generated by GenProg/AE, Kali, and SPR. In addition, Opad guides SPR to generate correct patches for one more bug (the original SPR generates correct patches for 11 bugs). Our analysis also shows that up to 40% of such automatically-generated test cases may further improve G&V techniques if empowered with better test oracles (in addition to the crash and memory-safety oracles employed by Opad).
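To make the pipeline concrete, below is a minimal sketch of how an Opad-style filter could be assembled. It is an illustration based only on the abstract, not the authors' implementation: the helper names (fails_oracles, o_measure, filter_overfitted), the use of Valgrind as the memory-safety oracle, and the specific O-measure rule shown (reject a patch that fails on a fuzzed input the original buggy program handles cleanly) are all assumptions.

    # Sketch of an Opad-style overfitted-patch filter (illustrative only).
    # Assumptions: fuzzed inputs are files on disk (e.g., produced by a
    # fuzzer such as AFL or zzuf); Valgrind memcheck serves as the
    # memory-safety oracle; the O-measure rule below is one plausible
    # reading of the paper's metric.
    import subprocess

    def fails_oracles(binary, input_file, timeout=10):
        """Run `binary` on one fuzzed input; return True if the crash
        or memory-safety oracle flags it."""
        try:
            proc = subprocess.run(
                ["valgrind", "--error-exitcode=99", binary, input_file],
                capture_output=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return False  # a hang is neither a crash nor a memory error here
        crashed = proc.returncode < 0          # killed by a signal, e.g. SIGSEGV
        memory_unsafe = proc.returncode == 99  # memcheck reported an error
        return crashed or memory_unsafe

    def o_measure(buggy_bin, patched_bin, fuzzed_inputs):
        """Count fuzzed inputs on which the patched program behaves
        worse than the original buggy program."""
        return sum(
            1 for t in fuzzed_inputs
            if fails_oracles(patched_bin, t) and not fails_oracles(buggy_bin, t))

    def filter_overfitted(buggy_bin, candidate_patches, fuzzed_inputs):
        """Keep only candidate patches that the enhanced test suite
        (fuzzed inputs + crash/memory-safety oracles) cannot reject."""
        return [p for p in candidate_patches
                if o_measure(buggy_bin, p, fuzzed_inputs) == 0]

In a full G&V loop, a filter like this would sit between patch generation (GenProg, Kali, SPR) and manual review, discarding any candidate patch whose O-measure is positive.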

References

  1. American Fuzzy Lop. 2016. http://lcamtuf.coredump.cx/afl/.
  2. gcov—a Test Coverage Program. 2016. https://gcc.gnu.org/onlinedocs/gcc/Gcov.html.
  3. Valgrind. 2016. http://valgrind.org/.
  4. Paul Ammann and Jeff Offutt. 2008. Introduction to Software Testing (1st ed.). Cambridge University Press, New York, NY, USA.
  5. Earl T. Barr, Yuriy Brun, Premkumar Devanbu, Mark Harman, and Federica Sarro. 2014. The Plastic Surgery Hypothesis. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2014). ACM, New York, NY, USA, 306–317.
  6. Cristian Cadar, Daniel Dunbar, and Dawson Engler. 2008. KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (OSDI 2008). USENIX Association, Berkeley, CA, USA, 209–224.
  7. Christoph Csallner and Yannis Smaragdakis. 2004. JCrasher: An Automatic Robustness Tester for Java. Software—Practice & Experience 34, 11 (Sept. 2004), 1025–1050.
  8. Gordon Fraser and Andrea Arcuri. 2011. EvoSuite: Automatic Test Suite Generation for Object-Oriented Software. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering (ESEC/FSE 2011). ACM, New York, NY, USA, 416–419.
  9. Juan Pablo Galeotti, Gordon Fraser, and Andrea Arcuri. 2014. Extending a Search-Based Test Generator with Adaptive Dynamic Symbolic Execution (Tool Paper). In Proceedings of the 2014 International Symposium on Software Testing and Analysis (ISSTA 2014). ACM, New York, NY, USA, 421–424.
  10. S. Hocevar. 2011. zzuf—Multi-Purpose Fuzzer.
  11. Laura Inozemtseva and Reid Holmes. 2014. Coverage Is Not Strongly Correlated with Test Suite Effectiveness. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, 435–445.
  12. René Just, Darioush Jalali, Laura Inozemtseva, Michael D. Ernst, Reid Holmes, and Gordon Fraser. 2014. Are Mutants a Valid Substitute for Real Faults in Software Testing? In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2014). ACM, New York, NY, USA, 654–665.
  13. Dongsun Kim, Jaechang Nam, Jaewoo Song, and Sunghun Kim. 2013. Automatic Patch Generation Learned from Human-Written Patches. In Proceedings of the 2013 International Conference on Software Engineering (ICSE 2013). IEEE Press, Piscataway, NJ, USA, 802–811.
  14. Xuan Bach D. Le, David Lo, and Claire Le Goues. 2016. History Driven Program Repair. In 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER 2016), Vol. 1. 213–224.
  15. Claire Le Goues, Michael Dewey-Vogt, Stephanie Forrest, and Westley Weimer. 2012. A Systematic Study of Automated Program Repair: Fixing 55 out of 105 Bugs for $8 Each. In Proceedings of the 2012 International Conference on Software Engineering (ICSE 2012). IEEE Press, Piscataway, NJ, USA, 3–13.
  16. Xinyuan Liu, Muhan Zeng, Yingfei Xiong, Lu Zhang, and Gang Huang. 2017. Identifying Patch Correctness in Test-Based Automatic Program Repair. CoRR abs/1706.09120 (2017).
  17. Fan Long and Martin Rinard. 2015. Staged Program Repair with Condition Synthesis. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015). 166–178.
  18. Fan Long and Martin Rinard. 2016. An Analysis of the Search Spaces for Generate and Validate Patch Generation Systems. In Proceedings of the 38th International Conference on Software Engineering (ICSE 2016). ACM, 702–713.
  19. Fan Long and Martin Rinard. 2016. Automatic Patch Generation by Learning Correct Code. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2016). ACM, New York, NY, USA, 298–312.
  20. Matias Martinez, Westley Weimer, and Martin Monperrus. 2014. Do the Fix Ingredients Already Exist? An Empirical Inquiry into the Redundancy Assumptions of Program Repair Approaches. In Companion Proceedings of the 36th International Conference on Software Engineering (ICSE Companion 2014). ACM, New York, NY, USA, 492–495.
  21. Richard McNally, Ken Yiu, Duncan Grove, and Damien Gerhardy. 2012. Fuzzing: The State of the Art. Technical Report. DTIC Document.
  22. Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury. 2015. DirectFix: Looking for Simple Program Repairs. In 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering (ICSE 2015), Vol. 1. IEEE, 448–458.
  23. Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury. 2016. Angelix: Scalable Multiline Program Patch Synthesis via Symbolic Analysis. In Proceedings of the 38th International Conference on Software Engineering (ICSE 2016). ACM, 691–701.
  24. Barton P. Miller, Louis Fredriksen, and Bryan So. 1990. An Empirical Study of the Reliability of UNIX Utilities. Commun. ACM 33, 12 (1990), 32–44.
  25. Carlos Pacheco and Michael D. Ernst. 2007. Randoop: Feedback-Directed Random Testing for Java. In Companion to the 22nd ACM SIGPLAN Conference on Object-Oriented Programming Systems and Applications (OOPSLA 2007). ACM, 815–816.
  26. Carlos Pacheco, Shuvendu K. Lahiri, Michael D. Ernst, and Thomas Ball. 2007. Feedback-Directed Random Test Generation. In Proceedings of the 29th International Conference on Software Engineering (ICSE 2007). IEEE Computer Society, Washington, DC, USA, 75–84.
  27. Yuhua Qi, Xiaoguang Mao, Yan Lei, Ziying Dai, and Chengsong Wang. 2014. The Strength of Random Search on Automated Program Repair. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). 254–265.
  28. Zichao Qi, Fan Long, Sara Achour, and Martin Rinard. 2015. An Analysis of Patch Plausibility and Correctness for Generate-and-Validate Patch Generation Systems. In Proceedings of the 2015 International Symposium on Software Testing and Analysis (ISSTA 2015). 24–36.
  29. Koushik Sen, Darko Marinov, and Gul Agha. 2005. CUTE: A Concolic Unit Testing Engine for C. In Proceedings of the 10th European Software Engineering Conference Held Jointly with the 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering (ESEC/FSE 2005). ACM, New York, NY, USA, 263–272.
  30. Edward K. Smith, Earl T. Barr, Claire Le Goues, and Yuriy Brun. 2015. Is the Cure Worse Than the Disease? Overfitting in Automated Program Repair. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015). ACM, New York, NY, USA, 532–543.
  31. Michael Sutton. 2012. FileFuzz.
  32. Shin Hwei Tan, Hiroaki Yoshida, Mukul R. Prasad, and Abhik Roychoudhury. 2016. Anti-patterns in Search-Based Program Repair. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016). 727–738.
  33. Shin Hwei Tan and Abhik Roychoudhury. 2015. relifix: Automated Repair of Software Regressions. In Proceedings of the 2015 International Conference on Software Engineering (ICSE 2015). 471–482.
  34. Westley Weimer, Zachary P. Fry, and Stephanie Forrest. 2013. Leveraging Program Equivalence for Adaptive Program Repair: Models and First Results. In 2013 IEEE/ACM 28th International Conference on Automated Software Engineering (ASE 2013). IEEE, 356–366.
  35. Tao Xie. 2006. Augmenting Automatically Generated Unit-Test Suites with Regression Oracle Checking. In Proceedings of the 20th European Conference on Object-Oriented Programming (ECOOP 2006). Springer-Verlag, Berlin, Heidelberg, 380–403.
  36. Qi Xin and Steven P. Reiss. 2017. Identifying Test-Suite-Overfitted Patches through Test Case Generation. In Proceedings of the 2017 International Symposium on Software Testing and Analysis (ISSTA 2017).
  37. Jifeng Xuan, Matias Martinez, Favio DeMarco, Maxime Clement, Sebastian Lamelas Marcote, Thomas Durieux, Daniel Le Berre, and Martin Monperrus. 2016. Nopol: Automatic Repair of Conditional Statement Bugs in Java Programs. IEEE Transactions on Software Engineering (2016).
  38. Zhongxing Yu, Matias Martinez, Benjamin Danglot, Thomas Durieux, and Martin Monperrus. 2017. Test Case Generation for Program Repair: A Study of Feasibility and Effectiveness. arXiv:1703.00198 (March 2017).
  39. Yucheng Zhang and Ali Mesbah. 2015. Assertions Are Strongly Correlated with Test Suite Effectiveness. In Proceedings of the Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2015). ACM, 214–224.

Published in

ESEC/FSE 2017: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering
August 2017, 1073 pages
ISBN: 9781450351058
DOI: 10.1145/3106237

Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 112 of 543 submissions (21%)
