Skip to main content
Erschienen in: Empirical Software Engineering 5/2018

02.03.2018

Overfitting in semantics-based automated program repair

verfasst von: Xuan Bach D. Le, Ferdian Thung, David Lo, Claire Le Goues

Erschienen in: Empirical Software Engineering | Ausgabe 5/2018

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The primary goal of Automated Program Repair (APR) is to automatically fix buggy software, to reduce the manual bug-fix burden that presently rests on human developers. Existing APR techniques can be generally divided into two families: semantics- vs. heuristics-based. Semantics-based APR uses symbolic execution and test suites to extract semantic constraints, and uses program synthesis to synthesize repairs that satisfy the extracted constraints. Heuristic-based APR generates large populations of repair candidates via source manipulation, and searches for the best among them. Both families largely rely on a primary assumption that a program is correctly patched if the generated patch leads the program to pass all provided test cases. Patch correctness is thus an especially pressing concern. A repair technique may generate overfitting patches, which lead a program to pass all existing test cases, but fails to generalize beyond them. In this work, we revisit the overfitting problem with a focus on semantics-based APR techniques, complementing previous studies of the overfitting problem in heuristics-based APR. We perform our study using IntroClass and Codeflaws benchmarks, two datasets well-suited for assessing repair quality, to systematically characterize and understand the nature of overfitting in semantics-based APR. We find that similar to heuristics-based APR, overfitting also occurs in semantics-based APR in various different ways.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Fußnoten
2
Angelix can target multiple expressions at once; we explain the process with respect to a single buggy expression for clarity, but the technique generalizes naturally.
 
4
We use the words “repair” and “patch” interchangeably.
 
5
We discuss the real-world bugs we describe qualitatively in Section 3.6.
 
7
Recall the explanation in Section 2.3 on the number of tests used for specification inference.
 
Literatur
Zurück zum Zitat Abreu R, Zoeteweij P, Van Gemund AJ (2007) On the accuracy of spectrum-based fault localization. In: Testing: academic and industrial conference practice and research techniques-MUTATION, 2007. TAICPART-MUTATION 2007. IEEE, pp 89–98 Abreu R, Zoeteweij P, Van Gemund AJ (2007) On the accuracy of spectrum-based fault localization. In: Testing: academic and industrial conference practice and research techniques-MUTATION, 2007. TAICPART-MUTATION 2007. IEEE, pp 89–98
Zurück zum Zitat Alur R, Bodik R, Juniwal G, Martin MM, Raghothaman M, Seshia SA, Singh R, Solar-Lezama A, Torlak E, Udupa A (2015) Syntax-guided synthesis. Dependable Software Systems Engineering 40:1–25 Alur R, Bodik R, Juniwal G, Martin MM, Raghothaman M, Seshia SA, Singh R, Solar-Lezama A, Torlak E, Udupa A (2015) Syntax-guided synthesis. Dependable Software Systems Engineering 40:1–25
Zurück zum Zitat Cadar C, Dunbar D, Engler DR et al. (2008) Klee: unassisted and automatic generation of high-coverage tests for complex systems programs. In: Operating systems design and implementation, OSDI, pp 209–224 Cadar C, Dunbar D, Engler DR et al. (2008) Klee: unassisted and automatic generation of high-coverage tests for complex systems programs. In: Operating systems design and implementation, OSDI, pp 209–224
Zurück zum Zitat DeMarco F, Xuan J, Le Berre D, Monperrus M (2014) Automatic repair of buggy if conditions and missing preconditions with SMT. In: Proceedings of the 6th international workshop on constraints in software testing, verification, and analysis, pp 30–39 DeMarco F, Xuan J, Le Berre D, Monperrus M (2014) Automatic repair of buggy if conditions and missing preconditions with SMT. In: Proceedings of the 6th international workshop on constraints in software testing, verification, and analysis, pp 30–39
Zurück zum Zitat Fry ZP, Landau B, Weimer W (2012) A human study of patch maintainability. In: International symposium on software testing and analysis (ISSTA), pp 177–187 Fry ZP, Landau B, Weimer W (2012) A human study of patch maintainability. In: International symposium on software testing and analysis (ISSTA), pp 177–187
Zurück zum Zitat Gulwani S, Esparza J, Grumberg O, Sickert S (2016) Programming by examples (and its applications in data wrangling). Verification and synthesis of correct and secure systems Gulwani S, Esparza J, Grumberg O, Sickert S (2016) Programming by examples (and its applications in data wrangling). Verification and synthesis of correct and secure systems
Zurück zum Zitat Jha S, Gulwani S, Seshia SA, Tiwari A (2010) Oracle-guided component-based program synthesis. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering, ICSE, pp 215–224 Jha S, Gulwani S, Seshia SA, Tiwari A (2010) Oracle-guided component-based program synthesis. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering, ICSE, pp 215–224
Zurück zum Zitat Just R, Jalali D, Ernst MD (2014) Defects4j: a database of existing faults to enable controlled testing studies for java programs. In: Proceedings of the 2014 international symposium on software testing and analysis, ISSTA, pp 437–440 Just R, Jalali D, Ernst MD (2014) Defects4j: a database of existing faults to enable controlled testing studies for java programs. In: Proceedings of the 2014 international symposium on software testing and analysis, ISSTA, pp 437–440
Zurück zum Zitat Ke Y, Stolee KT, Le Goues C, Brun Y (2015) Repairing programs with semantic code search. In: Proceedings of the 30th IEEE/ACM international conference on automated software engineering, pp 295–306 Ke Y, Stolee KT, Le Goues C, Brun Y (2015) Repairing programs with semantic code search. In: Proceedings of the 30th IEEE/ACM international conference on automated software engineering, pp 295–306
Zurück zum Zitat Kim D, Nam J, Song J, Kim S (2013) Automatic patch generation learned from human-written patches. In: ACM/IEEE international conference on software engineering, ICSE, pp 802–811 Kim D, Nam J, Song J, Kim S (2013) Automatic patch generation learned from human-written patches. In: ACM/IEEE international conference on software engineering, ICSE, pp 802–811
Zurück zum Zitat Könighofer R, Bloem R (2011) Automated error localization and correction for imperative programs. In: Formal methods in computer-aided design, IEEE, FMCAD, pp 91–100 Könighofer R, Bloem R (2011) Automated error localization and correction for imperative programs. In: Formal methods in computer-aided design, IEEE, FMCAD, pp 91–100
Zurück zum Zitat Könighofer R, Bloem R (2012) Repair with on-the-fly program analysis. In: Haifa verification conference. Springer, pp 56–71 Könighofer R, Bloem R (2012) Repair with on-the-fly program analysis. In: Haifa verification conference. Springer, pp 56–71
Zurück zum Zitat Le XBD (2016) Towards efficient and effective automatic program repair. In: International conference on automated software engineering (ASE). IEEE, pp 876–879 Le XBD (2016) Towards efficient and effective automatic program repair. In: International conference on automated software engineering (ASE). IEEE, pp 876–879
Zurück zum Zitat Le TDB, Le XBD, Lo D, Beschastnikh I (2015a) Synergizing specification miners through model fissions and fusions. In: International conference on automated software engineering (ASE). IEEE, pp 115–125 Le TDB, Le XBD, Lo D, Beschastnikh I (2015a) Synergizing specification miners through model fissions and fusions. In: International conference on automated software engineering (ASE). IEEE, pp 115–125
Zurück zum Zitat Le X BD, Le TDB, Lo D (2015b) Should fixing these failures be delegated to automated program repair?. In: International symposium on software reliability engineering (ISSRE). IEEE, pp 427–437 Le X BD, Le TDB, Lo D (2015b) Should fixing these failures be delegated to automated program repair?. In: International symposium on software reliability engineering (ISSRE). IEEE, pp 427–437
Zurück zum Zitat Le X BD, Le Q L, Lo D, Le Goues C (2016a) Enhancing automated program repair with deductive verification. In: International conference on software maintenance and evolution (ICSME). IEEE, pp 428–432 Le X BD, Le Q L, Lo D, Le Goues C (2016a) Enhancing automated program repair with deductive verification. In: International conference on software maintenance and evolution (ICSME). IEEE, pp 428–432
Zurück zum Zitat Le X BD, Lo D, Le Goues C (2016b) Empirical study on synthesis engines for semantics-based program repair. In: International conference on software maintenance and evolution (ICSME). IEEE, pp 423–427 Le X BD, Lo D, Le Goues C (2016b) Empirical study on synthesis engines for semantics-based program repair. In: International conference on software maintenance and evolution (ICSME). IEEE, pp 423–427
Zurück zum Zitat Le X BD, Lo D, Le Goues C (2016c) History driven program repair. In: Proceedings of the 23rd IEEE international conference on software analysis, evolution, and reengineering, SANER, pp 213–224 Le X BD, Lo D, Le Goues C (2016c) History driven program repair. In: Proceedings of the 23rd IEEE international conference on software analysis, evolution, and reengineering, SANER, pp 213–224
Zurück zum Zitat Le X BD, Chu D H, David L, Le Goues C (2017a) Jfix: sematics-based repair of java programs via symbolic pathfinder. In: Proceedings of the 2017 ACM international symposium on software testing and analysis, ISSTA Le X BD, Chu D H, David L, Le Goues C (2017a) Jfix: sematics-based repair of java programs via symbolic pathfinder. In: Proceedings of the 2017 ACM international symposium on software testing and analysis, ISSTA
Zurück zum Zitat Le X BD, Chu DH, Lo D, Le Goues C, Visser W (2017b) S3: syntax-and semantic-guided repair synthesis via programming by examples. In: Joint meeting of the european software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering. ACM, pp 593–604 Le X BD, Chu DH, Lo D, Le Goues C, Visser W (2017b) S3: syntax-and semantic-guided repair synthesis via programming by examples. In: Joint meeting of the european software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering. ACM, pp 593–604
Zurück zum Zitat Le Goues C, Dewey-Vogt M, Forrest S, Weimer W (2012) A systematic study of automated program repair: fixing 55 out of 105 bugs for $8 each. In: 2012 34th international conference on software engineering (ICSE). IEEE, pp 3–13 Le Goues C, Dewey-Vogt M, Forrest S, Weimer W (2012) A systematic study of automated program repair: fixing 55 out of 105 bugs for $8 each. In: 2012 34th international conference on software engineering (ICSE). IEEE, pp 3–13
Zurück zum Zitat Le Goues C, Nguyen T, Forrest S, Weimer W (2012) Genprog: a generic method for automatic software repair. IEEE Trans Softw Eng 38(1):54–72CrossRef Le Goues C, Nguyen T, Forrest S, Weimer W (2012) Genprog: a generic method for automatic software repair. IEEE Trans Softw Eng 38(1):54–72CrossRef
Zurück zum Zitat Le Goues C, Holtschulte N, Smith EK, Brun Y, Devanbu P, Forrest S, Weimer W (2015) The ManyBugs and IntroClass benchmarks for automated repair of C programs. IEEE Trans Softw Eng 41(12):1236–1256CrossRef Le Goues C, Holtschulte N, Smith EK, Brun Y, Devanbu P, Forrest S, Weimer W (2015) The ManyBugs and IntroClass benchmarks for automated repair of C programs. IEEE Trans Softw Eng 41(12):1236–1256CrossRef
Zurück zum Zitat Logozzo F, Ball T (2012) Modular and verified automatic program repair. SIGPLAN Not 47(10):133–146CrossRef Logozzo F, Ball T (2012) Modular and verified automatic program repair. SIGPLAN Not 47(10):133–146CrossRef
Zurück zum Zitat Long F, Rinard M (2015) Staged program repair with condition synthesis. In: Joint meeting of the european software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering. ACM, pp 166–178 Long F, Rinard M (2015) Staged program repair with condition synthesis. In: Joint meeting of the european software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering. ACM, pp 166–178
Zurück zum Zitat Long F, Rinard M (2016a) An analysis of the search spaces for generate and validate patch generation systems. In: Proceedings of the 38th international conference on software engineering. ACM, pp 702–713 Long F, Rinard M (2016a) An analysis of the search spaces for generate and validate patch generation systems. In: Proceedings of the 38th international conference on software engineering. ACM, pp 702–713
Zurück zum Zitat Long F, Rinard M (2016b) Automatic patch generation by learning correct code. In: ACM SIGPLAN notices, vol 51. ACM, pp 298–312 Long F, Rinard M (2016b) Automatic patch generation by learning correct code. In: ACM SIGPLAN notices, vol 51. ACM, pp 298–312
Zurück zum Zitat Martinez M, Monperrus M (2015) Mining software repair models for reasoning on the search space of automated program fixing. Empir Softw Eng 20(1):176–205CrossRef Martinez M, Monperrus M (2015) Mining software repair models for reasoning on the search space of automated program fixing. Empir Softw Eng 20(1):176–205CrossRef
Zurück zum Zitat Mechtaev S, Yi J, Roychoudhury A (2015) Directfix: Looking for simple program repairs. In: Proceedings of the 37th ACM/IEEE international conference on software engineering, ICSE, pp 448–458 Mechtaev S, Yi J, Roychoudhury A (2015) Directfix: Looking for simple program repairs. In: Proceedings of the 37th ACM/IEEE international conference on software engineering, ICSE, pp 448–458
Zurück zum Zitat Mechtaev S, Yi J, Roychoudhury A (2016) Angelix: scalable multiline program patch synthesis via symbolic analysis. In: Proceedings of the 38th ACM/IEEE international conference on software engineering, ICSE, pp 691–701 Mechtaev S, Yi J, Roychoudhury A (2016) Angelix: scalable multiline program patch synthesis via symbolic analysis. In: Proceedings of the 38th ACM/IEEE international conference on software engineering, ICSE, pp 691–701
Zurück zum Zitat Nguyen HD T, Qi D, Roychoudhury A, Chandra S (2013) Semfix: Program repair via semantic analysis. In: Proceedings of the 2013 international conference on software engineering, ICSE, pp 772–781 Nguyen HD T, Qi D, Roychoudhury A, Chandra S (2013) Semfix: Program repair via semantic analysis. In: Proceedings of the 2013 international conference on software engineering, ICSE, pp 772–781
Zurück zum Zitat Qi Y, Mao X, Lei Y, Dai Z, Wang C (2014) The strength of random search on automated program repair. In: Proceedings of the 36th international conference on software engineering, ACM, ICSE, pp 254–265 Qi Y, Mao X, Lei Y, Dai Z, Wang C (2014) The strength of random search on automated program repair. In: Proceedings of the 36th international conference on software engineering, ACM, ICSE, pp 254–265
Zurück zum Zitat Qi Z, Long F, Achour S, Rinard M (2015) An analysis of patch plausibility and correctness for generate-and-validate patch generation systems. In: Proceedings of the 2015 international symposium on software testing and analysis, ISSTA, pp 24–36 Qi Z, Long F, Achour S, Rinard M (2015) An analysis of patch plausibility and correctness for generate-and-validate patch generation systems. In: Proceedings of the 2015 international symposium on software testing and analysis, ISSTA, pp 24–36
Zurück zum Zitat Reynolds A, Deters M, Kuncak V, Tinelli C, Barrett C (2015) Counterexample-guided quantifier instantiation for synthesis in SMT. In: International conference on computer aided verification. Springer, pp 198–216 Reynolds A, Deters M, Kuncak V, Tinelli C, Barrett C (2015) Counterexample-guided quantifier instantiation for synthesis in SMT. In: International conference on computer aided verification. Springer, pp 198–216
Zurück zum Zitat Smith EK, Barr ET, Le Goues C, Brun Y (2015) Is the cure worse than the disease? Overfitting in automated program repair. In: Joint meeting of the european software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering, ACM, ESEC/FSE, pp 532–543 Smith EK, Barr ET, Le Goues C, Brun Y (2015) Is the cure worse than the disease? Overfitting in automated program repair. In: Joint meeting of the european software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering, ACM, ESEC/FSE, pp 532–543
Zurück zum Zitat Tan SH, Yi J, Yulis, Mechtaev S, Roychoudhury A (2017) Codeflaws: a programming competition benchmark for evaluating automated program repair tools. In: ICSE Poster, to appear Tan SH, Yi J, Yulis, Mechtaev S, Roychoudhury A (2017) Codeflaws: a programming competition benchmark for evaluating automated program repair tools. In: ICSE Poster, to appear
Zurück zum Zitat Tassey G (2002) The economic impacts of inadequate infrastructure for software testing. National Institute of Standards and Technology, RTI project 7007(011) Tassey G (2002) The economic impacts of inadequate infrastructure for software testing. National Institute of Standards and Technology, RTI project 7007(011)
Zurück zum Zitat Thung F, Le X BD, Lo D (2015) Active semi-supervised defect categorization. In: International conference on program comprehension (ICPC). IEEE Press, pp 60–70 Thung F, Le X BD, Lo D (2015) Active semi-supervised defect categorization. In: International conference on program comprehension (ICPC). IEEE Press, pp 60–70
Zurück zum Zitat Weimer W, Forrest S, Le Goues C, Nguyen T (2010) Automatic program repair with evolutionary computation. Commun ACM 53(5):109–116CrossRef Weimer W, Forrest S, Le Goues C, Nguyen T (2010) Automatic program repair with evolutionary computation. Commun ACM 53(5):109–116CrossRef
Zurück zum Zitat Weimer W, Fry ZP, Forrest S (2013) Leveraging program equivalence for adaptive program repair: models and first results. In: Proceedings of the 28th international conference on automated software engineering, IEEE, ASE, pp 356–366 Weimer W, Fry ZP, Forrest S (2013) Leveraging program equivalence for adaptive program repair: models and first results. In: Proceedings of the 28th international conference on automated software engineering, IEEE, ASE, pp 356–366
Metadaten
Titel
Overfitting in semantics-based automated program repair
verfasst von
Xuan Bach D. Le
Ferdian Thung
David Lo
Claire Le Goues
Publikationsdatum
02.03.2018
Verlag
Springer US
Erschienen in
Empirical Software Engineering / Ausgabe 5/2018
Print ISSN: 1382-3256
Elektronische ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-017-9577-2

Weitere Artikel der Ausgabe 5/2018

Empirical Software Engineering 5/2018 Zur Ausgabe