ABSTRACT
Many automated fault localization (AFL) techniques have been introduced to assist developers in debugging. Prior studies evaluate a localization technique from the viewpoint of developers, measuring how much benefit developers obtain from the technique when debugging. However, these evaluation approaches are not always suitable, because the complex debugging behavior of developers makes the benefit difficult to quantify precisely. In addition, recent user studies have shown that developers working with AFL do not correct defects more efficiently than those working only with traditional debugging techniques such as breakpoints, even when the effectiveness of AFL is artificially improved. In this paper we propose a new research direction: developing AFL techniques from the viewpoint of fully automated debugging, including automated program repair, for which AFL is a necessary step. We also introduce the NCP score as an evaluation measure to assess and compare techniques from this perspective. Our experiment on 15 popular AFL techniques with 11 subject programs that ship with real-life field failures provides evidence that the AFL techniques that performed well in prior studies do not have better localization effectiveness according to the NCP score. We also observe that Jaccard performs better than the other techniques in our experiment.
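For readers unfamiliar with Jaccard, the following is a minimal sketch of how a spectrum-based technique of this kind ranks statements. It assumes the commonly used Jaccard suspiciousness formula (failing tests that cover a statement, divided by all failing tests plus the passing tests that cover it). The function names and the toy coverage data are hypothetical and are not taken from the paper, and the sketch does not compute the NCP score.

```python
from typing import Dict, List, Set


def jaccard_scores(coverage: List[Set[str]], failed: List[bool]) -> Dict[str, float]:
    """Compute a Jaccard suspiciousness score for every covered statement.

    coverage[i] is the set of statements executed by test i (hypothetical input format);
    failed[i] is True when test i failed.
    """
    total_failed = sum(failed)
    statements: Set[str] = set().union(*coverage)
    scores: Dict[str, float] = {}
    for stmt in statements:
        # ef: failing tests that execute the statement
        ef = sum(1 for cov, f in zip(coverage, failed) if f and stmt in cov)
        # ep: passing tests that execute the statement
        ep = sum(1 for cov, f in zip(coverage, failed) if not f and stmt in cov)
        denom = total_failed + ep
        scores[stmt] = ef / denom if denom else 0.0
    return scores


def rank(scores: Dict[str, float]) -> List[str]:
    """Order statements from most to least suspicious; ties are broken by name."""
    return sorted(scores, key=lambda s: (-scores[s], s))


# Toy example: three tests, the last two fail.
coverage = [{"s1", "s2"}, {"s1", "s3"}, {"s2", "s3"}]
failed = [False, True, True]
scores = jaccard_scores(coverage, failed)
ranking = rank(scores)
print(ranking)
```

In this toy run, `s3` is covered by both failing tests and by no passing test, so it ranks first. A repair tool that consumes such a ranking would try to modify the most suspicious statements first.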
Index Terms
- Using automated program repair for evaluating the effectiveness of fault localization techniques