DOI: 10.1145/1831708.1831717
research-article

Causal inference for statistical fault localization

Published: 12 July 2010

ABSTRACT

This paper investigates the application of causal inference methodology for observational studies to software fault localization based on test outcomes and profiles. This methodology combines statistical techniques for counterfactual inference with causal graphical models to obtain causal-effect estimates that are not subject to severe confounding bias. The methodology applies Pearl's Back-Door Criterion to program dependence graphs to justify a linear model for estimating the causal effect of covering a given statement on the occurrence of failures. The paper also presents an analysis of several proposed fault-localization metrics and their relationships to our causal estimator. Finally, the paper presents empirical results demonstrating that our model significantly improves the effectiveness of fault localization.
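
As an illustration only, the kind of linear model described above can be sketched in a few lines of Python. The sketch assumes that the back-door adjustment set for a statement is the coverage indicator of its control-dependence predecessor in the program dependence graph; the abstract does not spell out the adjustment set, so this choice is an assumption. The function names and the toy data are hypothetical. The failure indicator is regressed on the statement's coverage indicator and the adjustment indicator by ordinary least squares, and the coefficient on the statement's coverage is read as the causal-effect estimate.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design: coverage columns are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def causal_effect_estimate(
    failed: Sequence[int],
    covered: Sequence[int],
    parent_covered: Sequence[int],
) -> float:
    """Fit Y = alpha + tau*T + beta*C by least squares and return tau.

    Y: 1 if the test failed; T: 1 if the statement was covered;
    C: 1 if the (assumed) control-dependence predecessor was covered.
    """
    rows = [[1.0, float(t), float(c)] for t, c in zip(covered, parent_covered)]
    y = [float(f) for f in failed]
    k = 3
    # Normal equations: (X^T X) coef = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)[1]


def naive_difference(failed: Sequence[int], covered: Sequence[int]) -> float:
    """Unadjusted contrast: failure rate when covered minus when not covered."""
    hit = [f for f, t in zip(failed, covered) if t == 1]
    miss = [f for f, t in zip(failed, covered) if t == 0]
    return sum(hit) / len(hit) - sum(miss) / len(miss)


if __name__ == "__main__":
    # Toy data: failures are fully determined by the predecessor, not by
    # the statement itself, so the statement is only confounded.
    covered = [1, 1, 0, 0, 0, 0]
    parent = [1, 1, 1, 0, 0, 0]
    failed = parent
    print("naive difference:", naive_difference(failed, covered))
    print("adjusted estimate:", causal_effect_estimate(failed, covered, parent))
```

On this toy data the unadjusted contrast is clearly positive, while the adjusted coefficient is approximately zero, which is the sort of confounding bias the adjustment is meant to remove. A real study would use many more runs and a statistics package rather than hand-rolled normal equations.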

      • Published in

        ISSTA '10: Proceedings of the 19th international symposium on Software testing and analysis
        July 2010
        294 pages
        ISBN:9781605588230
        DOI:10.1145/1831708

        Copyright © 2010 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 58 of 213 submissions, 27%
