ABSTRACT
This paper investigates the application of causal inference methodology for observational studies to software fault localization based on test outcomes and profiles. This methodology combines statistical techniques for counterfactual inference with causal graphical models to obtain causal-effect estimates that are not subject to severe confounding bias. The methodology applies Pearl's Back-Door Criterion to program dependence graphs to justify a linear model for estimating the causal effect of covering a given statement on the occurrence of failures. The paper also presents an analysis of several proposed fault-localization metrics and their relationships to our causal estimator. Finally, the paper presents empirical results demonstrating that our model significantly improves the effectiveness of fault localization.
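To make the idea of a confounding-adjusted linear model concrete, the following is a minimal sketch, not the paper's implementation. It assumes that the confounder selected by the back-door adjustment can be summarized by a single binary covariate: whether the statement's control-dependence predecessor was covered in a test run. The function names (`adjusted_effect`, `naive_effect`) and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def adjusted_effect(covered, parent_covered, failed):
    """Ordinary least squares fit of failed ~ 1 + covered + parent_covered.

    Returns the coefficient on `covered`, read here as the estimated effect
    of covering the statement on failure. `parent_covered` is an assumed
    stand-in for the back-door adjustment covariate.
    """
    t = np.asarray(covered, dtype=float)
    c = np.asarray(parent_covered, dtype=float)
    y = np.asarray(failed, dtype=float)
    # Design matrix: intercept, treatment (coverage), confounder.
    X = np.column_stack([np.ones_like(t), t, c])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[1])


def naive_effect(covered, failed):
    """Unadjusted difference in failure rates: covered runs minus uncovered runs."""
    t = np.asarray(covered, dtype=bool)
    y = np.asarray(failed, dtype=float)
    return float(y[t].mean() - y[~t].mean())


# Toy data (hypothetical): failures are fully determined by the predecessor
# being covered, and the statement can only run when its predecessor runs.
parent_covered = [0, 0, 1, 1, 1, 1]
covered = [0, 0, 0, 1, 1, 1]
failed = [0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    print("naive:", naive_effect(covered, failed))
    print("adjusted:", adjusted_effect(covered, parent_covered, failed))
```

On this toy data the naive difference in failure rates is about 0.67, which makes the statement look suspicious, while the adjusted coefficient is approximately zero because the predecessor's coverage already accounts for every failure.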