Causal inference for statistical fault localization

Authors:
George K. Baah, Georgia Institute of Technology, Atlanta, GA, USA
Andy Podgurski, Case Western Reserve University, Cleveland, OH, USA
Mary Jean Harrold, Georgia Institute of Technology, Atlanta, GA, USA

ISSTA '10: Proceedings of the 19th international symposium on Software testing and analysis, July 2010, Pages 73–84
https://doi.org/10.1145/1831708.1831717

Published: 12 July 2010

ABSTRACT

This paper investigates the application of causal inference methodology for observational studies to software fault localization based on test outcomes and profiles. This methodology combines statistical techniques for counterfactual inference with causal graphical models to obtain causal-effect estimates that are not subject to severe confounding bias. The methodology applies Pearl's Back-Door Criterion to program dependence graphs to justify a linear model for estimating the causal effect of covering a given statement on the occurrence of failures. The paper also presents an analysis of several proposed fault localization metrics and their relationships to our causal estimator. Finally, the paper presents empirical results demonstrating that our model significantly improves the effectiveness of fault localization.
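To make the idea of a confounding-adjusted linear model concrete, here is a minimal sketch. It is not the authors' implementation. It assumes the adjustment set for a statement is a single binary covariate (for example, whether a predecessor of the statement in the dependence graph was covered) and fits failed ~ intercept + covered + confounder by ordinary least squares. The function names, the cell counts, and the naive comparison score are all hypothetical and chosen for illustration. The data are built so that covering the statement has no effect on failure once the confounder is fixed.

```python
import numpy as np


def expand_cells(cells):
    """Turn (covered, confounder, failed, count) cells into one row per test run."""
    rows = []
    for covered, confounder, failed, count in cells:
        rows.extend([(covered, confounder, failed)] * count)
    return np.array(rows, dtype=float)


def naive_difference(covered, failed):
    """Failure rate of runs covering the statement minus that of runs not covering it."""
    mask = covered.astype(bool)
    return failed[mask].mean() - failed[~mask].mean()


def adjusted_effect(covered, confounders, failed):
    """Coefficient of `covered` in an OLS fit of failed on [1, covered, confounders]."""
    confounders = np.asarray(confounders, dtype=float).reshape(len(covered), -1)
    design = np.column_stack([np.ones(len(covered)), covered, confounders])
    coef, *_ = np.linalg.lstsq(design, failed, rcond=None)
    return coef[1]


# Hypothetical test-suite summary: (covered, confounder, failed, number of runs).
# Within each confounder stratum the failure rate is the same whether or not
# the statement is covered (0.5 when confounder = 1, 0.1 when confounder = 0).
cells = [
    (1, 1, 1, 40), (1, 1, 0, 40),
    (0, 1, 1, 10), (0, 1, 0, 10),
    (1, 0, 1, 2), (1, 0, 0, 18),
    (0, 0, 1, 8), (0, 0, 0, 72),
]
runs = expand_cells(cells)
covered, confounder, failed = runs[:, 0], runs[:, 1], runs[:, 2]

naive = naive_difference(covered, failed)
tau = adjusted_effect(covered, confounder, failed)
print(f"naive difference in failure rates: {naive:.3f}")
print(f"adjusted coverage coefficient:     {tau:.3f}")
```

In this constructed example the unadjusted difference makes the statement look suspicious only because coverage of the statement is correlated with the confounder. Including the confounder as a regressor removes that spurious association. The abstract attributes this kind of correction to the Back-Door Criterion, and the sketch illustrates it only for a hand-picked covariate.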

Index Terms

Causal inference for statistical fault localization
1. Information systems
  1. Data management systems
    1. Middleware for databases
      1. Distributed transaction monitors
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        Software testing and debugging
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Monitors

Published in
ISSTA '10: Proceedings of the 19th international symposium on Software testing and analysis
July 2010
294 pages
ISBN: 9781605588230
DOI: 10.1145/1831708
General Chair: Paolo Tonella, Fondazione Bruno Kessler -- IRST, Italy
Program Chair: Alessandro Orso, Georgia Institute of Technology, USA
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
causal inference
debugging
fault localization
potential outcome model
program analysis
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 58 of 213 submissions, 27%