research-article

Isolating and understanding concurrency errors using reconstructed execution fragments

Authors:
Brandon Lucia

University of Washington, Seattle, WA, USA

University of Washington, Seattle, WA, USA
View Profile

,
Benjamin P. Wood

University of Washington, Seattle, WA, USA

University of Washington, Seattle, WA, USA
View Profile

,
Luis Ceze

University of Washington, Seattle, WA, USA

University of Washington, Seattle, WA, USA
View Profile

Authors Info & Claims

ACM SIGPLAN Notices Volume 46 Issue 6June 2011pp 378–388https://doi.org/10.1145/1993316.1993543

Published:04 June 2011Publication History

ACM SIGPLAN Notices

Abstract

In this paper we propose Recon, a new general approach to concurrency debugging. Recon goes beyond just detecting bugs, it also presents to the programmer short fragments of buggy execution schedules that illustrate how and why bugs happened. These fragments, called reconstructions, are inferred from inter-thread communication surrounding the root cause of a bug and significantly simplify the process of understanding bugs.

The key idea in Recon is to monitor executions and build graphs that encode inter-thread communication with enough context information to build reconstructions. Recon leverages reconstructions built from multiple application executions and uses machine learning to identify which ones illustrate the root cause of a bug. Recon's approach is general because it does not rely on heuristics specific to any type of bug, application, or programming model. Therefore, it is able to deal with single- and multiple-variable concurrency bugs regardless of their type (e.g., atomicity violation, ordering, etc). To make graph collection efficient, Recon employs selective monitoring and allows metadata information to be imprecise without compromising accuracy. With these optimizations, Recon's graph collection imposes overheads typically between 5x and 20x for both C/C++ and Java programs, with overheads as low as 13% in our experiments. We evaluate Recon with buggy applications, and show it produces reconstructions that include all code points involved in bugs' causes, and presents them in an accurate order. We include a case study of understanding and fixing a previously unresolved bug to showcase Recon's effectiveness.

References

Z. Anderson, D. Gay, R. Ennals, and E. Brewer. SharC: Checking Data Sharing Strategies for Multithreaded C. In PLDI, 2008. Google ScholarDigital Library
C. Bienia, S. Kumar, J. P. Singh, and K. Li. The parsec benchmark suite: Characterization and architectural implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, October 2008. Google ScholarDigital Library
S. M. Blackburn et al. The DaCapo Benchmarks: Java Benchmarking Development and Analysis. In OOPSLA, 2006. Google ScholarDigital Library
H.-J. Boehm and S. V. Adve. Foundations of the C Concurrency Memory Model. In PLDI, 2008. Google ScholarDigital Library
S. Burckhardt, P. Kothari, M. Musuvathi, and S. Nagarakatte. A Randomized Scheduler with Probabilistic Guarantees of Finding Bugs. In ASPLOS, 2010. Google ScholarDigital Library
C. Flanagan and S. N. Freund. FastTrack: Efficient and Precise Dynamic Race Detection. In PLDI, 2009. Google ScholarDigital Library
C. Flanagan and S. N. Freund. The RoadRunner Dynamic Analysis Framework for Concurrent Programs. In PASTE, 2010. Google ScholarDigital Library
C. Flanagan, S. N. Freund, and J. Yi. Velodrome: A Sound and Complete Dynamic Atomicity Checker for Multithreaded Programs. In PLDI, 2008. Google ScholarDigital Library
M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The WEKA Data Mining Software: An Update. SIGKDD Explorations, 2009. Google ScholarDigital Library
C. Hammer, J. Dolby, M. Vaziri, and F. Tip. Dynamic Detection of Atomic-Set-Serializability Violations. In ICSE, 2008. Google ScholarDigital Library
G. Jin, A. Thakur, B. Liblit, and S. Lu. Instrumentation and Sampling Strategies for Cooperative Concurrency Bug Isolation. In OOPSLA, 2010. Google ScholarDigital Library
P. Joshi and K. Sen. Predictive Typestate Checking of Multithreaded Java Programs. In ASE, 2008. Google ScholarDigital Library
K. Kawachiya, A. Koseki, and T. Onodera. Lock Reservation: Java Locks Can Mostly Do Without Atomic Operations. In OOPSLA, 2002. Google ScholarDigital Library
I. Kononenko. Estimating Attributes: Analysis and Extensions of RELIEF. In European Conference on Machine Learning, 1994. Google ScholarDigital Library
B. Liblit. Cooperative Bug Isolation, volume 4440 of Lecture Notes in Computer Science. Springer, 2007.Google Scholar
S. Lu, S. Park, E. Seo, and Y. Zhou. Learning from Mistakes - A Comprehensive Study on Real World Concurrency Bug Characteristics. In ASPLOS, 2008. Google ScholarDigital Library
S. Lu, J. Tucek, F. Qin, and Y. Zhou. AVIO: Detecting Atomicity Violations via Access Interleaving Invariants. In ASPLOS, 2006. Google ScholarDigital Library
B. Lucia and L. Ceze. Finding Concurrency Bugs with Context-Aware Communication Graphs. In MICRO, 2009. Google ScholarDigital Library
C.-K. Luk et al. Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation. In PLDI, 2005. Google ScholarDigital Library
J. Manson, W. Pugh, and S. V. Adve. The Java Memory Model. In POPL, 2005. Google ScholarDigital Library
M. Musuvathi, S. Qadeer, T. Ball, G. Basler, P. A. Nainar, and I. Neamtiu. Finding and Reproducing Heisenbugs in Concurrent Programs. In OSDI, 2008. Google ScholarDigital Library
S. Park, S. Lu, and Y. Zhou. CTrigger: Exposing Atomicity Violation Bugs from Their Hiding Places. In ASPLOS, 2009. Google ScholarDigital Library
S. Park, R. W. Vuduc, and M. J. Harrold. Falcon: Fault Localization in Concurrent Programs. In ICSE, 2010. Google ScholarDigital Library
S. Park, Y. Zhou, W. Xiong, Z. Yin, R. Kaushik, K. H. Lee, and S. Lu. PRES: Probabilistic Replay with Execution Sketching on Multiprocessors. In SOSP, 2009. Google ScholarDigital Library
M. Ronsee and K. D. Bosschere. RecPlay: A Fully Integrated Practical Record/Replay System. ToCS, 1999. Google ScholarDigital Library
S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson. Eraser: A Dynamic Data Race Detector for Multi-Threaded Programs. ToCS, 1997. Google ScholarDigital Library
Y. Shi, S. Park, Z. Yin, S. Lu, Y. Zhou, W. Chen, and W. Zheng. Do I Use the Wrong Definition?: DeFuse: Definition-Use Invariants for Detecting Concurrency and Sequential Bugs. In OOPSLA, 2010. Google ScholarDigital Library
L. A. Smith, J. M. Bull, and J. Obdrzálek. A parallel Java Grande benchmark suite. In Supercomputing, 2001. Google ScholarDigital Library
B. P. Wood, A. Sampson, L. Ceze, and D. Grossman. Composable Specifications for Structured Shared-Memory Communication. In OOPSLA, 2010. Google ScholarDigital Library
M. Xu, R. Bodík, and M. D. Hill. A Serializability Violation Detector for Shared-Memory Server Programs. In PLDI, June 2005. Google ScholarDigital Library
M. Xu, M. D. Hill, and R. Bodik. A Regulated Transitive Reduction (RTR) for Longer Memory Race Recording. In ASPLOS, 2006. Google ScholarDigital Library
J. Yu and S. Narayanasamy. A Case for an Interleaving Constrained Shared-Memory Multi-Processor. In ISCA, 2009. Google ScholarDigital Library
P. Zhou, R. Teodorescu, and Y. Zhou. HARD: Hardware-Assisted Lockset-based Race Detection. In HPCA, 2007. Google ScholarDigital Library

Index Terms

Isolating and understanding concurrency errors using reconstructed execution fragments
1. Computing methodologies
  1. Concurrent computing methodologies
    1. Concurrent programming languages
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        Software testing and debugging
  2. Software notations and tools
    1. General programming languages
      1. Language types
        Concurrent programming languages

Recommendations

Isolating and understanding concurrency errors using reconstructed execution fragments
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation

In this paper we propose Recon, a new general approach to concurrency debugging. Recon goes beyond just detecting bugs, it also presents to the programmer short fragments of buggy execution schedules that illustrate how and why bugs happened. These ...
Read More
Instrumentation and sampling strategies for cooperative concurrency bug isolation
OOPSLA '10

Fixing concurrency bugs (or "crugs") is critical in modern software systems. Static analyses to find crugs such as data races and atomicity violations scale poorly, while dynamic approaches incur high run-time overheads. Crugs manifest only under ...
Read More
Instrumentation and sampling strategies for cooperative concurrency bug isolation
OOPSLA '10: Proceedings of the ACM international conference on Object oriented programming systems languages and applications

Fixing concurrency bugs (or "crugs") is critical in modern software systems. Static analyses to find crugs such as data races and atomicity violations scale poorly, while dynamic approaches incur high run-time overheads. Crugs manifest only under ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM SIGPLAN Notices Volume 46, Issue 6
PLDI '11
June 2011
652 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1993316
Issue’s Table of Contents
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2011
668 pages
ISBN:9781450306638
DOI:10.1145/1993498
General Chair:
Mary Hall
University of Utah
,
Program Chair:
David Padua
University of Illinois at Urbana-Champaign
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 4 June 2011
Check for updates
Author Tags
concurrency
multithreading
statistical debugging
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 44
  Total Citations
  View Citations
- 435
  Total Downloads
- Downloads (Last 12 months)2
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Isolating and understanding concurrency errors using reconstructed execution fragments

ACM SIGPLAN Notices

Abstract

References

Cited By

Index Terms

Recommendations

Isolating and understanding concurrency errors using reconstructed execution fragments

Instrumentation and sampling strategies for cooperative concurrency bug isolation

Instrumentation and sampling strategies for cooperative concurrency bug isolation