research-article

Instrumentation and sampling strategies for cooperative concurrency bug isolation

Authors:
Guoliang Jin

University of Wisconsin-Madison, Madison, WI, USA

University of Wisconsin-Madison, Madison, WI, USA
View Profile

,
Aditya Thakur

University of Wisconsin-Madison, Madison, WI, USA

University of Wisconsin-Madison, Madison, WI, USA
View Profile

,
Ben Liblit

University of Wisconsin-Madison, Madison, WI, USA

University of Wisconsin-Madison, Madison, WI, USA
View Profile

,
Shan Lu

University of Wisconsin-Madison, Madison, WI, USA

University of Wisconsin-Madison, Madison, WI, USA
View Profile

Authors Info & Claims

ACM SIGPLAN Notices Volume 45 Issue 10October 2010pp 241–255https://doi.org/10.1145/1932682.1869481

Published:17 October 2010Publication History

ACM SIGPLAN Notices

Abstract

Fixing concurrency bugs (or "crugs") is critical in modern software systems. Static analyses to find crugs such as data races and atomicity violations scale poorly, while dynamic approaches incur high run-time overheads. Crugs manifest only under specific execution interleavings that may not arise during in-house testing, thereby demanding a lightweight program monitoring technique that can be used post-deployment.

We present Cooperative Crug Isolation (CCI), a low-overhead instrumentation framework to diagnose production-run failures caused by crugs. CCI tracks specific thread interleavings at run-time, and uses statistical models to identify strong failure predictors among these. We offer a varied suite of predicates that represent different trade-offs between complexity and fault isolation capability. We also develop variant random sampling strategies that suit different types of predicates and help keep the run-time overhead low. Experiments with 9 real-world bugs in 6 non-trivial C applications show that these schemes span a wide spectrum of performance and diagnosis capabilities, each suitable for different usage scenarios.

References

}}The Apache Software Foundation. Apache HTTP Server Project. http://httpd.apache.org/, 2009.Google Scholar
}}C. Armour-Brown, J. Fitzhardinge, T. Hughes, N. Nethercote, P. Mackerras, D. Mueller, J. Seward, B. V. Assche, R. Walsh, and J. Weidendorfer. Valgrind User Manual. Valgrind project, 3.5.0 edition, Aug. 2009. http://valgrind.org/docs/manual/manual.html.Google Scholar
}}P. Arumuga Nainar, T. Chen, J. Rosin, and B. Liblit. Statistical debugging using compound Boolean predicates. In ISSTA, 2007. Google ScholarDigital Library
}}E. D. Berger, T. Yang, T. Liu, and G. Novark. Grace: safe multithreaded programming for c/c++. In OOPSLA, 2009. Google ScholarDigital Library
}}M. D. Bond, K. E. Coons, and K. S. McKinley. Pacer: Proportional detection of data races. In PLDI, 2010. Google ScholarDigital Library
}}A. Bron, E. Farchi, Y. Magid, Y. Nir, and S. Ur. Applications of synchronization coverage. In PPOPP, 2005. Google ScholarDigital Library
}}J. Burnim and K. Sen. Asserting and checking determinism for multithreaded programs. In FSE, 2009. Google ScholarDigital Library
}}T. Chilimbi, B. Liblit, K. Mehra, A. V. Nori, and K. Vaswani. HOLMES: Effective statistical debugging via efficient path profiling. In ICSE, May 2009. Google ScholarDigital Library
}}J.-D. Choi, K. Lee, A. Loginov, R. O'Callahan, V. Sarkar, and M. Sridharan. Efficient and precise datarace detection for multithreaded object-oriented programs. In PLDI, 2002. Google ScholarDigital Library
}}G. Dunlap, D. Lucchetti, M. Fetterman, and P. Chen. Execution replay of multiprocessor virtual machines. In VEE, 2008. Google ScholarDigital Library
}}T. Elmas, S. Qadeer, and S. Tasiran. A calculus of atomic actions. In POPL, 2009. Google ScholarDigital Library
}}D. Engler and K. Ashcraft. RacerX: Effective, static detection of race conditions and deadlocks. SIGOPS Oper. Syst. Rev., 37(5), 2003. Google ScholarDigital Library
}}C. Flanagan and S. N. Freund. Atomizer: a dynamic atomicity checker for multithreaded programs. In POPL, 2004. Google ScholarDigital Library
}}C. Flanagan and S. N. Freund. Fasttrack: efficient and precise dynamic race detection. In PLDI, 2009. Google ScholarDigital Library
}}J. Gilchrist. PBZIP2: Parallel BZIP2 Data Compression Software. http://compression.ca/pbzip2/, 2009.Google Scholar
}}P. Godefroid and N. Nagappan. Concurrency at Microsoft - an exploratory survey. In Workshop on Exploiting Concurrency Efficiently and Correctly, 2008.Google Scholar
}}T. A. Henzinger, R. Jhala, and R. Majumdar. Race checking by context inference. In PLDI, 2004. Google ScholarDigital Library
}}M. Herlihy. Wait-free synchronization. ACM Trans. Program. Lang. Syst., 13(1), 1991. Google ScholarDigital Library
}}M. Hirzel and T. M. Chilimbi. Bursty tracing: A framework for low-overhead temporal profiling. In 4th ACM Workshop on Feedback-Directed and Dynamic Optimization, 2001.Google Scholar
}}D. R. Hower, P. Montesinos, L. Ceze, M. D. Hill, and J. Torrellas. Two hardware-based approaches for deterministic multi-processor replay. CACM, 2009. Google ScholarDigital Library
}}V. Kahlon, Y. Yang, S. Sankaranarayanan, and A. Gupta. Fast and accurate static data-race detection for concurrent programs. In CAV, 2007. Google ScholarDigital Library
}}T. J. LeBlanc and J. M. Mellor-Crummey. Debugging parallel programs with instant replay. IEEE Trans. Comput., 36(4), 1987. Google ScholarDigital Library
}}B. Liblit, A. Aiken, A. X. Zheng, and M. I. Jordan. Bug isolation via remote program sampling. In PLDI, 2003. Google ScholarDigital Library
}}B. Liblit, M. Naik, A. X. Zheng, A. Aiken, and M. I. Jordan. Public deployment of Cooperative Bug Isolation. In RAMSS, 2004.Google ScholarCross Ref
}}B. Liblit, M. Naik, A. X. Zheng, A. Aiken, and M. I. Jordan. Scalable statistical bug isolation. In PLDI, 2005. Google ScholarDigital Library
}}S. Lu, J. Tucek, F. Qin, and Y. Zhou. AVIO: Detecting atomicity violations via access-interleaving invariants. In ASPLOS, 2006. Google ScholarDigital Library
}}S. Lu, S. Park, E. Seo, and Y. Zhou. Learning from mistakes - a comprehensive study of real world concurrency bug characteristics. In ASPLOS, March 2008. Google ScholarDigital Library
}}B. Lucia, J. Devietti, L. Ceze, and K. Strauss. Atom-aid: Detecting and surviving atomicity violations. IEEE Micro, 29 (1), 2009. Google ScholarDigital Library
}}D. Marino, M. Musuvathi, and S. Narayanasamy. Effective sampling for lightweight data-race detection. In PLDI, 2009. Google ScholarDigital Library
}}M. Musuvathi and S. Qadeer. Iterative context bounding for systematic testing of multithreaded programs. In PLDI, 2007. Google ScholarDigital Library
}}S. Narayanasamy, Z. Wang, J. Tigani, A. Edwards, and B. Calder. Automatically classifying benign and harmful data races using replay analysis. In PLDI, 2007. Google ScholarDigital Library
}}S. Park, S. Lu, and Y. Zhou. CTrigger: exposing atomicity violation bugs from their hiding places. In ASPLOS, 2009. Google ScholarDigital Library
}}W. Pugh and N. Ayewah. Unit testing concurrent software. In ASE, 2007. Google ScholarDigital Library
}}S. Rajamani, G. Ramalingam, V.-P. Ranganath, and K. Vaswani. Isolator: dynamically ensuring isolation in comcurrent programs. In ASPLOS, 2009. Google ScholarDigital Library
}}P. Ratanaworabhan, M. Burtscher, D. Kirovski, B. Zorn, R. Nagpal, and K. Pattabiraman. Detecting and tolerating asymmetric races. In PPoPP, 2009. Google ScholarDigital Library
}}S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson. Eraser: A dynamic data race detector for multithreaded programs. ACM Transactions on Computer Systems, 15, 1997. Google ScholarDigital Library
}}SecurityFocus. Software bug contributed to blackout. http://www.securityfocus.com/news/8016, Feb. 2004.Google Scholar
}}K. Sen. Race directed random testing of concurrent programs. In PLDI, 2008. Google ScholarDigital Library
}}Y. Shi, S. Park, Z. Yin, S. Lu, Y. Zhou, W. Chen, and W. Zheng. Do I Use the Wrong Definition? DefUse: Definition-Use Invariants for Detecting Concurrency and Sequential Bugs. In OOPSLA, 2010. Google ScholarDigital Library
}}A. Sussman and J. Trawick. Corrupt log lines at high volumes. https://issues.apache.org/bugzilla/show_bug.cgi?id=25520, 2003.Google Scholar
}}A. Thakur, R. Sen, B. Liblit, and S. Lu. Cooperative Crug Isolation. In WODA, 2009. Google ScholarDigital Library
}}M. Vaziri, F. Tip, and J. Dolby. Associating synchronization constraints with data in an object-oriented language. In POPL, 2006. Google ScholarDigital Library
}}M. T. Vechev, E. Yahav, and G. Yorsh. Abstraction-guided synthesis of synchronization. In POPL, 2010. Google ScholarDigital Library
}}S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta. The SPLASH-2 programs: Characterization and methodological considerations. In ISCA, 1995. Google ScholarDigital Library
}}J. Yu and S. Narayanasamy. A case for an interleaving constrained shared-memory multi-processor. In ISCA, 2009. Google ScholarDigital Library
}}Y. Yu, T. Rodeheffer, and W. Chen. Racetrack: efficient detection of data race conditions via adaptive tracking. In SOSP, 2005. Google ScholarDigital Library

Index Terms

Instrumentation and sampling strategies for cooperative concurrency bug isolation

Recommendations

Instrumentation and sampling strategies for cooperative concurrency bug isolation
OOPSLA '10: Proceedings of the ACM international conference on Object oriented programming systems languages and applications

Fixing concurrency bugs (or "crugs") is critical in modern software systems. Static analyses to find crugs such as data races and atomicity violations scale poorly, while dynamic approaches incur high run-time overheads. Crugs manifest only under ...
Read More
Cooperative crug isolation
WODA '09: Proceedings of the Seventh International Workshop on Dynamic Analysis

With the widespread deployment of multi-core hardware, writing concurrent programs has become inescapable. This has made fixing concurrency bugs (or crugs) critical in modern software systems. Static analysis techniques to find crugs such as data races ...
Read More
Isolating and understanding concurrency errors using reconstructed execution fragments
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation

In this paper we propose Recon, a new general approach to concurrency debugging. Recon goes beyond just detecting bugs, it also presents to the programmer short fragments of buggy execution schedules that illustrate how and why bugs happened. These ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM SIGPLAN Notices Volume 45, Issue 10
OOPSLA '10
October 2010
957 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1932682
Issue’s Table of Contents
OOPSLA '10: Proceedings of the ACM international conference on Object oriented programming systems languages and applications
October 2010
984 pages
ISBN:9781450302036
DOI:10.1145/1869459
General Chairs:
William R. Cook
University of Texas at Austin
,
Siobhán Clarke
Trinity College Dublin
,
Program Chairs:
Martin Rinard
MIT
,
Kevin J. Sullivan
University of Virginia
,
Daniel H. Steinberg
Dim Sum Thinking Inc.
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 17 October 2010
Check for updates
Author Tags
bug isolation
concurrency
random sampling
statistical debugging
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 123
  Total Citations
  View Citations
- 740
  Total Downloads
- Downloads (Last 12 months)24
- Downloads (Last 6 weeks)3
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Instrumentation and sampling strategies for cooperative concurrency bug isolation

ACM SIGPLAN Notices

Abstract

References

Cited By

Index Terms

Recommendations

Instrumentation and sampling strategies for cooperative concurrency bug isolation

Cooperative crug isolation

Isolating and understanding concurrency errors using reconstructed execution fragments