Abstract
Fixing concurrency bugs (or "crugs") is critical in modern software systems. Static analyses to find crugs such as data races and atomicity violations scale poorly, while dynamic approaches incur high run-time overheads. Crugs manifest only under specific execution interleavings that may not arise during in-house testing, thereby demanding a lightweight program monitoring technique that can be used post-deployment.
We present Cooperative Crug Isolation (CCI), a low-overhead instrumentation framework to diagnose production-run failures caused by crugs. CCI tracks specific thread interleavings at run-time, and uses statistical models to identify strong failure predictors among these. We offer a varied suite of predicates that represent different trade-offs between complexity and fault isolation capability. We also develop variant random sampling strategies that suit different types of predicates and help keep the run-time overhead low. Experiments with 9 real-world bugs in 6 non-trivial C applications show that these schemes span a wide spectrum of performance and diagnosis capabilities, each suitable for different usage scenarios.
- }}The Apache Software Foundation. Apache HTTP Server Project. http://httpd.apache.org/, 2009.Google Scholar
- }}C. Armour-Brown, J. Fitzhardinge, T. Hughes, N. Nethercote, P. Mackerras, D. Mueller, J. Seward, B. V. Assche, R. Walsh, and J. Weidendorfer. Valgrind User Manual. Valgrind project, 3.5.0 edition, Aug. 2009. http://valgrind.org/docs/manual/manual.html.Google Scholar
- }}P. Arumuga Nainar, T. Chen, J. Rosin, and B. Liblit. Statistical debugging using compound Boolean predicates. In ISSTA, 2007. Google ScholarDigital Library
- }}E. D. Berger, T. Yang, T. Liu, and G. Novark. Grace: safe multithreaded programming for c/c++. In OOPSLA, 2009. Google ScholarDigital Library
- }}M. D. Bond, K. E. Coons, and K. S. McKinley. Pacer: Proportional detection of data races. In PLDI, 2010. Google ScholarDigital Library
- }}A. Bron, E. Farchi, Y. Magid, Y. Nir, and S. Ur. Applications of synchronization coverage. In PPOPP, 2005. Google ScholarDigital Library
- }}J. Burnim and K. Sen. Asserting and checking determinism for multithreaded programs. In FSE, 2009. Google ScholarDigital Library
- }}T. Chilimbi, B. Liblit, K. Mehra, A. V. Nori, and K. Vaswani. HOLMES: Effective statistical debugging via efficient path profiling. In ICSE, May 2009. Google ScholarDigital Library
- }}J.-D. Choi, K. Lee, A. Loginov, R. O'Callahan, V. Sarkar, and M. Sridharan. Efficient and precise datarace detection for multithreaded object-oriented programs. In PLDI, 2002. Google ScholarDigital Library
- }}G. Dunlap, D. Lucchetti, M. Fetterman, and P. Chen. Execution replay of multiprocessor virtual machines. In VEE, 2008. Google ScholarDigital Library
- }}T. Elmas, S. Qadeer, and S. Tasiran. A calculus of atomic actions. In POPL, 2009. Google ScholarDigital Library
- }}D. Engler and K. Ashcraft. RacerX: Effective, static detection of race conditions and deadlocks. SIGOPS Oper. Syst. Rev., 37(5), 2003. Google ScholarDigital Library
- }}C. Flanagan and S. N. Freund. Atomizer: a dynamic atomicity checker for multithreaded programs. In POPL, 2004. Google ScholarDigital Library
- }}C. Flanagan and S. N. Freund. Fasttrack: efficient and precise dynamic race detection. In PLDI, 2009. Google ScholarDigital Library
- }}J. Gilchrist. PBZIP2: Parallel BZIP2 Data Compression Software. http://compression.ca/pbzip2/, 2009.Google Scholar
- }}P. Godefroid and N. Nagappan. Concurrency at Microsoft - an exploratory survey. In Workshop on Exploiting Concurrency Efficiently and Correctly, 2008.Google Scholar
- }}T. A. Henzinger, R. Jhala, and R. Majumdar. Race checking by context inference. In PLDI, 2004. Google ScholarDigital Library
- }}M. Herlihy. Wait-free synchronization. ACM Trans. Program. Lang. Syst., 13(1), 1991. Google ScholarDigital Library
- }}M. Hirzel and T. M. Chilimbi. Bursty tracing: A framework for low-overhead temporal profiling. In 4th ACM Workshop on Feedback-Directed and Dynamic Optimization, 2001.Google Scholar
- }}D. R. Hower, P. Montesinos, L. Ceze, M. D. Hill, and J. Torrellas. Two hardware-based approaches for deterministic multi-processor replay. CACM, 2009. Google ScholarDigital Library
- }}V. Kahlon, Y. Yang, S. Sankaranarayanan, and A. Gupta. Fast and accurate static data-race detection for concurrent programs. In CAV, 2007. Google ScholarDigital Library
- }}T. J. LeBlanc and J. M. Mellor-Crummey. Debugging parallel programs with instant replay. IEEE Trans. Comput., 36(4), 1987. Google ScholarDigital Library
- }}B. Liblit, A. Aiken, A. X. Zheng, and M. I. Jordan. Bug isolation via remote program sampling. In PLDI, 2003. Google ScholarDigital Library
- }}B. Liblit, M. Naik, A. X. Zheng, A. Aiken, and M. I. Jordan. Public deployment of Cooperative Bug Isolation. In RAMSS, 2004.Google ScholarCross Ref
- }}B. Liblit, M. Naik, A. X. Zheng, A. Aiken, and M. I. Jordan. Scalable statistical bug isolation. In PLDI, 2005. Google ScholarDigital Library
- }}S. Lu, J. Tucek, F. Qin, and Y. Zhou. AVIO: Detecting atomicity violations via access-interleaving invariants. In ASPLOS, 2006. Google ScholarDigital Library
- }}S. Lu, S. Park, E. Seo, and Y. Zhou. Learning from mistakes - a comprehensive study of real world concurrency bug characteristics. In ASPLOS, March 2008. Google ScholarDigital Library
- }}B. Lucia, J. Devietti, L. Ceze, and K. Strauss. Atom-aid: Detecting and surviving atomicity violations. IEEE Micro, 29 (1), 2009. Google ScholarDigital Library
- }}D. Marino, M. Musuvathi, and S. Narayanasamy. Effective sampling for lightweight data-race detection. In PLDI, 2009. Google ScholarDigital Library
- }}M. Musuvathi and S. Qadeer. Iterative context bounding for systematic testing of multithreaded programs. In PLDI, 2007. Google ScholarDigital Library
- }}S. Narayanasamy, Z. Wang, J. Tigani, A. Edwards, and B. Calder. Automatically classifying benign and harmful data races using replay analysis. In PLDI, 2007. Google ScholarDigital Library
- }}S. Park, S. Lu, and Y. Zhou. CTrigger: exposing atomicity violation bugs from their hiding places. In ASPLOS, 2009. Google ScholarDigital Library
- }}W. Pugh and N. Ayewah. Unit testing concurrent software. In ASE, 2007. Google ScholarDigital Library
- }}S. Rajamani, G. Ramalingam, V.-P. Ranganath, and K. Vaswani. Isolator: dynamically ensuring isolation in comcurrent programs. In ASPLOS, 2009. Google ScholarDigital Library
- }}P. Ratanaworabhan, M. Burtscher, D. Kirovski, B. Zorn, R. Nagpal, and K. Pattabiraman. Detecting and tolerating asymmetric races. In PPoPP, 2009. Google ScholarDigital Library
- }}S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson. Eraser: A dynamic data race detector for multithreaded programs. ACM Transactions on Computer Systems, 15, 1997. Google ScholarDigital Library
- }}SecurityFocus. Software bug contributed to blackout. http://www.securityfocus.com/news/8016, Feb. 2004.Google Scholar
- }}K. Sen. Race directed random testing of concurrent programs. In PLDI, 2008. Google ScholarDigital Library
- }}Y. Shi, S. Park, Z. Yin, S. Lu, Y. Zhou, W. Chen, and W. Zheng. Do I Use the Wrong Definition? DefUse: Definition-Use Invariants for Detecting Concurrency and Sequential Bugs. In OOPSLA, 2010. Google ScholarDigital Library
- }}A. Sussman and J. Trawick. Corrupt log lines at high volumes. https://issues.apache.org/bugzilla/show_bug.cgi?id=25520, 2003.Google Scholar
- }}A. Thakur, R. Sen, B. Liblit, and S. Lu. Cooperative Crug Isolation. In WODA, 2009. Google ScholarDigital Library
- }}M. Vaziri, F. Tip, and J. Dolby. Associating synchronization constraints with data in an object-oriented language. In POPL, 2006. Google ScholarDigital Library
- }}M. T. Vechev, E. Yahav, and G. Yorsh. Abstraction-guided synthesis of synchronization. In POPL, 2010. Google ScholarDigital Library
- }}S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta. The SPLASH-2 programs: Characterization and methodological considerations. In ISCA, 1995. Google ScholarDigital Library
- }}J. Yu and S. Narayanasamy. A case for an interleaving constrained shared-memory multi-processor. In ISCA, 2009. Google ScholarDigital Library
- }}Y. Yu, T. Rodeheffer, and W. Chen. Racetrack: efficient detection of data race conditions via adaptive tracking. In SOSP, 2005. Google ScholarDigital Library
Index Terms
- Instrumentation and sampling strategies for cooperative concurrency bug isolation
Recommendations
Instrumentation and sampling strategies for cooperative concurrency bug isolation
OOPSLA '10: Proceedings of the ACM international conference on Object oriented programming systems languages and applicationsFixing concurrency bugs (or "crugs") is critical in modern software systems. Static analyses to find crugs such as data races and atomicity violations scale poorly, while dynamic approaches incur high run-time overheads. Crugs manifest only under ...
Cooperative crug isolation
WODA '09: Proceedings of the Seventh International Workshop on Dynamic AnalysisWith the widespread deployment of multi-core hardware, writing concurrent programs has become inescapable. This has made fixing concurrency bugs (or crugs) critical in modern software systems. Static analysis techniques to find crugs such as data races ...
Isolating and understanding concurrency errors using reconstructed execution fragments
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and ImplementationIn this paper we propose Recon, a new general approach to concurrency debugging. Recon goes beyond just detecting bugs, it also presents to the programmer short fragments of buggy execution schedules that illustrate how and why bugs happened. These ...
Comments