skip to main content
research-article

Instrumentation and sampling strategies for cooperative concurrency bug isolation

Authors Info & Claims
Published:17 October 2010Publication History
Skip Abstract Section

Abstract

Fixing concurrency bugs (or "crugs") is critical in modern software systems. Static analyses to find crugs such as data races and atomicity violations scale poorly, while dynamic approaches incur high run-time overheads. Crugs manifest only under specific execution interleavings that may not arise during in-house testing, thereby demanding a lightweight program monitoring technique that can be used post-deployment.

We present Cooperative Crug Isolation (CCI), a low-overhead instrumentation framework to diagnose production-run failures caused by crugs. CCI tracks specific thread interleavings at run-time, and uses statistical models to identify strong failure predictors among these. We offer a varied suite of predicates that represent different trade-offs between complexity and fault isolation capability. We also develop variant random sampling strategies that suit different types of predicates and help keep the run-time overhead low. Experiments with 9 real-world bugs in 6 non-trivial C applications show that these schemes span a wide spectrum of performance and diagnosis capabilities, each suitable for different usage scenarios.

References

  1. }}The Apache Software Foundation. Apache HTTP Server Project. http://httpd.apache.org/, 2009.Google ScholarGoogle Scholar
  2. }}C. Armour-Brown, J. Fitzhardinge, T. Hughes, N. Nethercote, P. Mackerras, D. Mueller, J. Seward, B. V. Assche, R. Walsh, and J. Weidendorfer. Valgrind User Manual. Valgrind project, 3.5.0 edition, Aug. 2009. http://valgrind.org/docs/manual/manual.html.Google ScholarGoogle Scholar
  3. }}P. Arumuga Nainar, T. Chen, J. Rosin, and B. Liblit. Statistical debugging using compound Boolean predicates. In ISSTA, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. }}E. D. Berger, T. Yang, T. Liu, and G. Novark. Grace: safe multithreaded programming for c/c++. In OOPSLA, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. }}M. D. Bond, K. E. Coons, and K. S. McKinley. Pacer: Proportional detection of data races. In PLDI, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. }}A. Bron, E. Farchi, Y. Magid, Y. Nir, and S. Ur. Applications of synchronization coverage. In PPOPP, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. }}J. Burnim and K. Sen. Asserting and checking determinism for multithreaded programs. In FSE, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. }}T. Chilimbi, B. Liblit, K. Mehra, A. V. Nori, and K. Vaswani. HOLMES: Effective statistical debugging via efficient path profiling. In ICSE, May 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. }}J.-D. Choi, K. Lee, A. Loginov, R. O'Callahan, V. Sarkar, and M. Sridharan. Efficient and precise datarace detection for multithreaded object-oriented programs. In PLDI, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. }}G. Dunlap, D. Lucchetti, M. Fetterman, and P. Chen. Execution replay of multiprocessor virtual machines. In VEE, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. }}T. Elmas, S. Qadeer, and S. Tasiran. A calculus of atomic actions. In POPL, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. }}D. Engler and K. Ashcraft. RacerX: Effective, static detection of race conditions and deadlocks. SIGOPS Oper. Syst. Rev., 37(5), 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. }}C. Flanagan and S. N. Freund. Atomizer: a dynamic atomicity checker for multithreaded programs. In POPL, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. }}C. Flanagan and S. N. Freund. Fasttrack: efficient and precise dynamic race detection. In PLDI, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. }}J. Gilchrist. PBZIP2: Parallel BZIP2 Data Compression Software. http://compression.ca/pbzip2/, 2009.Google ScholarGoogle Scholar
  16. }}P. Godefroid and N. Nagappan. Concurrency at Microsoft - an exploratory survey. In Workshop on Exploiting Concurrency Efficiently and Correctly, 2008.Google ScholarGoogle Scholar
  17. }}T. A. Henzinger, R. Jhala, and R. Majumdar. Race checking by context inference. In PLDI, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. }}M. Herlihy. Wait-free synchronization. ACM Trans. Program. Lang. Syst., 13(1), 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. }}M. Hirzel and T. M. Chilimbi. Bursty tracing: A framework for low-overhead temporal profiling. In 4th ACM Workshop on Feedback-Directed and Dynamic Optimization, 2001.Google ScholarGoogle Scholar
  20. }}D. R. Hower, P. Montesinos, L. Ceze, M. D. Hill, and J. Torrellas. Two hardware-based approaches for deterministic multi-processor replay. CACM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. }}V. Kahlon, Y. Yang, S. Sankaranarayanan, and A. Gupta. Fast and accurate static data-race detection for concurrent programs. In CAV, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. }}T. J. LeBlanc and J. M. Mellor-Crummey. Debugging parallel programs with instant replay. IEEE Trans. Comput., 36(4), 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. }}B. Liblit, A. Aiken, A. X. Zheng, and M. I. Jordan. Bug isolation via remote program sampling. In PLDI, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. }}B. Liblit, M. Naik, A. X. Zheng, A. Aiken, and M. I. Jordan. Public deployment of Cooperative Bug Isolation. In RAMSS, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  25. }}B. Liblit, M. Naik, A. X. Zheng, A. Aiken, and M. I. Jordan. Scalable statistical bug isolation. In PLDI, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. }}S. Lu, J. Tucek, F. Qin, and Y. Zhou. AVIO: Detecting atomicity violations via access-interleaving invariants. In ASPLOS, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. }}S. Lu, S. Park, E. Seo, and Y. Zhou. Learning from mistakes - a comprehensive study of real world concurrency bug characteristics. In ASPLOS, March 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. }}B. Lucia, J. Devietti, L. Ceze, and K. Strauss. Atom-aid: Detecting and surviving atomicity violations. IEEE Micro, 29 (1), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. }}D. Marino, M. Musuvathi, and S. Narayanasamy. Effective sampling for lightweight data-race detection. In PLDI, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. }}M. Musuvathi and S. Qadeer. Iterative context bounding for systematic testing of multithreaded programs. In PLDI, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. }}S. Narayanasamy, Z. Wang, J. Tigani, A. Edwards, and B. Calder. Automatically classifying benign and harmful data races using replay analysis. In PLDI, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. }}S. Park, S. Lu, and Y. Zhou. CTrigger: exposing atomicity violation bugs from their hiding places. In ASPLOS, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. }}W. Pugh and N. Ayewah. Unit testing concurrent software. In ASE, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. }}S. Rajamani, G. Ramalingam, V.-P. Ranganath, and K. Vaswani. Isolator: dynamically ensuring isolation in comcurrent programs. In ASPLOS, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. }}P. Ratanaworabhan, M. Burtscher, D. Kirovski, B. Zorn, R. Nagpal, and K. Pattabiraman. Detecting and tolerating asymmetric races. In PPoPP, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. }}S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson. Eraser: A dynamic data race detector for multithreaded programs. ACM Transactions on Computer Systems, 15, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. }}SecurityFocus. Software bug contributed to blackout. http://www.securityfocus.com/news/8016, Feb. 2004.Google ScholarGoogle Scholar
  38. }}K. Sen. Race directed random testing of concurrent programs. In PLDI, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. }}Y. Shi, S. Park, Z. Yin, S. Lu, Y. Zhou, W. Chen, and W. Zheng. Do I Use the Wrong Definition? DefUse: Definition-Use Invariants for Detecting Concurrency and Sequential Bugs. In OOPSLA, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. }}A. Sussman and J. Trawick. Corrupt log lines at high volumes. https://issues.apache.org/bugzilla/show_bug.cgi?id=25520, 2003.Google ScholarGoogle Scholar
  41. }}A. Thakur, R. Sen, B. Liblit, and S. Lu. Cooperative Crug Isolation. In WODA, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. }}M. Vaziri, F. Tip, and J. Dolby. Associating synchronization constraints with data in an object-oriented language. In POPL, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. }}M. T. Vechev, E. Yahav, and G. Yorsh. Abstraction-guided synthesis of synchronization. In POPL, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. }}S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta. The SPLASH-2 programs: Characterization and methodological considerations. In ISCA, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. }}J. Yu and S. Narayanasamy. A case for an interleaving constrained shared-memory multi-processor. In ISCA, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. }}Y. Yu, T. Rodeheffer, and W. Chen. Racetrack: efficient detection of data race conditions via adaptive tracking. In SOSP, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Instrumentation and sampling strategies for cooperative concurrency bug isolation

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image ACM SIGPLAN Notices
            ACM SIGPLAN Notices  Volume 45, Issue 10
            OOPSLA '10
            October 2010
            957 pages
            ISSN:0362-1340
            EISSN:1558-1160
            DOI:10.1145/1932682
            Issue’s Table of Contents
            • cover image ACM Conferences
              OOPSLA '10: Proceedings of the ACM international conference on Object oriented programming systems languages and applications
              October 2010
              984 pages
              ISBN:9781450302036
              DOI:10.1145/1869459

            Copyright © 2010 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 17 October 2010

            Check for updates

            Qualifiers

            • research-article

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader