skip to main content
10.1145/2000064.2000084acmconferencesArticle/Chapter ViewAbstractPublication PagesiscaConference Proceedingsconference-collections
research-article

Demand-driven software race detection using hardware performance counters

Published:04 June 2011Publication History

ABSTRACT

Dynamic data race detectors are an important mechanism for creating robust parallel programs. Software race detectors instrument the program under test, observe each memory access, and watch for inter-thread data sharing that could lead to concurrency errors. While this method of bug hunting can find races that are normally difficult to observe, it also suffers from high runtime overheads. It is not uncommon for commercial race detectors to experience 300x slowdowns, limiting their usage.

This paper presents a hardware-assisted demand-driven race detector. We are able to observe cache events that are indicative of data sharing between threads by taking advantage of hardware available on modern commercial microprocessors. We use these to build a race detector that is only enabled when it is likely that inter-thread data sharing is occurring. When little sharing takes place, this demand-driven analysis is much faster than contemporary continuous-analysis tools without a large loss of detection accuracy. We modified the race detector in Intel(R) Inspector XE to utilize our hardware-based sharing indicator and were able to achieve performance increases of 3x and 10x in two parallel benchmark suites and 51x for one particular program.

Skip Supplemental Material Section

Supplemental Material

isca_4a_2.mp4

mp4

105.4 MB

References

  1. D. F. Bacon and S. C. Goldstein. Hardware-assisted replay of multiprocessor programs. In ACM/ONR Workshop on Parallel & Distributed Debugging, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. U. Banerjee, B. Bliss, Z. Ma, and P. Petersen. A theory of data race detection. In Workshop on Parallel and Distributed Systems: Testing and Debugging, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. U. Banerjee, B. Bliss, Z. Ma, and P. Petersen. Unraveling data race detection in the Intel Thread Checker. In Workshop on Software Tools for MultiCore Systems, 2006.Google ScholarGoogle Scholar
  4. L. Baugh, N. Neelakantam, and C. Zilles. Using hardware memory protection to build a high-performance, strongly-atomic hybrid transactional memory. In Int'l. Symp. on Computer Architecture (ISCA), 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. E. D. Berger, T. Yang, T. Liu, and G. Novark. Grace: Safe multithreaded programming for C/C >. In Conf. on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC benchmark suite: Characterization and architectural implications. In Int'l. Conf. on Parallel Architecture and Compilation Techniques (PACT), 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. D. Bond, K. E. Coons, and K. S. McKinley. PACER: Proportional detection of data races. In Conf. on Programming Language Design and Implementation (PLDI), 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. H. Chen, W.-C. Hsu, J. Lu, P.-C. Yew, and D.-Y. Chen. Dynamic trace selection using performance monitoring hardware sampling. In Int'l Symp. on Code Generation and Optimization (CGO), 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. J.-D. Choi, K. Lee, A. Loginov, R. O'Callahan, V. Sarkar, and M. Sridharan. Efficient and precise datarace detection for multithreaded object-oriented programs. In Conf. on Programming Language Design and Implementation, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. P. Damron, A. Fedorova, Y. Lev, V. Luchangco, M. Moir, and D. Nussbaum. Hybrid transactional memory. In Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. O. Edelstein, E. Farchi, Y. Nir, G. Ratsaby, and S. Ur. Multithreaded java program test generation. IBM Systems Journal, 41:111--125, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. C. Flannagan and S. N. Freund. FastTrack: Efficient and precise dynamic race detection. In PLDI, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. J. L. Greathouse, C. LeBlanc, T. Austin, and V. Bertacco. Highly scalable distributed dataflow analysis. In Int'l Symp. on Code Generation and Optimization (CGO), 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. A. Ho, M. Fetterman, C. Clark, A. Warfield, and S. Hand. Practical taint-based protection using demand emulation. In European Conf. on Computer Systems (EuroSys), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. Horowitz, M. Martonosi, T. C. Mowry, and M. D. Smith. Informing memory operations: Memory performance feedback mechanisms and their applications. Trans. on Computer Systems, 16:170--205, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. D. R. Hower and M. D. Hill. Rerun: Exploiting episodes for lightweight memory race recording. In Int'l Symp. on Computer Architecture (ISCA), 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. N. Jalbert, C. Pereira, G. Pokam, and K. Sen. RADBench: A Concurrency Bug Benchmark Suite. In Workshop on Hot Topics in Parallelism (HotPar), 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. A. Jannesari, K. Bao, V. Pankratius, and W. F. Tichy. Helgrind: An efficient dynamic race detector. In Int'l Parallel & Distributed Processing Symp. (IPDPS), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. L. Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558--565, 1978. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. S. Lu, S. Park, E. Seo, and Y. Zhou. Learning from mistakes - a comprehensive study on real world concurrency bug characteristics. In Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. S. Lu, J. Tucek, F. Qin, and Y. Zhou. AVIO: Detecting atomicity violations via access interleaving invariants. In Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. Pin: Building customized program analysis tools with dynamic instrumentation. In Conf. on Programming Language Design and Implementation (PLDI), 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. D. Marino, M. Musuvathi, and S. Narayanasamy. LiteRace: Effective sampling for lightweight data-race detection. In Conf. on Programming Language Design and Implementation (PLDI), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. S. L. Min and J.-D. Choi. An efficient cache-based access anomaly detection scheme. In Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. K. E. Moore, J. Bobba, M. J. Moravan, M. D. Hill, and D. A. Wood. Log™: Log-based transactional memory. In Int'l Symp. on High-Performance Computer Architecture (HPCA), 2006.Google ScholarGoogle Scholar
  26. A. Muzahid, D. Suárez, J. Torrellas, and S. Qi. SigRace: Signature-based data race detection. In Int'l Symp. on Computer Architecture (ISCA), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. V. Nagarajan and R. Gupta. ECMon: Exposing cache events for monitoring. In Int'l Symp. on Computer Architecture (ISCA), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. S. Narayanasamy, G. Pokam, and B. Calder. BugNet: Continuously recording program execution for deterministic replay debugging. In Int'l Symp. on Computer Architecture (ISCA), 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. National Vulnerability Database. Vulnerability Summary for CVE-2010-3864: OpenSSL 1.0.0. http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2010-3864, 2010.Google ScholarGoogle Scholar
  30. R. H. B. Netzer and B. P. Miller. What are race conditions? some issues and formalizations. ACM Letters on Programming Languages and Systems, 1(1):74--88, March 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. R. O'Callahan and J.-D. Choi. Hybrid dynamic data race detection. In Symp. on Principles and Practice of Parallel Programming (PPoPP), 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. S. Park, S. Lu, and Y. Zhou. CTrigger: Exposing atomicity violation bugs from their hiding places. In Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. J. Pincus and B. Baker. Beyond stack smashing: Recent advances in exploiting buffer overruns. IEEE Security and Privacy, 2(4):20--27, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. M. Prvulovic. CORD: Cost-effective (and nearly overhead-free) order-recording and data race detection. In Int'l Symp. on High-Performance Computer Architecture (HPCA), 2006.Google ScholarGoogle Scholar
  35. Y. Qi, R. Das, Z. D. Luo, and M. Trotter. MulticoreSDK: A practical and efficient data race detector for real-world applications. In Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis. Evaluating MapReduce for multi-core and multiprocessor systems. In Int'l. Symp. on High-Performance Computer Architecture (HPCA), 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. P. Sack, B. E. Bliss, Z. Ma, P. Petersen, and J. Torrellas. Accurate and efficient filtering for the Intel Thread Checker race detector. In Workshop on Architectural and System Support for Improving Software Dependability, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson. Eraser: A dynamic data race detector for multithreaded programs. Trans. on Computer Systems, 15(4):391--411, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. K. Serebryany and T. Iskhodzhanov. ThreadSanitizer -- data race detection in practice. In Workshop on Binary Instrumentation and Applications, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. K. Singh, M. Bhadauria, and S. A. McKee. Real time power estimation and thread scheduling via performance counters. In Workshop on Design, Architecture, and Simulation of Chip Multi-Processors, 2008.Google ScholarGoogle Scholar
  41. B. Sprunt. The basics of performance-monitoring hardware. IEEE Micro, 22(4):64--71, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. C. Terboven. Comparing Intel Thread Checker and Sun Thread Analyzer. Parallel Computing: Architectures, Algorithms and Applications, 38:669--676, 2007.Google ScholarGoogle Scholar
  43. G. Venkataramani, I. Doudalis, Y. Solihin, and M. Prvulovic. MemTracker: An accelerator for memory debugging and monitoring. ACM Transactions on Architecture and Code Optimization, 6(2):1--33, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. E. Witchel. Mondriaan Memory Protection. PhD thesis, Massachusetts Institute of Technology, January 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. M. Xu, R. Bodik, and M. D. Hill. A "Flight Data Recorder" for enabling full-system multiprocessor deterministic replay. In Int'l Symp. on Computer Architecture (ISCA), 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Q. Zhao, D. Bruening, and S. Amarasinghe. Umbra: Efficient and scalable memory shadowing. In Int'l Symp. on Code Generation and Optimization (CGO), 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. P. Zhou, R. Teodorescu, and Y. Zhou. HARD: Hardware-assisted lockset-based race detection. In Int'l Symp. on High-Performance Computer Architecture (HPCA), 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Demand-driven software race detection using hardware performance counters

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          ISCA '11: Proceedings of the 38th annual international symposium on Computer architecture
          June 2011
          488 pages
          ISBN:9781450304726
          DOI:10.1145/2000064
          • cover image ACM SIGARCH Computer Architecture News
            ACM SIGARCH Computer Architecture News  Volume 39, Issue 3
            ISCA '11
            June 2011
            462 pages
            ISSN:0163-5964
            DOI:10.1145/2024723
            Issue’s Table of Contents

          Copyright © 2011 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 4 June 2011

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          Overall Acceptance Rate543of3,203submissions,17%

          Upcoming Conference

          ISCA '24

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader