ABSTRACT
Dynamic data race detectors are an important mechanism for creating robust parallel programs. Software race detectors instrument the program under test, observe each memory access, and watch for inter-thread data sharing that could lead to concurrency errors. While this method of bug hunting can find races that are normally difficult to observe, it also suffers from high runtime overheads. It is not uncommon for commercial race detectors to experience 300x slowdowns, limiting their usage.
This paper presents a hardware-assisted demand-driven race detector. We are able to observe cache events that are indicative of data sharing between threads by taking advantage of hardware available on modern commercial microprocessors. We use these to build a race detector that is only enabled when it is likely that inter-thread data sharing is occurring. When little sharing takes place, this demand-driven analysis is much faster than contemporary continuous-analysis tools without a large loss of detection accuracy. We modified the race detector in Intel(R) Inspector XE to utilize our hardware-based sharing indicator and were able to achieve performance increases of 3x and 10x in two parallel benchmark suites and 51x for one particular program.
Supplemental Material
- D. F. Bacon and S. C. Goldstein. Hardware-assisted replay of multiprocessor programs. In ACM/ONR Workshop on Parallel & Distributed Debugging, 1991. Google ScholarDigital Library
- U. Banerjee, B. Bliss, Z. Ma, and P. Petersen. A theory of data race detection. In Workshop on Parallel and Distributed Systems: Testing and Debugging, 2006. Google ScholarDigital Library
- U. Banerjee, B. Bliss, Z. Ma, and P. Petersen. Unraveling data race detection in the Intel Thread Checker. In Workshop on Software Tools for MultiCore Systems, 2006.Google Scholar
- L. Baugh, N. Neelakantam, and C. Zilles. Using hardware memory protection to build a high-performance, strongly-atomic hybrid transactional memory. In Int'l. Symp. on Computer Architecture (ISCA), 2008. Google ScholarDigital Library
- E. D. Berger, T. Yang, T. Liu, and G. Novark. Grace: Safe multithreaded programming for C/C >. In Conf. on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), 2009. Google ScholarDigital Library
- C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC benchmark suite: Characterization and architectural implications. In Int'l. Conf. on Parallel Architecture and Compilation Techniques (PACT), 2008. Google ScholarDigital Library
- M. D. Bond, K. E. Coons, and K. S. McKinley. PACER: Proportional detection of data races. In Conf. on Programming Language Design and Implementation (PLDI), 2010. Google ScholarDigital Library
- H. Chen, W.-C. Hsu, J. Lu, P.-C. Yew, and D.-Y. Chen. Dynamic trace selection using performance monitoring hardware sampling. In Int'l Symp. on Code Generation and Optimization (CGO), 2003. Google ScholarDigital Library
- J.-D. Choi, K. Lee, A. Loginov, R. O'Callahan, V. Sarkar, and M. Sridharan. Efficient and precise datarace detection for multithreaded object-oriented programs. In Conf. on Programming Language Design and Implementation, 2002. Google ScholarDigital Library
- P. Damron, A. Fedorova, Y. Lev, V. Luchangco, M. Moir, and D. Nussbaum. Hybrid transactional memory. In Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2006. Google ScholarDigital Library
- O. Edelstein, E. Farchi, Y. Nir, G. Ratsaby, and S. Ur. Multithreaded java program test generation. IBM Systems Journal, 41:111--125, 2002. Google ScholarDigital Library
- C. Flannagan and S. N. Freund. FastTrack: Efficient and precise dynamic race detection. In PLDI, 2009. Google ScholarDigital Library
- J. L. Greathouse, C. LeBlanc, T. Austin, and V. Bertacco. Highly scalable distributed dataflow analysis. In Int'l Symp. on Code Generation and Optimization (CGO), 2011. Google ScholarDigital Library
- A. Ho, M. Fetterman, C. Clark, A. Warfield, and S. Hand. Practical taint-based protection using demand emulation. In European Conf. on Computer Systems (EuroSys), 2006. Google ScholarDigital Library
- M. Horowitz, M. Martonosi, T. C. Mowry, and M. D. Smith. Informing memory operations: Memory performance feedback mechanisms and their applications. Trans. on Computer Systems, 16:170--205, 1998. Google ScholarDigital Library
- D. R. Hower and M. D. Hill. Rerun: Exploiting episodes for lightweight memory race recording. In Int'l Symp. on Computer Architecture (ISCA), 2008. Google ScholarDigital Library
- N. Jalbert, C. Pereira, G. Pokam, and K. Sen. RADBench: A Concurrency Bug Benchmark Suite. In Workshop on Hot Topics in Parallelism (HotPar), 2011. Google ScholarDigital Library
- A. Jannesari, K. Bao, V. Pankratius, and W. F. Tichy. Helgrind: An efficient dynamic race detector. In Int'l Parallel & Distributed Processing Symp. (IPDPS), 2009. Google ScholarDigital Library
- L. Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558--565, 1978. Google ScholarDigital Library
- S. Lu, S. Park, E. Seo, and Y. Zhou. Learning from mistakes - a comprehensive study on real world concurrency bug characteristics. In Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2008. Google ScholarDigital Library
- S. Lu, J. Tucek, F. Qin, and Y. Zhou. AVIO: Detecting atomicity violations via access interleaving invariants. In Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2006. Google ScholarDigital Library
- C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. Pin: Building customized program analysis tools with dynamic instrumentation. In Conf. on Programming Language Design and Implementation (PLDI), 2005. Google ScholarDigital Library
- D. Marino, M. Musuvathi, and S. Narayanasamy. LiteRace: Effective sampling for lightweight data-race detection. In Conf. on Programming Language Design and Implementation (PLDI), 2009. Google ScholarDigital Library
- S. L. Min and J.-D. Choi. An efficient cache-based access anomaly detection scheme. In Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 1991. Google ScholarDigital Library
- K. E. Moore, J. Bobba, M. J. Moravan, M. D. Hill, and D. A. Wood. Log™: Log-based transactional memory. In Int'l Symp. on High-Performance Computer Architecture (HPCA), 2006.Google Scholar
- A. Muzahid, D. Suárez, J. Torrellas, and S. Qi. SigRace: Signature-based data race detection. In Int'l Symp. on Computer Architecture (ISCA), 2009. Google ScholarDigital Library
- V. Nagarajan and R. Gupta. ECMon: Exposing cache events for monitoring. In Int'l Symp. on Computer Architecture (ISCA), 2009. Google ScholarDigital Library
- S. Narayanasamy, G. Pokam, and B. Calder. BugNet: Continuously recording program execution for deterministic replay debugging. In Int'l Symp. on Computer Architecture (ISCA), 2005. Google ScholarDigital Library
- National Vulnerability Database. Vulnerability Summary for CVE-2010-3864: OpenSSL 1.0.0. http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2010-3864, 2010.Google Scholar
- R. H. B. Netzer and B. P. Miller. What are race conditions? some issues and formalizations. ACM Letters on Programming Languages and Systems, 1(1):74--88, March 1992. Google ScholarDigital Library
- R. O'Callahan and J.-D. Choi. Hybrid dynamic data race detection. In Symp. on Principles and Practice of Parallel Programming (PPoPP), 2003. Google ScholarDigital Library
- S. Park, S. Lu, and Y. Zhou. CTrigger: Exposing atomicity violation bugs from their hiding places. In Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2009. Google ScholarDigital Library
- J. Pincus and B. Baker. Beyond stack smashing: Recent advances in exploiting buffer overruns. IEEE Security and Privacy, 2(4):20--27, 2004. Google ScholarDigital Library
- M. Prvulovic. CORD: Cost-effective (and nearly overhead-free) order-recording and data race detection. In Int'l Symp. on High-Performance Computer Architecture (HPCA), 2006.Google Scholar
- Y. Qi, R. Das, Z. D. Luo, and M. Trotter. MulticoreSDK: A practical and efficient data race detector for real-world applications. In Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging, 2009. Google ScholarDigital Library
- C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis. Evaluating MapReduce for multi-core and multiprocessor systems. In Int'l. Symp. on High-Performance Computer Architecture (HPCA), 2007. Google ScholarDigital Library
- P. Sack, B. E. Bliss, Z. Ma, P. Petersen, and J. Torrellas. Accurate and efficient filtering for the Intel Thread Checker race detector. In Workshop on Architectural and System Support for Improving Software Dependability, 2006. Google ScholarDigital Library
- S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson. Eraser: A dynamic data race detector for multithreaded programs. Trans. on Computer Systems, 15(4):391--411, 1997. Google ScholarDigital Library
- K. Serebryany and T. Iskhodzhanov. ThreadSanitizer -- data race detection in practice. In Workshop on Binary Instrumentation and Applications, 2009. Google ScholarDigital Library
- K. Singh, M. Bhadauria, and S. A. McKee. Real time power estimation and thread scheduling via performance counters. In Workshop on Design, Architecture, and Simulation of Chip Multi-Processors, 2008.Google Scholar
- B. Sprunt. The basics of performance-monitoring hardware. IEEE Micro, 22(4):64--71, 2002. Google ScholarDigital Library
- C. Terboven. Comparing Intel Thread Checker and Sun Thread Analyzer. Parallel Computing: Architectures, Algorithms and Applications, 38:669--676, 2007.Google Scholar
- G. Venkataramani, I. Doudalis, Y. Solihin, and M. Prvulovic. MemTracker: An accelerator for memory debugging and monitoring. ACM Transactions on Architecture and Code Optimization, 6(2):1--33, 2009. Google ScholarDigital Library
- E. Witchel. Mondriaan Memory Protection. PhD thesis, Massachusetts Institute of Technology, January 2004. Google ScholarDigital Library
- M. Xu, R. Bodik, and M. D. Hill. A "Flight Data Recorder" for enabling full-system multiprocessor deterministic replay. In Int'l Symp. on Computer Architecture (ISCA), 2003. Google ScholarDigital Library
- Q. Zhao, D. Bruening, and S. Amarasinghe. Umbra: Efficient and scalable memory shadowing. In Int'l Symp. on Code Generation and Optimization (CGO), 2010. Google ScholarDigital Library
- P. Zhou, R. Teodorescu, and Y. Zhou. HARD: Hardware-assisted lockset-based race detection. In Int'l Symp. on High-Performance Computer Architecture (HPCA), 2007. Google ScholarDigital Library
Index Terms
- Demand-driven software race detection using hardware performance counters
Recommendations
Demand-driven software race detection using hardware performance counters
ISCA '11Dynamic data race detectors are an important mechanism for creating robust parallel programs. Software race detectors instrument the program under test, observe each memory access, and watch for inter-thread data sharing that could lead to concurrency ...
Parallelizing data race detection
ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systemsDetecting data races in multithreaded programs is a crucial part of debugging such programs, but traditional data race detectors are too slow to use routinely. This paper shows how to speed up race detection by spreading the work across multiple cores. ...
Parallelizing data race detection
ASPLOS '13Detecting data races in multithreaded programs is a crucial part of debugging such programs, but traditional data race detectors are too slow to use routinely. This paper shows how to speed up race detection by spreading the work across multiple cores. ...
Comments