Abstract
While removing software bugs consumes vast amounts of human time, hardware support for debugging in modern computers remains rudimentary. Fortunately, we show that mechanisms for Thread-Level Speculation (TLS) can be reused to boost debugging productivity. Most notably, TLS's rollback capabilities can be extended to support rolling back recent buggy execution and repeating it as many times as necessary until the bug is fully characterized. These incremental re-executions are deterministic even in multithreaded codes. Importantly, this operation can be done automatically on the fly, and is compatible with production runs.

As a specific implementation of a TLS-based debugging framework, we introduce ReEnact. ReEnact targets a particularly hairy class of bugs: data races in multithreaded programs. ReEnact extends the communication monitoring mechanisms in TLS to also detect data races. It extends TLS's rollback capabilities to be able to roll back and deterministically re-execute the code with races to obtain the race signature. Finally, the signature is compared to a library of race patterns and, if a match occurs, the execution may be repaired. Overall, ReEnact successfully detects, characterizes, and often repairs races automatically on the fly. Moreover, it is fully compatible with always-on use in production runs: the slowdown of race-free execution with ReEnact is on average only 5.8%.
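To make the notion of detecting a data race concrete, the following is a minimal software sketch of happens-before race detection with vector clocks. It is not ReEnact's hardware mechanism, which the abstract does not specify. It only illustrates the general idea of flagging two conflicting accesses from different threads that no synchronization orders. The names `RaceDetector`, `Access`, `acquire`, `release`, and `access` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    thread: int
    addr: str
    is_write: bool
    clock: tuple  # vector-clock snapshot taken when the access happened


class RaceDetector:
    """Illustrative happens-before detector (an assumption, not ReEnact's design)."""

    def __init__(self, num_threads):
        # Each thread keeps one logical-clock entry per thread in the program.
        self.clocks = [[0] * num_threads for _ in range(num_threads)]
        for t in range(num_threads):
            self.clocks[t][t] = 1
        self.lock_clocks = {}  # lock name -> clock published by the last release
        self.history = {}      # address -> earlier accesses to that address
        self.races = []        # (earlier access, later access) pairs

    def release(self, t, lock):
        # Publish this thread's knowledge, then start a new local epoch.
        self.lock_clocks[lock] = tuple(self.clocks[t])
        self.clocks[t][t] += 1

    def acquire(self, t, lock):
        # Merge in whatever the last releasing thread had observed.
        released = self.lock_clocks.get(lock)
        if released is not None:
            self.clocks[t] = [max(a, b) for a, b in zip(self.clocks[t], released)]

    def access(self, t, addr, is_write):
        cur = Access(t, addr, is_write, tuple(self.clocks[t]))
        for prev in self.history.get(addr, []):
            conflicting = prev.thread != t and (prev.is_write or is_write)
            # prev is ordered before cur only if cur's clock dominates prev's.
            ordered = all(p <= c for p, c in zip(prev.clock, cur.clock))
            if conflicting and not ordered:
                self.races.append((prev, cur))
        self.history.setdefault(addr, []).append(cur)


# Unsynchronized write followed by a read in another thread: reported as a race.
unsynced = RaceDetector(2)
unsynced.access(0, "x", is_write=True)
unsynced.access(1, "x", is_write=False)
print(len(unsynced.races))  # 1

# The same two accesses, ordered by a lock release/acquire pair: no race.
synced = RaceDetector(2)
synced.acquire(0, "L")
synced.access(0, "x", is_write=True)
synced.release(0, "L")
synced.acquire(1, "L")
synced.access(1, "x", is_write=False)
synced.release(1, "L")
print(len(synced.races))  # 0
```

In this sketch each recorded pair in `races` plays the role of a very crude race signature. The rollback, deterministic re-execution, and pattern-matching steps described above are outside its scope.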
ReEnact: using thread-level speculation mechanisms to debug data races in multithreaded codes
ISCA '03: Proceedings of the 30th annual international symposium on Computer architecture