Abstract
This article presents a practical solution for the cyclic debugging of nondeterministic parallel programs. The solution consists of a combination of record/replay with automatic on-the-fly data race detection. This combination enables us to limit the record phase to the more efficient recording of the synchronization operations, while deferring the time-consuming data race detection to the replay phase. As the record phase is highly efficient, there is no need to switch it off, hereby eliminating the possibility of Heisenbugs because tracing can be left on all the time. This article describes an implementation of the tools needed to support RecPlay.
- ADVE, S. V., HILL, M. D., MILLER, B. P., AND NETZER, R. H. B. 1991. Detecting data races on weak memory systems. SIGARCH Comput. Arch. News 19, 3 (May 1991), 234-243. Google ScholarDigital Library
- AUDENAERT, K. AND LEVROUW, L. 1994. Interrupt replay: A debugging method for parallel programs with interrupts. Microprocess. Microsyst. 18, 10, 601-612.Google ScholarCross Ref
- AUDENAERT, K. AND LEVROUW, L. 1995. Space efficient data race detection for parallel programs with series-parallel task graphs. In Proceedings of the 3rd Euromicro Workshop on Parallel and Distributed Processing (San Remo, CA., Jan.). IEEE Computer Society Press, Los Alamitos, CA, 508-515. Google ScholarDigital Library
- BERANEK, A. 1992. Data race detection based on execution replay for parallel applications. In Proceedings of Conference on Parallel Processing (CONPAR '92, Lyon, France, Sept.). 109-114. Google ScholarDigital Library
- CAVALHEIRO, G. AND DOREILLE, M. 1996. Athapascan: A C++ library for parallel programming. In Stratagem '96 (Sophia Antipolis, France, June). INRIA, Rennes, France.Google Scholar
- CHOI, J.-D. AND MIN, S. L. 1991. Race frontier: Reproducing data races in parallel-program debugging. In Proceedings of the 3rd ACM Symposium on Principles and Practice of Parallel Programming (SIGPLAN '91, July). ACM, New York, NY, 145-154. Google ScholarDigital Library
- CHOI, J.-D. AND SRINIVASAN, H. 1998. Deterministisc replay of java multithreaded applications. In Proceedings of the 2nd ACM Symposium on Parallel and Distributed Tools (SIGMETRICS '98, Welches, OR, Aug.). ACM, New York, NY, 48-59. Google ScholarDigital Library
- DE BOSSCHERE, K. AND RONSSE, M. 1997. Clock snooping and its application in on-the-fly data race detection. In Proceedings of the 1997 IEEE International Symposium on Parallel Algorithms and Networks (I-SPAN '97, Taipei, Dec.). IEEE Computer Society Press, Los Alamitos, CA, 324-330. Google ScholarDigital Library
- FIDGE, C. 1991. Logical time in distributed computing systems. IEEE Comput. 24, 8 (Aug. 1991), 28-33. Google ScholarDigital Library
- HOLLOMAN, E. D. 1989. Design and implementation of a replay debugger for parallel programs on unix-based systems. Master's Thesis. Computer Science Department, NC State, Raleigh, NC.Google Scholar
- LEBLANC, T. J. AND MELLOR-CRUMMEY, J. M. 1987. Debugging parallel programs with instant replay. IEEE Trans. Comput. C-36, 4 (Apr. 1987), 471-482. Google ScholarDigital Library
- LEU, E., SCHIPER, A., AND ZRAMDINI, A. 1991. Efficient execution replay technique for distributed memory architectures. In Proceedings of the 2nd European Conference on Distributed Memory Computing (EDMCC2, Munich, Germany, Apr. 22-24, 1991), A. Bode, Ed. Lecture Notes in Computer Science, vol. 487. Springer-Verlag, New York, NY, 315-324. Google ScholarDigital Library
- LEVROUW, L. J., AUDENAERT, K. M., AND VAN CAMPENHOUT, J. M. 1994a. Execution replay with compact logs for shared-memory programs. In Proceedings of Applications in Parallel and Distributed Computing, IFIP Transactions A-44: Computer Science and Technology. Elsevier North-Holland, Inc., Amsterdam, The Netherlands, 125-134. Google ScholarDigital Library
- LEVROUW, L. J., AUDENAERT, K. M., AND VAN CAMPENHOUT, J. M. 1994b. A new trace and replay system for shared memory programs based on Lamport Clocks. In Proceedings of the 2nd Euromicro Workshop on Parallel and Distributed Processing (Jan.). IEEE Computer Society Press, Los Alamitos, CA, 471-478.Google ScholarCross Ref
- Lu, H., KLEIN, P., AND NETZER, R. 1993. Detecting race conditions in parallel programs that use one semaphore. Tech. Rep. Brown University, Providence, RI. Google ScholarDigital Library
- MATTERN, F. 1989. Virtual time and global states of distributed systems. In Proceedings of the International Workshop on Parallel and Distributed Algorithms (Gers, France, Oct. 3-6), M. Cosnard, Y. Robert, P. Quinton, and M. Raynal, Eds. North-Holland Publishing Co., Amsterdam, The Netherlands, 215-226.Google Scholar
- MELLOR-CRUMMEY, g. 1991. On-the-fly detection of data races for programs with nested fork-join parallelism. In Proceedings of the 1991 Conference on Supercomputing (Albuquerque, New Mexico, Nov. 18-22, 1991), J. L. Martin, Ed. ACM Press, New York, NY, 24-33. Google ScholarDigital Library
- NETZER, R. H. B. 1993. Optimal tracing and replay for debugging shared-memory parallel programs. SIGPLAN Not. 28, 12 (Dec. 1993), 1-11. Google ScholarDigital Library
- NETZER, R. H. B. AND MILLER, B. P. 1990. On the complexity of event ordering for shared-memory parallel program executions. In Proceedings of the International Conference on Parallel Processing (Aug.). 93-97.Google Scholar
- NETZER, R. AND MILLER, B. 1991. Improving the accuracy of data race detection. In Proceedings of the 1991 Conference on Principles and Practice of Parallel Programming (Apr.). Google ScholarDigital Library
- NETZER, R. H. B. AND MILLER, B. P. 1992. What are race conditions? Some issues and formalizations. ACM Lett. Program. Lang. Syst. 1, 1 (Mar. 1992), 74-88. Google ScholarDigital Library
- PERKOVIC, D. AND KELEHER, P.g. 1996. Online data-race detection via coherency guarantees. ACM SIGOPS Oper. Syst. Rev. 30, Winter, 47-57. Google ScholarDigital Library
- RAYNAL, M. AND SINGHAL, M. 1996. Logical clocks: Capturing causality in distributed systems. IEEE Computer, 49-56. Google ScholarDigital Library
- RONSSE, M. AND ZWAENEPOEL, W. 1997. Execution replay for TreadMarks. In Proceedings of the 5th Euromicro Workshop on Parallel and Distributed Processing. 343-350.Google Scholar
- RONSSE, M., LEVROUW, L., AND BASTIAENS, K. 1995. Efficient coding of execution-traces of parallel programs. In Proceedings of the ProRISC/IEEE Benelux Workshop on Circuits, Systems and Signal Processing (Mar.), J. P. Veen, Ed. 251-258.Google Scholar
- RUSSINOVICH, M. AND COGSWELL, B. 1996. Replay for concurrent non-deterministic sharedmemory applications. SIGPLAN Not. 31, 5, 258-266. Google ScholarDigital Library
- SARIN, S. K. AND LYNCH, N.A. 1987. Discarding obsolete information in a replicated database system. IEEE Trans. Softw. Eng. 13, 1 (Jan. 1987), 39-47. Google ScholarDigital Library
- SAVAGE, S., BURROWS, M., NELSON, G., SOBALVARRO, P., AND ANDERSON, T. 1997. Eraser: a dynamic data race detector for multithreaded programs. ACM Trans. Comput. Syst. 15, 4, 391-411. Google ScholarDigital Library
- SCHONBERG, D. 1989. On-the-fly detection of access anomalies. SIGPLAN Not. 24, 7 (July 1989), 285-297. Google ScholarDigital Library
- TAMCHES, A. AND MILLER, B. P. 1999. Fine-grained dynamic instrumentation of commodity operating system kernels. In Proceedings of the 3rd Symposium on Operating Systems Design and Implementation (New Orleans, LA., Feb.). 117-130. Google ScholarDigital Library
- Woo, S. C., OHARA, M., TORRIE, E., SINGH, J. P., AND GUPTA, A. 1995. The SPLASH-2 programs: Characterization and methodological considerations. In Proceedings of the 22nd Annual International Symposium on Computer Architecture (ISCA '95, Santa Margherita Ligure, Italy, June 22-24), D. A. Patterson, Ed. ACM Press, New York, NY, 24-36. Google ScholarDigital Library
- Wuu, G. AND BERNSTEIN, A. 1984. Efficient solutions to the replicated log and dictionary problems. In Proceedings of the 3rd ACM Symposium on Principles of Distributed Computing (New York, NY). ACM Press, New York, NY, 233-242. Google ScholarDigital Library
Index Terms
- RecPlay: a fully integrated practical record/replay system
Recommendations
Eraser: a dynamic data race detector for multithreaded programs
Multithreaded programming is difficult and error prone. It is easy to make a mistake in synchronization that produces a data race, yet it can be extremely hard to locate this mistake during debugging. This article describes a new tool, called Eraser, ...
TRADE: Precise Dynamic Race Detection for Scalable Transactional Memory Systems
As other multithreaded programs, transactional memory (TM) programs are prone to race conditions. Previous work focuses on extending existing definitions of data race for lock-based applications to TM applications, which requires all transactions to be ...
PARSNIP: performant architecture for race safety with no impact on precision
MICRO-50 '17: Proceedings of the 50th Annual IEEE/ACM International Symposium on MicroarchitectureData race detection is a useful dynamic analysis for multithreaded programs that is a key building block in record-and-replay, enforcing strong consistency models, and detecting concurrency bugs. Existing software race detectors are precise but slow, ...
Comments