article

Free Access

RecPlay: a fully integrated practical record/replay system

Authors:
Michiel Ronsse

Univ. Ghent, Ghent, Belgium

Univ. Ghent, Ghent, Belgium
View Profile

,
Koen De Bosschere

Univ. Ghent, Ghent, Belgium

Univ. Ghent, Ghent, Belgium
View Profile

Authors Info & Claims

ACM Transactions on Computer Systems Volume 17 Issue 2pp 133–152https://doi.org/10.1145/312203.312214

Published:01 May 1999Publication History

ACM Transactions on Computer Systems

Abstract

This article presents a practical solution for the cyclic debugging of nondeterministic parallel programs. The solution consists of a combination of record/replay with automatic on-the-fly data race detection. This combination enables us to limit the record phase to the more efficient recording of the synchronization operations, while deferring the time-consuming data race detection to the replay phase. As the record phase is highly efficient, there is no need to switch it off, hereby eliminating the possibility of Heisenbugs because tracing can be left on all the time. This article describes an implementation of the tools needed to support RecPlay.

References

ADVE, S. V., HILL, M. D., MILLER, B. P., AND NETZER, R. H. B. 1991. Detecting data races on weak memory systems. SIGARCH Comput. Arch. News 19, 3 (May 1991), 234-243. Google ScholarDigital Library
AUDENAERT, K. AND LEVROUW, L. 1994. Interrupt replay: A debugging method for parallel programs with interrupts. Microprocess. Microsyst. 18, 10, 601-612.Google ScholarCross Ref
AUDENAERT, K. AND LEVROUW, L. 1995. Space efficient data race detection for parallel programs with series-parallel task graphs. In Proceedings of the 3rd Euromicro Workshop on Parallel and Distributed Processing (San Remo, CA., Jan.). IEEE Computer Society Press, Los Alamitos, CA, 508-515. Google ScholarDigital Library
BERANEK, A. 1992. Data race detection based on execution replay for parallel applications. In Proceedings of Conference on Parallel Processing (CONPAR '92, Lyon, France, Sept.). 109-114. Google ScholarDigital Library
CAVALHEIRO, G. AND DOREILLE, M. 1996. Athapascan: A C++ library for parallel programming. In Stratagem '96 (Sophia Antipolis, France, June). INRIA, Rennes, France.Google Scholar
CHOI, J.-D. AND MIN, S. L. 1991. Race frontier: Reproducing data races in parallel-program debugging. In Proceedings of the 3rd ACM Symposium on Principles and Practice of Parallel Programming (SIGPLAN '91, July). ACM, New York, NY, 145-154. Google ScholarDigital Library
CHOI, J.-D. AND SRINIVASAN, H. 1998. Deterministisc replay of java multithreaded applications. In Proceedings of the 2nd ACM Symposium on Parallel and Distributed Tools (SIGMETRICS '98, Welches, OR, Aug.). ACM, New York, NY, 48-59. Google ScholarDigital Library
DE BOSSCHERE, K. AND RONSSE, M. 1997. Clock snooping and its application in on-the-fly data race detection. In Proceedings of the 1997 IEEE International Symposium on Parallel Algorithms and Networks (I-SPAN '97, Taipei, Dec.). IEEE Computer Society Press, Los Alamitos, CA, 324-330. Google ScholarDigital Library
FIDGE, C. 1991. Logical time in distributed computing systems. IEEE Comput. 24, 8 (Aug. 1991), 28-33. Google ScholarDigital Library
HOLLOMAN, E. D. 1989. Design and implementation of a replay debugger for parallel programs on unix-based systems. Master's Thesis. Computer Science Department, NC State, Raleigh, NC.Google Scholar
LEBLANC, T. J. AND MELLOR-CRUMMEY, J. M. 1987. Debugging parallel programs with instant replay. IEEE Trans. Comput. C-36, 4 (Apr. 1987), 471-482. Google ScholarDigital Library
LEU, E., SCHIPER, A., AND ZRAMDINI, A. 1991. Efficient execution replay technique for distributed memory architectures. In Proceedings of the 2nd European Conference on Distributed Memory Computing (EDMCC2, Munich, Germany, Apr. 22-24, 1991), A. Bode, Ed. Lecture Notes in Computer Science, vol. 487. Springer-Verlag, New York, NY, 315-324. Google ScholarDigital Library
LEVROUW, L. J., AUDENAERT, K. M., AND VAN CAMPENHOUT, J. M. 1994a. Execution replay with compact logs for shared-memory programs. In Proceedings of Applications in Parallel and Distributed Computing, IFIP Transactions A-44: Computer Science and Technology. Elsevier North-Holland, Inc., Amsterdam, The Netherlands, 125-134. Google ScholarDigital Library
LEVROUW, L. J., AUDENAERT, K. M., AND VAN CAMPENHOUT, J. M. 1994b. A new trace and replay system for shared memory programs based on Lamport Clocks. In Proceedings of the 2nd Euromicro Workshop on Parallel and Distributed Processing (Jan.). IEEE Computer Society Press, Los Alamitos, CA, 471-478.Google ScholarCross Ref
Lu, H., KLEIN, P., AND NETZER, R. 1993. Detecting race conditions in parallel programs that use one semaphore. Tech. Rep. Brown University, Providence, RI. Google ScholarDigital Library
MATTERN, F. 1989. Virtual time and global states of distributed systems. In Proceedings of the International Workshop on Parallel and Distributed Algorithms (Gers, France, Oct. 3-6), M. Cosnard, Y. Robert, P. Quinton, and M. Raynal, Eds. North-Holland Publishing Co., Amsterdam, The Netherlands, 215-226.Google Scholar
MELLOR-CRUMMEY, g. 1991. On-the-fly detection of data races for programs with nested fork-join parallelism. In Proceedings of the 1991 Conference on Supercomputing (Albuquerque, New Mexico, Nov. 18-22, 1991), J. L. Martin, Ed. ACM Press, New York, NY, 24-33. Google ScholarDigital Library
NETZER, R. H. B. 1993. Optimal tracing and replay for debugging shared-memory parallel programs. SIGPLAN Not. 28, 12 (Dec. 1993), 1-11. Google ScholarDigital Library
NETZER, R. H. B. AND MILLER, B. P. 1990. On the complexity of event ordering for shared-memory parallel program executions. In Proceedings of the International Conference on Parallel Processing (Aug.). 93-97.Google Scholar
NETZER, R. AND MILLER, B. 1991. Improving the accuracy of data race detection. In Proceedings of the 1991 Conference on Principles and Practice of Parallel Programming (Apr.). Google ScholarDigital Library
NETZER, R. H. B. AND MILLER, B. P. 1992. What are race conditions? Some issues and formalizations. ACM Lett. Program. Lang. Syst. 1, 1 (Mar. 1992), 74-88. Google ScholarDigital Library
PERKOVIC, D. AND KELEHER, P.g. 1996. Online data-race detection via coherency guarantees. ACM SIGOPS Oper. Syst. Rev. 30, Winter, 47-57. Google ScholarDigital Library
RAYNAL, M. AND SINGHAL, M. 1996. Logical clocks: Capturing causality in distributed systems. IEEE Computer, 49-56. Google ScholarDigital Library
RONSSE, M. AND ZWAENEPOEL, W. 1997. Execution replay for TreadMarks. In Proceedings of the 5th Euromicro Workshop on Parallel and Distributed Processing. 343-350.Google Scholar
RONSSE, M., LEVROUW, L., AND BASTIAENS, K. 1995. Efficient coding of execution-traces of parallel programs. In Proceedings of the ProRISC/IEEE Benelux Workshop on Circuits, Systems and Signal Processing (Mar.), J. P. Veen, Ed. 251-258.Google Scholar
RUSSINOVICH, M. AND COGSWELL, B. 1996. Replay for concurrent non-deterministic sharedmemory applications. SIGPLAN Not. 31, 5, 258-266. Google ScholarDigital Library
SARIN, S. K. AND LYNCH, N.A. 1987. Discarding obsolete information in a replicated database system. IEEE Trans. Softw. Eng. 13, 1 (Jan. 1987), 39-47. Google ScholarDigital Library
SAVAGE, S., BURROWS, M., NELSON, G., SOBALVARRO, P., AND ANDERSON, T. 1997. Eraser: a dynamic data race detector for multithreaded programs. ACM Trans. Comput. Syst. 15, 4, 391-411. Google ScholarDigital Library
SCHONBERG, D. 1989. On-the-fly detection of access anomalies. SIGPLAN Not. 24, 7 (July 1989), 285-297. Google ScholarDigital Library
TAMCHES, A. AND MILLER, B. P. 1999. Fine-grained dynamic instrumentation of commodity operating system kernels. In Proceedings of the 3rd Symposium on Operating Systems Design and Implementation (New Orleans, LA., Feb.). 117-130. Google ScholarDigital Library
Woo, S. C., OHARA, M., TORRIE, E., SINGH, J. P., AND GUPTA, A. 1995. The SPLASH-2 programs: Characterization and methodological considerations. In Proceedings of the 22nd Annual International Symposium on Computer Architecture (ISCA '95, Santa Margherita Ligure, Italy, June 22-24), D. A. Patterson, Ed. ACM Press, New York, NY, 24-36. Google ScholarDigital Library
Wuu, G. AND BERNSTEIN, A. 1984. Efficient solutions to the replicated log and dictionary problems. In Proceedings of the 3rd ACM Symposium on Principles of Distributed Computing (New York, NY). ACM Press, New York, NY, 233-242. Google ScholarDigital Library

Index Terms

Recommendations

Eraser: a dynamic data race detector for multithreaded programs

Multithreaded programming is difficult and error prone. It is easy to make a mistake in synchronization that produces a data race, yet it can be extremely hard to locate this mistake during debugging. This article describes a new tool, called Eraser, ...
Read More
TRADE: Precise Dynamic Race Detection for Scalable Transactional Memory Systems

As other multithreaded programs, transactional memory (TM) programs are prone to race conditions. Previous work focuses on extending existing definitions of data race for lock-based applications to TM applications, which requires all transactions to be ...
Read More
PARSNIP: performant architecture for race safety with no impact on precision
MICRO-50 '17: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture

Data race detection is a useful dynamic analysis for multithreaded programs that is a key building block in record-and-replay, enforcing strong consistency models, and detecting concurrency bugs. Existing software race detectors are precise but slow, ...
Read More

Reviews

Reviewer: Armin B. Cremers

The cyclic debugging of parallel programs is addressed. The problem here is that, due to unsynchronized accesses to shared memory (race conditions), subsequent program executions with identical input are not guaranteed to produce the same behavior. Other sources of nondeterminism can be dealt with more easily during debugging. The authors present a combination of record/replay techniques intended to allow for the use of standard tools from sequential program debugging in the case of parallel programs. The idea is to limit the recording phase to the more efficient recording of the synchronization operations without tracing the shared data access operations, and to check for the occurrence of data races during replay. RecPlay notifies the user when the first data race is encountered and can be forced to continue running until all data races have been reported. (Of course, no correct replay can be guaranteed after the first data race has been detected.) The paper describes the phases in some detail, including the tools need to support RecPlay. There is an interesting performance evaluation for a number of benchmarks, and the paper can also be recommended for its thorough discussion of related work.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM Transactions on Computer Systems Volume 17, Issue 2
May 1999
112 pages
ISSN:0734-2071
EISSN:1557-7333
DOI:10.1145/312203
Editor:
Kenneth P. Birman
Cornell Univ., Ithaca, NY
Issue’s Table of Contents
Copyright © 1999 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 May 1999
Published in tocs Volume 17, Issue 2

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
binary code modification
multithreaded programming
race detection
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 253
  Total Citations
  View Citations
- 1,987
  Total Downloads
- Downloads (Last 12 months)126
- Downloads (Last 6 weeks)12
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

RecPlay: a fully integrated practical record/replay system

ACM Transactions on Computer Systems

Abstract

References

Cited By

Index Terms

Recommendations

Eraser: a dynamic data race detector for multithreaded programs

TRADE: Precise Dynamic Race Detection for Scalable Transactional Memory Systems

PARSNIP: performant architecture for race safety with no impact on precision

Reviews

Access critical reviews of Computing literature here