skip to main content
article
Free Access

RecPlay: a fully integrated practical record/replay system

Published:01 May 1999Publication History
Skip Abstract Section

Abstract

This article presents a practical solution for the cyclic debugging of nondeterministic parallel programs. The solution consists of a combination of record/replay with automatic on-the-fly data race detection. This combination enables us to limit the record phase to the more efficient recording of the synchronization operations, while deferring the time-consuming data race detection to the replay phase. As the record phase is highly efficient, there is no need to switch it off, hereby eliminating the possibility of Heisenbugs because tracing can be left on all the time. This article describes an implementation of the tools needed to support RecPlay.

References

  1. ADVE, S. V., HILL, M. D., MILLER, B. P., AND NETZER, R. H. B. 1991. Detecting data races on weak memory systems. SIGARCH Comput. Arch. News 19, 3 (May 1991), 234-243. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. AUDENAERT, K. AND LEVROUW, L. 1994. Interrupt replay: A debugging method for parallel programs with interrupts. Microprocess. Microsyst. 18, 10, 601-612.Google ScholarGoogle ScholarCross RefCross Ref
  3. AUDENAERT, K. AND LEVROUW, L. 1995. Space efficient data race detection for parallel programs with series-parallel task graphs. In Proceedings of the 3rd Euromicro Workshop on Parallel and Distributed Processing (San Remo, CA., Jan.). IEEE Computer Society Press, Los Alamitos, CA, 508-515. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. BERANEK, A. 1992. Data race detection based on execution replay for parallel applications. In Proceedings of Conference on Parallel Processing (CONPAR '92, Lyon, France, Sept.). 109-114. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. CAVALHEIRO, G. AND DOREILLE, M. 1996. Athapascan: A C++ library for parallel programming. In Stratagem '96 (Sophia Antipolis, France, June). INRIA, Rennes, France.Google ScholarGoogle Scholar
  6. CHOI, J.-D. AND MIN, S. L. 1991. Race frontier: Reproducing data races in parallel-program debugging. In Proceedings of the 3rd ACM Symposium on Principles and Practice of Parallel Programming (SIGPLAN '91, July). ACM, New York, NY, 145-154. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. CHOI, J.-D. AND SRINIVASAN, H. 1998. Deterministisc replay of java multithreaded applications. In Proceedings of the 2nd ACM Symposium on Parallel and Distributed Tools (SIGMETRICS '98, Welches, OR, Aug.). ACM, New York, NY, 48-59. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. DE BOSSCHERE, K. AND RONSSE, M. 1997. Clock snooping and its application in on-the-fly data race detection. In Proceedings of the 1997 IEEE International Symposium on Parallel Algorithms and Networks (I-SPAN '97, Taipei, Dec.). IEEE Computer Society Press, Los Alamitos, CA, 324-330. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. FIDGE, C. 1991. Logical time in distributed computing systems. IEEE Comput. 24, 8 (Aug. 1991), 28-33. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. HOLLOMAN, E. D. 1989. Design and implementation of a replay debugger for parallel programs on unix-based systems. Master's Thesis. Computer Science Department, NC State, Raleigh, NC.Google ScholarGoogle Scholar
  11. LEBLANC, T. J. AND MELLOR-CRUMMEY, J. M. 1987. Debugging parallel programs with instant replay. IEEE Trans. Comput. C-36, 4 (Apr. 1987), 471-482. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. LEU, E., SCHIPER, A., AND ZRAMDINI, A. 1991. Efficient execution replay technique for distributed memory architectures. In Proceedings of the 2nd European Conference on Distributed Memory Computing (EDMCC2, Munich, Germany, Apr. 22-24, 1991), A. Bode, Ed. Lecture Notes in Computer Science, vol. 487. Springer-Verlag, New York, NY, 315-324. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. LEVROUW, L. J., AUDENAERT, K. M., AND VAN CAMPENHOUT, J. M. 1994a. Execution replay with compact logs for shared-memory programs. In Proceedings of Applications in Parallel and Distributed Computing, IFIP Transactions A-44: Computer Science and Technology. Elsevier North-Holland, Inc., Amsterdam, The Netherlands, 125-134. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. LEVROUW, L. J., AUDENAERT, K. M., AND VAN CAMPENHOUT, J. M. 1994b. A new trace and replay system for shared memory programs based on Lamport Clocks. In Proceedings of the 2nd Euromicro Workshop on Parallel and Distributed Processing (Jan.). IEEE Computer Society Press, Los Alamitos, CA, 471-478.Google ScholarGoogle ScholarCross RefCross Ref
  15. Lu, H., KLEIN, P., AND NETZER, R. 1993. Detecting race conditions in parallel programs that use one semaphore. Tech. Rep. Brown University, Providence, RI. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. MATTERN, F. 1989. Virtual time and global states of distributed systems. In Proceedings of the International Workshop on Parallel and Distributed Algorithms (Gers, France, Oct. 3-6), M. Cosnard, Y. Robert, P. Quinton, and M. Raynal, Eds. North-Holland Publishing Co., Amsterdam, The Netherlands, 215-226.Google ScholarGoogle Scholar
  17. MELLOR-CRUMMEY, g. 1991. On-the-fly detection of data races for programs with nested fork-join parallelism. In Proceedings of the 1991 Conference on Supercomputing (Albuquerque, New Mexico, Nov. 18-22, 1991), J. L. Martin, Ed. ACM Press, New York, NY, 24-33. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. NETZER, R. H. B. 1993. Optimal tracing and replay for debugging shared-memory parallel programs. SIGPLAN Not. 28, 12 (Dec. 1993), 1-11. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. NETZER, R. H. B. AND MILLER, B. P. 1990. On the complexity of event ordering for shared-memory parallel program executions. In Proceedings of the International Conference on Parallel Processing (Aug.). 93-97.Google ScholarGoogle Scholar
  20. NETZER, R. AND MILLER, B. 1991. Improving the accuracy of data race detection. In Proceedings of the 1991 Conference on Principles and Practice of Parallel Programming (Apr.). Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. NETZER, R. H. B. AND MILLER, B. P. 1992. What are race conditions? Some issues and formalizations. ACM Lett. Program. Lang. Syst. 1, 1 (Mar. 1992), 74-88. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. PERKOVIC, D. AND KELEHER, P.g. 1996. Online data-race detection via coherency guarantees. ACM SIGOPS Oper. Syst. Rev. 30, Winter, 47-57. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. RAYNAL, M. AND SINGHAL, M. 1996. Logical clocks: Capturing causality in distributed systems. IEEE Computer, 49-56. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. RONSSE, M. AND ZWAENEPOEL, W. 1997. Execution replay for TreadMarks. In Proceedings of the 5th Euromicro Workshop on Parallel and Distributed Processing. 343-350.Google ScholarGoogle Scholar
  25. RONSSE, M., LEVROUW, L., AND BASTIAENS, K. 1995. Efficient coding of execution-traces of parallel programs. In Proceedings of the ProRISC/IEEE Benelux Workshop on Circuits, Systems and Signal Processing (Mar.), J. P. Veen, Ed. 251-258.Google ScholarGoogle Scholar
  26. RUSSINOVICH, M. AND COGSWELL, B. 1996. Replay for concurrent non-deterministic sharedmemory applications. SIGPLAN Not. 31, 5, 258-266. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. SARIN, S. K. AND LYNCH, N.A. 1987. Discarding obsolete information in a replicated database system. IEEE Trans. Softw. Eng. 13, 1 (Jan. 1987), 39-47. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. SAVAGE, S., BURROWS, M., NELSON, G., SOBALVARRO, P., AND ANDERSON, T. 1997. Eraser: a dynamic data race detector for multithreaded programs. ACM Trans. Comput. Syst. 15, 4, 391-411. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. SCHONBERG, D. 1989. On-the-fly detection of access anomalies. SIGPLAN Not. 24, 7 (July 1989), 285-297. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. TAMCHES, A. AND MILLER, B. P. 1999. Fine-grained dynamic instrumentation of commodity operating system kernels. In Proceedings of the 3rd Symposium on Operating Systems Design and Implementation (New Orleans, LA., Feb.). 117-130. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Woo, S. C., OHARA, M., TORRIE, E., SINGH, J. P., AND GUPTA, A. 1995. The SPLASH-2 programs: Characterization and methodological considerations. In Proceedings of the 22nd Annual International Symposium on Computer Architecture (ISCA '95, Santa Margherita Ligure, Italy, June 22-24), D. A. Patterson, Ed. ACM Press, New York, NY, 24-36. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Wuu, G. AND BERNSTEIN, A. 1984. Efficient solutions to the replicated log and dictionary problems. In Proceedings of the 3rd ACM Symposium on Principles of Distributed Computing (New York, NY). ACM Press, New York, NY, 233-242. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. RecPlay: a fully integrated practical record/replay system

            Recommendations

            Reviews

            Armin B. Cremers

            The cyclic debugging of parallel programs is addressed. The problem here is that, due to unsynchronized accesses to shared memory (race conditions), subsequent program executions with identical input are not guaranteed to produce the same behavior. Other sources of nondeterminism can be dealt with more easily during debugging. The authors present a combination of record/replay techniques intended to allow for the use of standard tools from sequential program debugging in the case of parallel programs. The idea is to limit the recording phase to the more efficient recording of the synchronization operations without tracing the shared data access operations, and to check for the occurrence of data races during replay. RecPlay notifies the user when the first data race is encountered and can be forced to continue running until all data races have been reported. (Of course, no correct replay can be guaranteed after the first data race has been detected.) The paper describes the phases in some detail, including the tools need to support RecPlay. There is an interesting performance evaluation for a number of benchmarks, and the paper can also be recommended for its thorough discussion of related work.

            Access critical reviews of Computing literature here

            Become a reviewer for Computing Reviews.

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            • Published in

              cover image ACM Transactions on Computer Systems
              ACM Transactions on Computer Systems  Volume 17, Issue 2
              May 1999
              112 pages
              ISSN:0734-2071
              EISSN:1557-7333
              DOI:10.1145/312203
              Issue’s Table of Contents

              Copyright © 1999 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 1 May 1999
              Published in tocs Volume 17, Issue 2

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • article

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader