Abstract
Replay of shared-memory program execution is desirable in many domains including cyclic debugging, fault tolerance and performance monitoring. Past approaches to repeatable execution have focused on the problem of re-executing the shared-memory access patterns in parallel programs. With the proliferation of operating system supported threads and shared memory for uniprocessor programs, there is a clear need for efficient replay of concurrent applications. The solutions for parallel systems can be performance prohibitive when applied to the uniprocessor case. We present an algorithm, called the repeatable scheduling algorithm, combining scheduling and instruction counts to provide an invariant for efficient, language independent replay of concurrent shared-memory applications. The approach is shown to have trace overheads that are independent of the amount of sharing that takes place. An implementation for cyclic debugging on Mach 3.0 is evaluated and benchmarks show typical performance overheads of around 10%. The algorithm implemented is compared with optimal event-based tracing and shown to do better with respect to the number of events monitored or number of events logged, in most cases by several orders of magnitude.
- 1 A. Aho, B. Kernighan and P. Weinberger, "The AWK Programming Language," Addison-Wesley, Reading, MA, 1988. Google ScholarDigital Library
- 2 T. A. Cargill and B. N. Locanthi, "Cheap Hardware Support for Software Debugging and Profiling," in Proc. Syrup. on Architectural Support for Prog. Lang. and Operating Syst., Palo Alto, CA, Oct. 1987, pp. 82-83. Google ScholarCross Ref
- 3 R. H. Carver and K. C. Tai, "Reproducible Testing of Concurrent Programs Based on Shared Variables," in Proc. 6th Int. Conf. on Distributed Computing Systems, Boston, MA., May 1986, pp. 428-432.Google Scholar
- 4 H. Custer, "Inside Windows NT," Microsoft Press, Redmond, WA, 1993. Google ScholarDigital Library
- 5 P. Dodd and C. Ravishankar, "Monitoring and Debugging Distributed Real-Time Programs," Software Practice and Experience, Vol. 22(10), Oct. 1992, pp. 863- 877. Google ScholarDigital Library
- 6 M. Johnson, "Some Requirements for Architectural Support of Software Debugging," in Proc. of the Syrup. on Architectural Support for Prog. Lang. and Operating $yst., Palo Alto, CA, Mar. 1982, pp. 140-148. Google ScholarDigital Library
- 7 A. King, "Inside Windows 95," Microsoft Press, Redmond, WA, 1994. Google ScholarDigital Library
- 8 T. j. LeBlanc and J. M. Mellor-Crummey, "Debugging Parallel Programs with Instant Replay." IEEE Trans. on Computers, Apr. 1987, pp. 471-482. Google ScholarDigital Library
- 9 C. Lin and R. LeBlanc, "Event-Based Debugging of Object/Action Programs," in Proc. of the A CM SiG- PLAN/SIGOPS Workshop on Parallel and Distributed Debugging, 1988, pp. 23-34. Google ScholarDigital Library
- 10 C. E. McDowell and D. P. Helbold, "Debugging Concurrent Programs," A CM Computing Surveys, Dec. 1989, pp. 593-622. Google ScholarDigital Library
- 11 J. M. Mellor-Crummey and T. J. LeBlanc, "A Software Instruction Counter," in Proc. Symp. on Architectural Support for Prog. Lang. and Operating Syst., Palo Alto, CA, Apr. 1989, pp. 78-86. Google ScholarDigital Library
- 12 R. Netzer, "Optimal Tracing and Replay for Debugging Shared-Memory Parallel Programs," in Proc. A CM/ ONR Workshop on Parallel and Distributed Debugging, May 1993, pp. 1-11. Google ScholarDigital Library
- 13 R. Netzer and B. Miller, "On the Complexity of Event Ordering for Shared-Memory Parallel Program Executions,'' in Proc. Int. Conf. on Parallel Processing, 1990, pp. 93-97.Google Scholar
- 14 D. Pan and M. Linton, "Supporting Reverse Execution of Parallel Programs," in Proc. SIGPLAN/SIGOPS Workshop on Parallel and Distributed Debugging, May 1988, pp. 124-129. Google ScholarDigital Library
- 15 M. L. Powell, et. al., "SunOS Mulfithreaded Architecture,'' Sun Microsystems White Paper, Sun Microsystems, Cupertino, CA, June 1995.Google Scholar
- 16 R. Rashid, et. al., "Mach: A Foundation for Open Systems,'' in Proc. 2nd Workshop on Workstations and Operating Syst., Sept. 1989, pp. 27-29.Google ScholarCross Ref
- 17 M. Russinovich and B. Cogswell, "Operating System Support for Replay of Concurrent Non-Deterministic Shared Memory Applications," in Bulletin of the Technical Committee on Operating Systems and Applications Environments (TCOS), IEEE Computer Society, Winter 1995, Vol. 7, No. 4, pp. 15-19. Google ScholarDigital Library
- 18 K. C. Tai, R. H. Carver, and E. E. Obaid, "Debugging Concurrent Ada Programs by Deterministic Execution,'' IEEE Trans. on Software Engineering, Jan. 1991, pp. 45-63. Google ScholarDigital Library
- 19 S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta, "The SPLASH-2 Programs' Characterization and Methodological Considerations," in Proc. of the 22nd International Symposium on Computer Architecture, June 1995, pp. 24-36. Google ScholarDigital Library
Index Terms
- Replay for concurrent non-deterministic shared-memory applications
Recommendations
Replay for concurrent non-deterministic shared-memory applications
PLDI '96: Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementationReplay of shared-memory program execution is desirable in many domains including cyclic debugging, fault tolerance and performance monitoring. Past approaches to repeatable execution have focused on the problem of re-executing the shared-memory access ...
The NYU Ultracomputer Designing an MIMD Shared Memory Parallel Computer
We present the design for the NYU Ultracomputer, a shared-memory MIMD parallel machine composed of thousands of autonomous processing elements. This machine uses an enhanced message switching network with the geometry of an Omega-network to approximate ...
Comments