ABSTRACT
Parallel programs are difficult to debug because they run for a long time and two executions may yield different results. Reverse execution is a simple and powerful concept that solves both these problems. We are designing a tool for debugging parallel programs, called Recap, that provides the illusion of reverse execution using checkpoints and event recording and playback. During normal execution, Recap logs the results of system calls and shared memory reads, as well as the times at which asynchronous events (signals) occur. Recap periodically checkpoints the state of a process by forking and suspending a new process. To reverse execute to a certain point in time, Recap continues the nearest checkpoint process forward in a self-contained environment, simulating all events using the log. We are implementing Recap as part of a larger environment for parallel program development.
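The checkpoint-and-replay idea described in the abstract can be illustrated with a small sketch. This is not Recap's implementation: the names `EventLog`, `step`, `record`, and `reverse_to` are hypothetical, `copy.deepcopy` stands in for the forked and suspended checkpoint process, and a random number stands in for the result of a system call or shared memory read.

```python
import copy
import random


class EventLog:
    """Records results of nondeterministic operations, then replays them in order."""

    def __init__(self):
        self.entries = []       # (operation name, result) pairs in execution order
        self.replaying = False
        self.cursor = 0         # next entry to hand back during replay

    def call(self, name, fn):
        if self.replaying:
            logged_name, result = self.entries[self.cursor]
            assert logged_name == name, "replay diverged from the recorded run"
            self.cursor += 1
            return result
        result = fn()           # normal execution: perform the operation and log it
        self.entries.append((name, result))
        return result


def step(state, log):
    # One unit of the debugged program; its only nondeterminism goes through the log.
    value = log.call("read", lambda: random.randint(0, 99))
    state["total"] += value
    state["steps"] += 1


def record(num_steps, checkpoint_every):
    """Run forward, logging events and taking periodic checkpoints."""
    log = EventLog()
    state = {"total": 0, "steps": 0}
    checkpoints = []            # (step number, state snapshot, log position)
    history = []                # totals after each step, kept only to check replay
    for i in range(num_steps):
        if i % checkpoint_every == 0:
            # Assumption: a deep copy plays the role of the suspended forked process.
            checkpoints.append((i, copy.deepcopy(state), len(log.entries)))
        step(state, log)
        history.append(state["total"])
    return log, checkpoints, history


def reverse_to(target_step, log, checkpoints):
    """Rebuild the state after `target_step` steps from the nearest earlier checkpoint."""
    start, snapshot, position = max(
        (c for c in checkpoints if c[0] <= target_step), key=lambda c: c[0]
    )
    state = copy.deepcopy(snapshot)
    log.replaying = True
    log.cursor = position
    for _ in range(start, target_step):
        step(state, log)        # events come from the log, not from the environment
    return state
```

Calling `record(20, 5)` and then `reverse_to(13, log, checkpoints)` restarts from the checkpoint taken at step 10 and replays three steps, so the cost of going "backward" is bounded by the checkpoint interval rather than by the length of the run.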
Supporting reverse execution for parallel programs
Special issue: Proceedings of the 1988 ACM SIGPLAN and SIGOPS workshop on parallel and distributed debugging