Abstract
Parallel programs are difficult to debug because they run for a long time and two executions may yield different results. Reverse execution is a simple and powerful concept that solves both of these problems. We are designing a tool for debugging parallel programs, called Recap, that provides the illusion of reverse execution using checkpoints and event recording and playback. During normal execution, Recap logs the results of system calls and shared memory reads, as well as the times at which asynchronous events (signals) occur. Recap periodically checkpoints the state of a process by forking and suspending a new process. To reverse execute to a certain point in time, Recap resumes the nearest earlier checkpoint process and runs it forward in a self-contained environment, simulating all events using the log. We are implementing Recap as part of a larger environment for parallel program development.
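The record, checkpoint, and replay cycle described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Recap's implementation: the class `RecordedRun`, its methods, and the `source` callback are hypothetical names, checkpoints are modeled as `copy.deepcopy` snapshots rather than forked and suspended processes, and the only nondeterministic event is a single input call standing in for system calls, shared memory reads, and signals.

```python
import copy
import random


class RecordedRun:
    """Toy process whose only nondeterminism is the value returned by source()."""

    def __init__(self, checkpoint_every=3):
        self.log = []          # results of nondeterministic events, in order
        self.checkpoints = {}  # step number -> snapshot of the program state
        self.checkpoint_every = checkpoint_every

    def run(self, steps, source):
        """Normal execution: log each event result and checkpoint periodically."""
        state = {"step": 0, "total": 0}
        for _ in range(steps):
            if state["step"] % self.checkpoint_every == 0:
                # Assumption: a deep copy stands in for forking and
                # suspending a copy of the process.
                self.checkpoints[state["step"]] = copy.deepcopy(state)
            value = source()        # nondeterministic event (e.g. a system call)
            self.log.append(value)  # record its result for later playback
            self._apply(state, value)
        return state

    def state_at(self, target_step):
        """Simulate reverse execution to target_step: resume the nearest
        earlier checkpoint and run forward, taking events from the log."""
        start = max(s for s in self.checkpoints if s <= target_step)
        state = copy.deepcopy(self.checkpoints[start])
        while state["step"] < target_step:
            # Playback: the logged result replaces the live event.
            self._apply(state, self.log[state["step"]])
        return state

    @staticmethod
    def _apply(state, value):
        """Deterministic program logic applied to one event result."""
        state["total"] += value
        state["step"] += 1


if __name__ == "__main__":
    rng = random.Random()
    recorded = RecordedRun(checkpoint_every=3)
    final = recorded.run(10, lambda: rng.randint(0, 99))
    # "Going back" to step 7 re-executes only steps 6..7 from the checkpoint at 6.
    print(recorded.state_at(7), final)
```

In this sketch the checkpoint interval trades memory for replay time: going back to an arbitrary step never re-executes more than `checkpoint_every - 1` steps, and the replayed run sees exactly the logged values even though the original source was random.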
Supporting reverse execution for parallel programs. PADD '88: Proceedings of the 1988 ACM SIGPLAN and SIGOPS Workshop on Parallel and Distributed Debugging.