Abstract
Although chip-multiprocessors have become the industry standard, developing parallel applications that target them remains a daunting task. Non-determinism, inherent in threaded applications, causes significant challenges for parallel programmers by hindering their ability to create parallel applications with repeatable results. As a consequence, parallel applications are significantly harder to debug, test, and maintain than sequential programs.
This paper introduces Kendo: a new software-only system that provides deterministic multithreading of parallel applications. Kendo enforces a deterministic interleaving of lock acquisitions and specially declared non-protected reads through a novel dynamically load-balanced deterministic scheduling algorithm. The algorithm tracks the progress of each thread using performance counters to construct a deterministic logical time that is used to compute an interleaving of shared data accesses that is both deterministic and provides good load balancing. Kendo can run on today's commodity hardware while incurring only a modest performance cost. Experimental results on the SPLASH-2 applications yield a geometric mean overhead of only 16% when running on 4 processors. This low overhead makes it possible to benefit from Kendo even after an application is deployed. Programmers can start using Kendo today to program parallel applications that are easier to develop, debug, and test.
- Claudio Basile, Keith Whisnant, Zbigniew Kalbarczyk, and Ravi Iyer. Loose synchronization of multithreaded replicas. pages 250--255, 2002. Google ScholarDigital Library
- R. Bocchino, V. Adve, S. Adve, and M. Snir. Parallel programming must be deterministic by default. Technical Report UIUCDCS-R-2008-3012, University of Illinois at Urbana-Champaign, 2008. URL http://dpj.cs.uiuc.edu/DPJ/Publications_files/paper.pdf.Google Scholar
- Guang-Ien Cheng, Mingdong Feng, Charles E. Leiserson, Keith H. Randall, and Andrew F. Stark. Detecting data races in cilk programs that use locks. In proceedings of the Tenth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA'98), pages 298--309, Puerto Vallarta, Mexico, June 28-July 2, 1998. Google ScholarDigital Library
- Joe Devietti, Brandon Lucia, Luis Ceze, and Mark Oskin. Explicitly parallel programming with shared-memory is insane: At least make it deterministic! In proceedings of SHCMP 2008: Workshop on Software and Hardware Challenges of Manycore Platforms, Beijing, China, June 22, 2008.Google Scholar
- Rachna Dhamija and Adrian Perrig. Deja vu: a user study using images for authentication. In SSYM'00: Proceedings of the 9th conference on USENIX Security Symposium, pages 4--4, Berkeley, CA, USA, 2000. USENIX Association. Google ScholarDigital Library
- Jorg Domaschka, Franz J. Hauck, Hans P. Reiser, and Rutiger Kapitza. Deterministic multithreading for java-based replicated objects. In proceedings of the International Conference on Parallel and Distributed Computing and Systems, 2006.Google Scholar
- Jorg Domaschka, Andreas I. Schmied, Hans P. Reiser, and Franz J. Hauck. Revisiting deterministic multithreading strategies. International Parallel and Distributed Processing Symposium, pages 1--8, 2007.Google ScholarCross Ref
- George W. Dunlap, Dominic G. Lucchetti, Michael A. Fetterman, and Peter M. Chen. Execution replay of multiprocessor virtual machines. In VEE'08: Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments, pages 121--130, New York, NY, USA, 2008. ACM. ISBN 978-1-59593-796-4. Google ScholarDigital Library
- Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the Cilk-5 multithreaded language. In proceedings of the ACM SIGPLAN-98 Conference on Programming Language Design and Implementation, pages 212--223, Montreal, Quebec, Canada, June 1998. Proceedings published ACMSIGPLAN Notices, Vol. 33, No. 5, May, 1998. Google ScholarDigital Library
- Derek R. Hower and Mark D. Hill. Rerun: Exploiting episodes for lightweight memory race recording. In proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA), June 2008. Google ScholarDigital Library
- Wolfgang Karl, Markus Leberecht, and Michael Oberhuber. Forcing deterministic execution of parallel programs -- debugging support through the smile monitoring approach. In proceedings of the SCI-Europe, September 1998.Google Scholar
- Milind Kulkarni, Keshav Pingali, Ganesh Ramanarayanan, Bruce Walter, Kavita Bala, and L. Paul Chew. Optimistic parallelism benefits from data partitioning. SIGARCH Comput. Archit. News, 36(1):233--243, 2008. ISSN 0163-5964. Google ScholarDigital Library
- Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7):558--565, 1978. ISSN 0001-0782. Google ScholarDigital Library
- T. J. LeBlanc and J. M. Mellor-Crummey. Debugging parallel programs with instant replay. IEEE Trans. Comput., 36(4):471--482, 1987. ISSN 0018-9340. Google ScholarDigital Library
- Edward A. Lee. The problem with threads. Computer, 39(5):33--42, May 2006. ISSN 0018-9162. Google ScholarDigital Library
- Shan Lu, Joe Tucek, Feng Qin, and Yuanyuan Zhou. Avio: Detecting atomicity violations via access-interleaving invariants. In proceedings of the International Conference on Architecture Support for Programming Languages and Operating Systems, October 2006. Google ScholarDigital Library
- Shan Lu, Weihang Jiang, and Yuanyuan Zhou. A study of interleaving coverage criteria. In proceedings of ESEC-FSE, pages 533--536, New York, NY, USA, 2007a. ACM. ISBN 978-1-59593-811-4. Google ScholarDigital Library
- Shan Lu, Soyeon Park, Chongfeng Hu, Xiao Ma, Weihang Jiang, Zhenmin Li, Raluca A. Popa, and Yuanyuan Zhou. Muvi: Automatically inferring multi-variable access correlations and detecting related semantic and concurrency bugs. In proceedings of the 21st ACM Symposium on Operating Systems Principles, October 2007b. Google ScholarDigital Library
- Shan Lu, Soyeon Park, Eunsoo Seo, and Yuanyuan Zhou. Learning from mistakes: a comprehensive study on real world concurrency bug characteristics. In ASPLOS XIII: proceedings of the 13th international conference on Architectural support for programming languages and operating systems, pages 329--339, New York, NY, USA, 2008. ACM. ISBN 978-1-59593-958-6. Google ScholarDigital Library
- Pablo Montesinos, Luis Ceze, and Josep Torrellas. Delorean: Recording and deterministically replaying shared-memory multiprocessor execution efficiently. In proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA), June 2008. Google ScholarDigital Library
- Hans P. Reiser, Jorg Domaschka, Franz J. Hauck, Rudiger Kapitza, and Wolfgang Schroder-Preikschat. Consistent replication of multithreaded distributed objects. 25th IEEE Symposium on Reliable Distributed Systems, pages 257--266, 2006. ISSN 1060-9857. Google ScholarDigital Library
- Jonathan Rose. Locusroute: a parallel global router for standard cells. In DAC-88: Proceedings of the 25th ACM/IEEE conference on Design automation, pages 189--195, Los Alamitos, CA, USA, 1988. IEEE Computer Society Press. ISBN 0-8186-8864-5. Google ScholarDigital Library
- Mark Russinovich and Bryce Cogswell. Replay for concurrent non-deterministic shared-memory applications. In proceedings of the ACM SIGPLAN 1996 Conference on Programming Language Design and Implementation, pages 258--266, New York, NY, USA, 1996. ACM. ISBN 0-89791-795-2. Google ScholarDigital Library
- Debashis Saha and Sourav K. Dutta. Specification of deterministic execution timing schema for parallel programs on a multiprocessor. Computer, Communication, Control and Power Engineering, pages 114--116 vol.1, 1993.Google Scholar
- Jaswinder Pal Singh, Anoop Gupta, and Marc Levoy. Parallel visualization algorithms: Performance and architectural implications. Computer, 27(7):45--55, 1994. ISSN 0018-9162. Google ScholarDigital Library
- William Thies, Michal Karczmarek, and Saman Amarasinghe. Streamit: A language for streaming applications. In International Conference on Compiler Construction, Grenoble, France, April 2002. Google ScholarDigital Library
- Joseph Tucek, Shan Lu, Chengdu Huang, Spiros Xanthos, and Yuanyuan Zhou. Triage: diagnosing production run failures at the user's site. In proceedings of Twenty-First ACM SIGOPS Symposium on Operating Systems Principles, pages 131--144, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-591-5. Google ScholarDigital Library
- Larry D. Wittie. Debugging distributed C programs by real time reply. SIGPLAN Not., 24(1):57--67, 1989. ISSN 0362-1340. Google ScholarDigital Library
- S.C. Woo, M. Ohara, E. Torrie, J.P. Singh, and A. Gupta. The SPLASH-2 programs: characterization and methodological considerations. In proceedings of 22nd Annual International Symposium on Computer Architecture News, pages 24--36, June 1995. Google ScholarDigital Library
- Min Xu, Rastislav Bodik, and Mark D. Hill. A "flight data recorder" for enabling full-system multiprocessor deterministic replay. SIGARCH Comput. Archit. News, 31(2):122--135, 2003. ISSN 0163-5964. Google ScholarDigital Library
Index Terms
- Kendo: efficient deterministic multithreading in software
Recommendations
Kendo: efficient deterministic multithreading in software
ASPLOS XIV: Proceedings of the 14th international conference on Architectural support for programming languages and operating systemsAlthough chip-multiprocessors have become the industry standard, developing parallel applications that target them remains a daunting task. Non-determinism, inherent in threaded applications, causes significant challenges for parallel programmers by ...
Kendo: efficient deterministic multithreading in software
ASPLOS 2009Although chip-multiprocessors have become the industry standard, developing parallel applications that target them remains a daunting task. Non-determinism, inherent in threaded applications, causes significant challenges for parallel programmers by ...
CoreDet: a compiler and runtime system for deterministic multithreaded execution
ASPLOS XV: Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systemsThe behavior of a multithreaded program does not depend only on its inputs. Scheduling, memory reordering, timing, and low-level hardware effects all introduce nondeterminism in the execution of multithreaded programs. This severely complicates many ...
Comments