skip to main content
research-article

Kendo: efficient deterministic multithreading in software

Authors Info & Claims
Published:07 March 2009Publication History
Skip Abstract Section

Abstract

Although chip-multiprocessors have become the industry standard, developing parallel applications that target them remains a daunting task. Non-determinism, inherent in threaded applications, causes significant challenges for parallel programmers by hindering their ability to create parallel applications with repeatable results. As a consequence, parallel applications are significantly harder to debug, test, and maintain than sequential programs.

This paper introduces Kendo: a new software-only system that provides deterministic multithreading of parallel applications. Kendo enforces a deterministic interleaving of lock acquisitions and specially declared non-protected reads through a novel dynamically load-balanced deterministic scheduling algorithm. The algorithm tracks the progress of each thread using performance counters to construct a deterministic logical time that is used to compute an interleaving of shared data accesses that is both deterministic and provides good load balancing. Kendo can run on today's commodity hardware while incurring only a modest performance cost. Experimental results on the SPLASH-2 applications yield a geometric mean overhead of only 16% when running on 4 processors. This low overhead makes it possible to benefit from Kendo even after an application is deployed. Programmers can start using Kendo today to program parallel applications that are easier to develop, debug, and test.

References

  1. Claudio Basile, Keith Whisnant, Zbigniew Kalbarczyk, and Ravi Iyer. Loose synchronization of multithreaded replicas. pages 250--255, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. R. Bocchino, V. Adve, S. Adve, and M. Snir. Parallel programming must be deterministic by default. Technical Report UIUCDCS-R-2008-3012, University of Illinois at Urbana-Champaign, 2008. URL http://dpj.cs.uiuc.edu/DPJ/Publications_files/paper.pdf.Google ScholarGoogle Scholar
  3. Guang-Ien Cheng, Mingdong Feng, Charles E. Leiserson, Keith H. Randall, and Andrew F. Stark. Detecting data races in cilk programs that use locks. In proceedings of the Tenth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA'98), pages 298--309, Puerto Vallarta, Mexico, June 28-July 2, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Joe Devietti, Brandon Lucia, Luis Ceze, and Mark Oskin. Explicitly parallel programming with shared-memory is insane: At least make it deterministic! In proceedings of SHCMP 2008: Workshop on Software and Hardware Challenges of Manycore Platforms, Beijing, China, June 22, 2008.Google ScholarGoogle Scholar
  5. Rachna Dhamija and Adrian Perrig. Deja vu: a user study using images for authentication. In SSYM'00: Proceedings of the 9th conference on USENIX Security Symposium, pages 4--4, Berkeley, CA, USA, 2000. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Jorg Domaschka, Franz J. Hauck, Hans P. Reiser, and Rutiger Kapitza. Deterministic multithreading for java-based replicated objects. In proceedings of the International Conference on Parallel and Distributed Computing and Systems, 2006.Google ScholarGoogle Scholar
  7. Jorg Domaschka, Andreas I. Schmied, Hans P. Reiser, and Franz J. Hauck. Revisiting deterministic multithreading strategies. International Parallel and Distributed Processing Symposium, pages 1--8, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  8. George W. Dunlap, Dominic G. Lucchetti, Michael A. Fetterman, and Peter M. Chen. Execution replay of multiprocessor virtual machines. In VEE'08: Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments, pages 121--130, New York, NY, USA, 2008. ACM. ISBN 978-1-59593-796-4. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the Cilk-5 multithreaded language. In proceedings of the ACM SIGPLAN-98 Conference on Programming Language Design and Implementation, pages 212--223, Montreal, Quebec, Canada, June 1998. Proceedings published ACMSIGPLAN Notices, Vol. 33, No. 5, May, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Derek R. Hower and Mark D. Hill. Rerun: Exploiting episodes for lightweight memory race recording. In proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA), June 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Wolfgang Karl, Markus Leberecht, and Michael Oberhuber. Forcing deterministic execution of parallel programs -- debugging support through the smile monitoring approach. In proceedings of the SCI-Europe, September 1998.Google ScholarGoogle Scholar
  12. Milind Kulkarni, Keshav Pingali, Ganesh Ramanarayanan, Bruce Walter, Kavita Bala, and L. Paul Chew. Optimistic parallelism benefits from data partitioning. SIGARCH Comput. Archit. News, 36(1):233--243, 2008. ISSN 0163-5964. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7):558--565, 1978. ISSN 0001-0782. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. T. J. LeBlanc and J. M. Mellor-Crummey. Debugging parallel programs with instant replay. IEEE Trans. Comput., 36(4):471--482, 1987. ISSN 0018-9340. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Edward A. Lee. The problem with threads. Computer, 39(5):33--42, May 2006. ISSN 0018-9162. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Shan Lu, Joe Tucek, Feng Qin, and Yuanyuan Zhou. Avio: Detecting atomicity violations via access-interleaving invariants. In proceedings of the International Conference on Architecture Support for Programming Languages and Operating Systems, October 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Shan Lu, Weihang Jiang, and Yuanyuan Zhou. A study of interleaving coverage criteria. In proceedings of ESEC-FSE, pages 533--536, New York, NY, USA, 2007a. ACM. ISBN 978-1-59593-811-4. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Shan Lu, Soyeon Park, Chongfeng Hu, Xiao Ma, Weihang Jiang, Zhenmin Li, Raluca A. Popa, and Yuanyuan Zhou. Muvi: Automatically inferring multi-variable access correlations and detecting related semantic and concurrency bugs. In proceedings of the 21st ACM Symposium on Operating Systems Principles, October 2007b. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Shan Lu, Soyeon Park, Eunsoo Seo, and Yuanyuan Zhou. Learning from mistakes: a comprehensive study on real world concurrency bug characteristics. In ASPLOS XIII: proceedings of the 13th international conference on Architectural support for programming languages and operating systems, pages 329--339, New York, NY, USA, 2008. ACM. ISBN 978-1-59593-958-6. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Pablo Montesinos, Luis Ceze, and Josep Torrellas. Delorean: Recording and deterministically replaying shared-memory multiprocessor execution efficiently. In proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA), June 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Hans P. Reiser, Jorg Domaschka, Franz J. Hauck, Rudiger Kapitza, and Wolfgang Schroder-Preikschat. Consistent replication of multithreaded distributed objects. 25th IEEE Symposium on Reliable Distributed Systems, pages 257--266, 2006. ISSN 1060-9857. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Jonathan Rose. Locusroute: a parallel global router for standard cells. In DAC-88: Proceedings of the 25th ACM/IEEE conference on Design automation, pages 189--195, Los Alamitos, CA, USA, 1988. IEEE Computer Society Press. ISBN 0-8186-8864-5. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Mark Russinovich and Bryce Cogswell. Replay for concurrent non-deterministic shared-memory applications. In proceedings of the ACM SIGPLAN 1996 Conference on Programming Language Design and Implementation, pages 258--266, New York, NY, USA, 1996. ACM. ISBN 0-89791-795-2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Debashis Saha and Sourav K. Dutta. Specification of deterministic execution timing schema for parallel programs on a multiprocessor. Computer, Communication, Control and Power Engineering, pages 114--116 vol.1, 1993.Google ScholarGoogle Scholar
  25. Jaswinder Pal Singh, Anoop Gupta, and Marc Levoy. Parallel visualization algorithms: Performance and architectural implications. Computer, 27(7):45--55, 1994. ISSN 0018-9162. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. William Thies, Michal Karczmarek, and Saman Amarasinghe. Streamit: A language for streaming applications. In International Conference on Compiler Construction, Grenoble, France, April 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Joseph Tucek, Shan Lu, Chengdu Huang, Spiros Xanthos, and Yuanyuan Zhou. Triage: diagnosing production run failures at the user's site. In proceedings of Twenty-First ACM SIGOPS Symposium on Operating Systems Principles, pages 131--144, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-591-5. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Larry D. Wittie. Debugging distributed C programs by real time reply. SIGPLAN Not., 24(1):57--67, 1989. ISSN 0362-1340. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. S.C. Woo, M. Ohara, E. Torrie, J.P. Singh, and A. Gupta. The SPLASH-2 programs: characterization and methodological considerations. In proceedings of 22nd Annual International Symposium on Computer Architecture News, pages 24--36, June 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Min Xu, Rastislav Bodik, and Mark D. Hill. A "flight data recorder" for enabling full-system multiprocessor deterministic replay. SIGARCH Comput. Archit. News, 31(2):122--135, 2003. ISSN 0163-5964. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Kendo: efficient deterministic multithreading in software

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM SIGPLAN Notices
          ACM SIGPLAN Notices  Volume 44, Issue 3
          ASPLOS 2009
          March 2009
          346 pages
          ISSN:0362-1340
          EISSN:1558-1160
          DOI:10.1145/1508284
          Issue’s Table of Contents
          • cover image ACM Conferences
            ASPLOS XIV: Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
            March 2009
            358 pages
            ISBN:9781605584065
            DOI:10.1145/1508244

          Copyright © 2009 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 7 March 2009

          Check for updates

          Qualifiers

          • research-article

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader