skip to main content
10.1145/2384616.2384639acmconferencesArticle/Chapter ViewAbstractPublication PagessplashConference Proceedingsconference-collections
research-article

Work-stealing without the baggage

Published:19 October 2012Publication History

ABSTRACT

Work-stealing is a promising approach for effectively exploiting software parallelism on parallel hardware. A programmer who uses work-stealing explicitly identifies potential parallelism and the runtime then schedules work, keeping otherwise idle hardware busy while relieving overloaded hardware of its burden. Prior work has demonstrated that work-stealing is very effective in practice. However, work-stealing comes with a substantial overhead: as much as 2x to 12x slowdown over orthodox sequential code.

In this paper we identify the key sources of overhead in work-stealing schedulers and present two significant refinements to their implementation. We evaluate our work-stealing designs using a range of benchmarks, four different work-stealing implementations, including the popular fork-join framework, and a range of architectures. On these benchmarks, compared to orthodox sequential Java, our fastest design has an overhead of just 15%. By contrast, fork-join has a 2.3x overhead and the previous implementation of the system we use has an overhead of 4.1x. These results and our insight into the sources of overhead for work-stealing implementations give further hope to an already promising technique for exploiting increasingly available hardware parallelism.

References

  1. Intel® Cilk#8482; Plus sdk. URL http://software.intel.com/en-us/articles/intel-cilk-plus.Google ScholarGoogle Scholar
  2. V. Cavé, J. Zhao, J. Shirako, and V. Sarkar. Habanero-Java: the new adventures of old X10. In Proceedings of the 9th International Conference on Principles and Practice of Programming in Java, PPPJ '11, pages 51--61, New York, NY, USA, 2011. ACM. ISBN 978--1--4503-0935--6. 10.1145/2093157.2093165. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. P. Charles, C. Grothoff, V. Saraswat, C. Donawa, A. Kielstra, K. Ebcioglu, C. von Praun, and V. Sarkar. X10: an object-oriented approach to non-uniform cluster computing. In Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, OOPSLA '05, pages 519--538, New York, NY, USA, 2005. ACM. ISBN 1--59593-031-0. 10.1145/1094811.1094852. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. G. Cong, S. Kodali, S. Krishnamoorthy, D. Lea, V. Saraswat, and T. Wen. Solving large, irregular graph problems using adaptive work-stealing. In Proceedings of the 2008 37th International Conference on Parallel Processing, ICPP '08, pages 536--545, Washington, DC, USA, 2008. IEEE Computer Society. ISBN 978-0--7695--3374--2. 10.1109/ICPP.2008.88. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. A. Duran, J. Corbalán, and E. Ayguadé. Evaluation of OpenMP task scheduling strategies. In Proceedings of the 4th international conference on OpenMP in a new era of parallelism, IWOMP'08, pages 100--110, Berlin, Heidelberg, 2008. Springer-Verlag. ISBN 3--540--79560-X, 978--3--540--79560--5. URL http://dl.acm.org/citation.cfm?id=1789826.1789838. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. A. Duran, X. Teruel, R. Ferrer, X. Martorell, and E. Ayguade. Barcelona OpenMP tasks suite: A set of benchmarks targeting the exploitation of task parallelism in OpenMP. In Proceedings of the 2009 International Conference on Parallel Processing, ICPP '09, pages 124--131, Washington, DC, USA, 2009. IEEE Computer Society. ISBN 978-0--7695--3802-0. 10.1109/ICPP.2009.64. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. S. J. Fink and F. Qian. Design, implementation and evaluation of adaptive recompilation with on-stack replacement. In Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, CGO '03, pages 241--252, Washington, DC, USA, 2003. IEEE Computer Society. ISBN 0--7695--1913-X. URL http://dl.acm.org/citation.cfm?id=776261.776288. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, PLDI '98, pages 212--223, New York, NY, USA, 1998. ACM. ISBN 0--89791--987--4. 10.1145/277650.277725. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Frigo, Prokop, Frigo, Leiserson, Prokop, Ramachandran, Dailey, Leiserson, Lyubashevskiy, Kushman, et al.}frigo1998cilkM. Frigo, H. Prokop, M. Frigo, C. Leiserson, H. Prokop, S. Ramachandran, D. Dailey, C. Leiserson, I. Lyubashevskiy, N. Kushman, et al. The Cilk project. Algorithms, 1998.Google ScholarGoogle Scholar
  10. U. Hölzle, C. Chambers, and D. Ungar. Debugging optimized code with dynamic deoptimization. In Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation, PLDI '92, pages 32--43, New York, NY, USA, 1992. ACM. ISBN 0--89791--475--9. 10.1145/143103.143114. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. P. Kambadur, A. Gupta, A. Ghoting, H. Avron, and A. Lumsdaine. PFunc: modern task parallelism for modern high performance computing. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, pages 43:1--43:11, New York, NY, USA, 2009. ACM. ISBN 978--1--60558--744--8. 10.1145/1654059.1654103. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. D. Lea. A Java fork/join framework. In Proceedings of the ACM 2000 conference on Java Grande, JAVA '00, pages 36--43, New York, NY, USA, 2000. ACM. ISBN 1--58113--288--3. 10.1145/337449.337465. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. D. Leijen, W. Schulte, and S. Burckhardt. The design of a task parallel library. In Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications, OOPSLA '09, pages 227--242, New York, NY, USA, 2009. ACM. ISBN 978--1--60558--766-0. 10.1145/1640089.1640106. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. H.-W. Loidl and K. Hammond. On the granularity of divide-and-conquer parallelism. In Proceedings of the 1995 International Conference on Functional Programming, FP'95, pages 135--144, Swinton, UK, UK, 1995. British Computer Society. URL http://dl.acm.org/citation.cfm?id=2227330.2227343. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. Mellor-Crummey. Cilk+, parallel performance, and the cilk runtime system. URL http://www.clear.rice.edu/comp422/lecture-notes/comp422--2012-Lecture5-Cilk.pdf.Google ScholarGoogle Scholar
  16. MIT. The Cilk project. URL http://supertech.csail.mit.edu/cilk/index.html.Google ScholarGoogle Scholar
  17. E. Mohr, D. Kranz, Halstead, and J. R.H. Lazy task creation: A technique for increasing the granularity of parallel programs. Technical report, Cambridge, MA, USA, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. J. Reinders. phIntel threading building blocks: outfitting C+ for multi-core processor parallelism. O'Reilly Media, Inc., 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. O. Tardieu, H. Wang, and H. Lin. A work-stealing scheduler for X10's task parallelism with suspension. In Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, PPoPP '12, pages 267--276, New York, NY, USA, 2012. ACM. ISBN 978--1--4503--1160--1. 10.1145/2145816.2145850. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. L. Wang, H. Cui, Y. Duan, F. Lu, X. Feng, and P.-C. Yew. An adaptive task creation strategy for work-stealing scheduling. In Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization, CGO '10, pages 266--277, New York, NY, USA, 2010. ACM. ISBN 978--1--60558--635--9. 10.1145/1772954.1772992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. X. Yang, S. M. Blackburn, D. Frampton, J. B. Sartor, and K. S. McKinley. Why nothing matters: the impact of zeroing. In Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications, OOPSLA '11, pages 307--324, New York, NY, USA, 2011. ACM. ISBN 978--1--4503-0940-0. 10.1145/2048066.2048092. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Work-stealing without the baggage

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          OOPSLA '12: Proceedings of the ACM international conference on Object oriented programming systems languages and applications
          October 2012
          1052 pages
          ISBN:9781450315616
          DOI:10.1145/2384616
          • cover image ACM SIGPLAN Notices
            ACM SIGPLAN Notices  Volume 47, Issue 10
            OOPSLA '12
            October 2012
            1011 pages
            ISSN:0362-1340
            EISSN:1558-1160
            DOI:10.1145/2398857
            Issue’s Table of Contents

          Copyright © 2012 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 19 October 2012

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          Overall Acceptance Rate268of1,244submissions,22%

          Upcoming Conference

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader