skip to main content
research-article
Free Access

User-Assisted Store Recycling for Dynamic Task Graph Schedulers

Published:28 December 2016Publication History
Skip Abstract Section

Abstract

The emergence of the multi-core era has led to increased interest in designing effective yet practical parallel programming models. Models based on task graphs that operate on single-assignment data are attractive in several ways. Notably, they can support dynamic applications and precisely represent the available concurrency. However, for efficient execution, they also require nuanced algorithms for scheduling and memory management. In this article, we consider memory-efficient dynamic scheduling of task graphs. Specifically, we present a novel approach for dynamically recycling the memory locations assigned to data items as they are produced by tasks. We develop algorithms to identify memory-efficient store recycling functions by systematically evaluating the validity of a set of user-provided or automatically generated alternatives. Because recycling functions can be input data-dependent, we have also developed support for continued correct execution of a task graph in the presence of a potentially incorrect store recycling function.

Experimental evaluation demonstrates that this approach to automatic store recycling incurs little to no overheads, achieves memory usage comparable to the best manually derived solutions, often produces recycling functions valid across problem sizes and input parameters, and efficiently recovers from an incorrect choice of store recycling functions.

References

  1. Samah Abu-Mahmeed, Cheryl McCosh, Zoran Budimlić, Ken Kennedy, Kaushik Ravindran, Kevin Hogan, Paul Austin, Steve Rogers, and Jacob Kornerup. 2009. Scheduling tasks to maximize usage of aggregate variables in place. In CC. 204--219. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Kunal Agrawal, Charles E. Leiserson, and Jim Sukha. 2010. Executing task graphs using work-stealing. In Proceedings of the IEEE 24th International Symposium on Parallel 8 Distributed Processing (IPDPS). 1--12. Google ScholarGoogle ScholarCross RefCross Ref
  3. Arvind. 1981. Data flow languages and architecture. In Proceedings of the 8th Annual Symposium on Computer Architecture (ISCA’ 81). 1.Google ScholarGoogle Scholar
  4. Cédric Augonnet, Samuel Thibault, Raymond Namyst, and Pierre-André Wacrenier. 2011. StarPU: A unified platform for task scheduling on heterogeneous multicore architectures. Concurrency and Computation: Practice and Experience 23, 2 (2011), 187--198. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. François Broquedis, Thierry Gautier, and Vincent Danjean. 2012. LIBKOMP, an efficient openMP runtime system for both fork-join and data flow paradigms. In Proceedings of the 8th International Conference on OpenMP in a Heterogeneous World (IWOMP’12). Springer-Verlag, Berlin, 102--115. DOI:http://dx.doi.org/10.1007/978-3-642-30961-8_8 Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Zoran Budimlić, Michael Burke, Vincent Cavé, Kathleen Knobe, Geoff Lowney, and others. 2010. Concurrent collections. Scientific Programming 18, 3 (2010), 203--217. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Zoran Budimlic, Aparna M. Chandramowlishwaran, Kathleen Knobe, Geoff N. Lowney, Vivek Sarkar, and Leo Treggiari. 2008. Declarative aspects of memory management in the concurrent collections parallel programming model. In Proceedings of the Workshop on Declarative Aspects of Multicore Programming. 47--58. DOI:http://dx.doi.org/10.1145/1481839.1481846 Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. Bui, Kwang-Ting Cheng, J. Cong, L. Vese, Yi-Chu Wang, Bo Yuan, and Yi Zou. 2012. Platform characterization for domain-specific computing. In ASP-DAC. 94--99.Google ScholarGoogle Scholar
  9. Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W. Sheaffer, Sang-Ha Lee, and Kevin Skadron. 2009. Rodinia: A benchmark suite for heterogeneous computing. In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC’09). IEEE, 44--54. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Jack B. Dennis. 1974. First version of a data flow procedure language. In Programming Symposium. Springer, 362--376. Google ScholarGoogle ScholarCross RefCross Ref
  11. Alejandro Duran, Eduard Ayguadé, Rosa M. Badia, Jesús Labarta, Luis Martinell, Xavier Martorell, and Judit Planas. 2011. OmpSs: A Proposal for Programming Heterogeneous Multi-core Architectures. Parallel Processing Letters 21, 2 (2011), 173--193. Google ScholarGoogle ScholarCross RefCross Ref
  12. Thierry Gautier, Joao V. F. Lima, Nicolas Maillard, and Bruno Raffin. 2013. XKaapi: A runtime system for data-flow task programming on heterogeneous architectures. In Proceedings of the 2013 IEEE 27th International Symposium on Parallel 8 Distributed Processing (IPDPS). IEEE, 1299--1308. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Léonard Gérard, Adrien Guatto, Cédric Pasteur, and Marc Pouzet. 2012. A modular memory optimization for synchronous data-flow languages: Application to arrays in a lustre compiler. SIGPLAN Not. 47, 5 (June 2012), 51--60. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Paul Hudak and Adrienne G. Bloss. 1985. The aggregate update problem in functional programming systems. In POPL. 300--314. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Prabhanjan Kambadur, Anshul Gupta, Torsten Hoefler, and Andrew Lumsdaine. 2009. Demand-driven execution of static directed acyclic graphs using task parallelism. In Proceedings of the 2009 International Conference on High Performance Computing (HiPC). IEEE, 284--293. Google ScholarGoogle ScholarCross RefCross Ref
  16. Mehmet Can Kurt, Sriram Krishnamoorthy, Kunal Agrawal, and Gagan Agrawal. 2014. Fault-tolerant dynamic task graph scheduling. In SC. 719--730. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Yu-Kwong Kwok and Ishfaq Ahmad. 1999. Static scheduling algorithms for allocating directed task graphs to multiprocessors. Comput. Surveys 31, 4 (Dec. 1999), 406--471. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Malcolm Yoke Hean Low, Weiguo Liu, and Bertil Schmidt. 2007. A parallel BSP algorithm for irregular dynamic programming. In APPT. 151--160.Google ScholarGoogle Scholar
  19. Nicholas D. Matsakis. 2012. Parallel closures: A new twist on an old idea. In HotPar. 5--5.Google ScholarGoogle Scholar
  20. Walid A. Najjar, Edward A. Lee, and Guang R. Gao. 1999. Advances in the dataflow computational model. Parallel Comput. 25, 1314 (1999), 1907--1929. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Antoniu Pop and Albert Cohen. 2013. OpenStream: Expressiveness and data-flow compilation of OpenMP streaming programs. ACM Transactions on Architecture and Code Optimization (TACO) 9, 4 (2013), 53.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Polyvios Pratikakis, Hans Vandierendonck, Spyros Lyberis, and Dimitrios S. Nikolopoulos. 2011. A programming model for deterministic task parallelism. In Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness. ACM, 7--12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Veselin Raychev, Martin Vechev, and Manu Sridharan. 2013. Effective race detection for event-driven programs. In OOPSLA. 151--166. DOI:http://dx.doi.org/10.1145/2509136.2509538 Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Dragos Sbirlea, Zoran Budimlic, and Vivek Sarkar. 2014. Bounded memory scheduling of dynamic task graphs. In PACT. 343--356. DOI:http://dx.doi.org/10.1145/2628071.2628090 Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Dragos Sbirlea, Kathleen Knobe, and Vivek Sarkar. 2012. Folding of tagged single assignment values for memory-efficient parallelism. In Euro-Par. 601--613. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Peter Schnorf, Mahadevan Ganapathi, and John L. Hennessy. 1993. Compile-time copy elimination. Software: Practice and Experience 23, 11 (1993), 1175--1200. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Sagnak Tasirlar and Vivek Sarkar. 2011. Data-driven tasks and their implementation. In ICPP. 652--661. DOI:http://dx.doi.org/10.1109/ICPP.2011.87 Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Asim YarKhan, Jakub Kurzak, and Jack Dongarra. 2011. Quark users guide: Queueing and runtime for kernels. University of Tennessee Innovative Computing Laboratory Technical Report ICL-UT-11-02.Google ScholarGoogle Scholar

Index Terms

  1. User-Assisted Store Recycling for Dynamic Task Graph Schedulers

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Architecture and Code Optimization
        ACM Transactions on Architecture and Code Optimization  Volume 13, Issue 4
        December 2016
        648 pages
        ISSN:1544-3566
        EISSN:1544-3973
        DOI:10.1145/3012405
        Issue’s Table of Contents

        Copyright © 2016 ACM

        © 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 28 December 2016
        • Revised: 1 November 2016
        • Accepted: 1 November 2016
        • Received: 1 May 2016
        Published in taco Volume 13, Issue 4

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader