Responsive Parallel Computation: Bridging Competitive and Cooperative Threading (PLDI '17)

ABSTRACT
Competitive and cooperative threading are widely used abstractions in computing. In competitive threading, threads are scheduled preemptively with the goal of minimizing response time, usually in interactive applications. In cooperative threading, threads are scheduled non-preemptively with the goal of maximizing throughput or minimizing completion time, usually in compute-intensive applications such as scientific computing, machine learning, and AI.
Although both of these forms of threading rely on the same abstraction of a thread, they have, to date, remained largely separate. Motivated by the increasing mainstream use of multicore computers, we propose a threading model that aims to unify competitive and cooperative threading. To this end, we extend the classic graph-based cost model for cooperative threading to account for competitive threading, and describe how such a cost model may be used in a programming language by presenting a language and a corresponding cost semantics. Finally, we show that the cost model and the semantics are realizable by presenting an operational semantics for the language that specifies the behavior of an implementation, along with an implementation and a small empirical evaluation.
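The contrast between the two threading models can be sketched concretely. The following is a minimal Python illustration (the paper's own language and runtime differ; `fib` and `serve` are hypothetical stand-ins): a fork-join computation represents cooperative, throughput-oriented work, while a preemptively scheduled thread servicing a request queue represents competitive, latency-oriented work.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

# Cooperative side: a fork-join computation scheduled for throughput.
# (fib stands in for compute-intensive parallel work.)
def fib(n):
    if n < 2:
        return n
    with ThreadPoolExecutor(max_workers=2) as ex:
        left = ex.submit(fib, n - 1)   # fork a child task
        right = fib(n - 2)             # continue in the parent
        return left.result() + right   # join

# Competitive side: a preemptively scheduled thread whose goal is
# low response time for interactive requests.
def serve(requests, responses):
    while True:
        req = requests.get()
        if req is None:                # shutdown sentinel
            return
        responses.put(req + 1)         # respond promptly

requests, responses = queue.Queue(), queue.Queue()
server = threading.Thread(target=serve, args=(requests, responses))
server.start()

result = fib(8)                        # throughput-oriented work
requests.put(41)                       # latency-sensitive request arrives
answer = responses.get()               # served by the competitive thread
requests.put(None)
server.join()
```

In this sketch the operating system's preemptive scheduler keeps `serve` responsive regardless of how much fork-join work is in flight; a unified model, as the paper proposes, would instead let a single cost model and scheduler account for both kinds of threads.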