Responsive Parallel Computation: Bridging Competitive and Cooperative Threading (PLDI '17)

ABSTRACT
Competitive and cooperative threading are widely used abstractions in computing. In competitive threading, threads are scheduled preemptively with the goal of minimizing response time, usually in interactive applications. In cooperative threading, threads are scheduled non-preemptively with the goal of maximizing throughput or minimizing completion time, usually in compute-intensive applications such as scientific computing, machine learning, and AI.
Although both of these forms of threading rely on the same abstraction of a thread, they have, to date, remained largely separate. Motivated by the increasing mainstream use of multicore computers, we propose a threading model that aims to unify competitive and cooperative threading. To this end, we extend the classic graph-based cost model for cooperative threading to account for competitive threading, and describe how such a cost model may be used in a programming language by presenting a language and a corresponding cost semantics. Finally, we show that the cost model and the semantics are realizable by presenting an operational semantics for the language that specifies the behavior of an implementation, along with an implementation and a small empirical evaluation.
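The contrast between the two threading models can be sketched concretely. The following is a minimal Python illustration (the paper's own language and runtime differ; `fib` and `serve` are hypothetical stand-ins): a fork-join computation represents cooperative, throughput-oriented work, while a preemptively scheduled thread servicing a request queue represents competitive, latency-oriented work.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

# Cooperative side: a fork-join computation scheduled for throughput.
# (fib stands in for compute-intensive parallel work.)
def fib(n):
    if n < 2:
        return n
    with ThreadPoolExecutor(max_workers=2) as ex:
        left = ex.submit(fib, n - 1)   # fork a child task
        right = fib(n - 2)             # continue in the parent
        return left.result() + right   # join

# Competitive side: a preemptively scheduled thread whose goal is
# low response time for interactive requests.
def serve(requests, responses):
    while True:
        req = requests.get()
        if req is None:                # shutdown sentinel
            return
        responses.put(req + 1)         # respond promptly

requests, responses = queue.Queue(), queue.Queue()
server = threading.Thread(target=serve, args=(requests, responses))
server.start()

result = fib(8)                        # throughput-oriented work
requests.put(41)                       # latency-sensitive request arrives
answer = responses.get()               # served by the competitive thread
requests.put(None)
server.join()
```

In this sketch the operating system's preemptive scheduler keeps `serve` responsive regardless of how much fork-join work is in flight; a unified model, as the paper proposes, would instead let a single cost model and scheduler account for both kinds of threads.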