Research Article · Public Access
DOI: 10.1145/3062341.3062370

Responsive parallel computation: bridging competitive and cooperative threading

Published: 14 June 2017

ABSTRACT

Competitive and cooperative threading are widely used abstractions in computing. In competitive threading, threads are scheduled preemptively with the goal of minimizing response time, usually in interactive applications. In cooperative threading, threads are scheduled non-preemptively with the goal of maximizing throughput or minimizing completion time, usually in compute-intensive applications, e.g., scientific computing, machine learning, and AI.

Although both of these forms of threading rely on the same abstraction of a thread, they have, to date, remained largely separate forms of computing. Motivated by the recent increase in the mainstream use of multicore computers, we propose a threading model that aims to unify competitive and cooperative threading. To this end, we extend the classic graph-based cost model for cooperative threading to allow for competitive threading, and describe how such a cost model may be used in a programming language by presenting a language and a corresponding cost semantics. Finally, we show that the cost model and the semantics are realizable by presenting an operational semantics for the language that specifies the behavior of an implementation, as well as an implementation and a small empirical evaluation.
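The contrast the abstract draws can be illustrated with a small sketch. This is not the paper's model or implementation (which extends a cost semantics for an ML-family language); it is a hypothetical Python illustration in which a fork-join computation pursues throughput while a preemptively scheduled thread pursues response time. All names (`fib`, `echo_server`) are illustrative.

```python
import threading
import queue
from concurrent.futures import ThreadPoolExecutor

# Cooperative style: a fork-join computation whose goal is throughput.
# Near the top of the call tree, each call forks a subtask and joins on it.
def fib(n, pool, depth=0):
    if n < 2:
        return n
    if depth < 2:  # fork in parallel only near the root to bound task count
        a = pool.submit(fib, n - 1, pool, depth + 1)
        b = fib(n - 2, pool, depth + 1)
        return a.result() + b  # join
    return fib(n - 1, pool, depth + 1) + fib(n - 2, pool, depth + 1)

# Competitive style: a preemptively scheduled thread whose goal is
# response time -- it answers each request as soon as it arrives.
def echo_server(requests, responses):
    while True:
        msg = requests.get()
        if msg is None:  # shutdown signal
            break
        responses.put(msg.upper())

requests, responses = queue.Queue(), queue.Queue()
server = threading.Thread(target=echo_server, args=(requests, responses))
server.start()

with ThreadPoolExecutor(max_workers=4) as pool:
    result = fib(20, pool)   # throughput-oriented bulk work...
    requests.put("ping")     # ...while the interactive thread
    reply = responses.get()  # remains responsive throughout

requests.put(None)
server.join()
```

In a conventional runtime these two styles meet only by accident: the OS preempts among all threads uniformly, with no notion that `fib`'s tasks want throughput while the server wants latency. The paper's contribution is a single cost model and semantics in which both kinds of thread coexist with their respective guarantees.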


Published in

          PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation
          June 2017
          708 pages
          ISBN:9781450349888
          DOI:10.1145/3062341

          Copyright © 2017 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 406 of 2,067 submissions, 20%
