Abstract
Many high-level parallel programming languages allow for fine-grained parallelism. As in the popular work-time framework for parallel algorithm design, programs written in such languages can express the full parallelism in the program without specifying the mapping of program tasks to processors. A common concern in executing such programs is to schedule tasks to processors dynamically so as to minimize not only the execution time, but also the amount of space (memory) needed. Without careful scheduling, the parallel execution on p processors can use a factor of p or more times the space of a sequential implementation of the same program.
This paper first identifies a class of parallel schedules that are provably efficient in both time and space. For any computation with w units of work and critical path length d, and for any sequential schedule that takes space s1, we provide a parallel schedule that takes fewer than w/p + d steps on p processors and requires less than s1 + p·d space. This matches the lower bound that we show, and significantly improves upon the best previous bound of s1·p space for the common case where d ≪ s1.
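The w/p + d time bound is the classical greedy-scheduling (Brent-style) bound: any schedule that, at each step, executes up to p of the currently ready tasks finishes within w/p + d steps. A minimal sketch in Python (the function names and the example DAG are illustrative, not from the paper):

```python
def greedy_schedule_steps(preds, p):
    """Simulate a greedy p-processor schedule of a unit-task DAG.
    preds maps each task to the set of tasks it depends on.
    Each step, up to p ready tasks (all dependencies done) execute."""
    done = set()
    steps = 0
    while len(done) < len(preds):
        ready = [t for t in preds if t not in done and preds[t] <= done]
        for t in ready[:p]:  # any p ready tasks suffice for the bound
            done.add(t)
        steps += 1
    return steps

def dag_depth(preds):
    """Length of the longest dependency chain (the critical path d)."""
    memo = {}
    def depth(t):
        if t not in memo:
            memo[t] = 1 + max((depth(u) for u in preds[t]), default=0)
        return memo[t]
    return max(depth(t) for t in preds)

# Example: a fork-join DAG -- one root, ten parallel children, one sink.
preds = {0: set()}
preds.update({i: {0} for i in range(1, 11)})
preds[11] = set(range(1, 11))
# w = 12 unit tasks, critical path d = 3; with p = 3 processors a greedy
# schedule needs at most w/p + d = 7 steps (here it takes 6).
```

The space bound is the novel part: the paper shows that, among all such greedy schedules, a suitably chosen one also stays within s1 + p·d space of the sequential execution.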
The paper then describes a scheduler for implementing high-level languages with nested parallelism that generates schedules in this class. During program execution, as the structure of the computation is revealed, the scheduler keeps track of the active tasks, allocates the tasks to the processors, and performs the necessary task synchronization. The scheduler is itself a parallel algorithm, and incurs at most a constant factor overhead in time and space, even when the scheduling granularity is individual units of work. The algorithm is the first efficient solution to the scheduling problem discussed here, even if space considerations are ignored.
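As a rough illustration of the schedule class (a simplified sketch, not the paper's algorithm: it handles only spawn trees with no join synchronization, and all names are invented), a p-DFS schedule keeps the ready tasks ordered as a one-processor depth-first execution would run them, and executes the earliest p at each step:

```python
def pdfs_schedule(children, root, p):
    """Run a task tree under a p-DFS schedule: ready tasks are kept
    in depth-first (sequential) order, and the first p run each step.
    children maps a task to the child tasks it spawns on completion.
    Returns the schedule as a list of steps (lists of tasks)."""
    ready = [root]
    steps = []
    while ready:
        batch, rest = ready[:p], ready[p:]
        steps.append(batch)
        # Each completed task is replaced, in place, by its children,
        # which preserves the 1-DFS order among the ready tasks.
        spawned = []
        for t in batch:
            spawned.extend(children.get(t, []))
        ready = spawned + rest
    return steps

# Example: spawn tree a -> (b -> (c, d), e).  With p = 2 the schedule
# runs [a], then [b, e], then [c, d]; with p = 1 it is exactly the
# sequential depth-first order a, b, c, d, e.
children = {'a': ['b', 'e'], 'b': ['c', 'd']}
```

Prioritizing tasks by their sequential order is what ties the parallel space usage to the sequential space s1: the parallel execution never runs far ahead of what the sequential execution would have allocated.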
Index Terms
- Provably efficient scheduling for languages with fine-grained parallelism