
Provably efficient scheduling for languages with fine-grained parallelism

Published: 01 March 1999

Abstract

Many high-level parallel programming languages allow for fine-grained parallelism. As in the popular work-time framework for parallel algorithm design, programs written in such languages can express the full parallelism in the program without specifying the mapping of program tasks to processors. A common concern in executing such programs is to schedule tasks to processors dynamically so as to minimize not only the execution time, but also the amount of space (memory) needed. Without careful scheduling, a parallel execution on p processors can use a factor of p (or more) times the space of a sequential implementation of the same program.

This paper first identifies a class of parallel schedules that are provably efficient in both time and space. For any computation with w units of work and critical path length d, and for any sequential schedule that takes space s1, we provide a parallel schedule that takes fewer than w/p + d steps on p processors and requires less than s1 + p·d space. This matches the lower bound that we show, and significantly improves upon the best previous bound of s1·p space for the common case where d ≪ s1.
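
To make the comparison concrete, the bounds above can be written out as follows, writing T_p for the number of steps and S_p for the space of the p-processor schedule; the numeric instance in the comment is hypothetical and only illustrates the gap when d ≪ s1.

    \[
      T_p < \frac{w}{p} + d,
      \qquad
      S_p < s_1 + p \cdot d \quad \text{(this paper)},
      \qquad
      S_p \le s_1 \cdot p \quad \text{(best previous bound)}.
    \]
    % Illustrative instance (not from the paper): with s_1 = 10^6 space
    % units, d = 10^3, and p = 100, the new bound gives at most about
    % 1.1 x 10^6 units, whereas s_1 * p allows up to 10^8 units.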

The paper then describes a scheduler for implementing high-level languages with nested parallelism that generates schedules in this class. During program execution, as the structure of the computation is revealed, the scheduler keeps track of the active tasks, allocates tasks to processors, and performs the necessary task synchronization. The scheduler is itself a parallel algorithm, and incurs at most a constant-factor overhead in time and space, even when the scheduling granularity is individual units of work. The algorithm is the first efficient solution to the scheduling problem discussed here, even if space considerations are ignored.
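
The paper's scheduler is not reproduced here; the following is a minimal Python sketch of the schedule class the abstract describes: a greedy p-processor schedule that, at every step, runs the ready tasks occurring earliest in the 1-processor depth-first schedule. The task-graph encoding and all names (sequential_df_order, greedy_df_schedule) are illustrative assumptions, not taken from the paper.

    # A minimal sketch (not the paper's implementation) of a greedy schedule
    # that prioritizes ready tasks by their 1-processor depth-first order.

    def sequential_df_order(roots, children, parent_count):
        """Position of each task in the 1-processor depth-first schedule:
        when a task finishes, its newly ready children are pushed on a
        stack (leftmost on top) and the stack top is executed next."""
        remaining = dict(parent_count)
        order, stack = {}, list(reversed(roots))
        while stack:
            v = stack.pop()
            order[v] = len(order)
            newly_ready = []
            for c in children.get(v, []):
                remaining[c] -= 1
                if remaining[c] == 0:
                    newly_ready.append(c)
            stack.extend(reversed(newly_ready))   # leftmost child runs next
        return order

    def greedy_df_schedule(roots, children, parent_count, p):
        """Greedy schedule: each step runs the (up to) p ready tasks with
        the smallest depth-first priority."""
        prio = sequential_df_order(roots, children, parent_count)
        remaining = dict(parent_count)
        ready, steps = sorted(roots, key=prio.__getitem__), []
        while ready:
            step, ready = ready[:p], ready[p:]    # p highest-priority tasks
            steps.append(step)
            for v in step:
                for c in children.get(v, []):
                    remaining[c] -= 1
                    if remaining[c] == 0:
                        ready.append(c)
            ready.sort(key=prio.__getitem__)
        return steps

    # Example: a root forks tasks "a" and "b", which synchronize at "join".
    children = {"root": ["a", "b"], "a": ["join"], "b": ["join"], "join": []}
    parent_count = {"a": 1, "b": 1, "join": 2}
    print(greedy_df_schedule(["root"], children, parent_count, p=2))
    # -> [['root'], ['a', 'b'], ['join']]

Intuitively, giving priority to the earliest tasks of the sequential schedule limits how far the parallel execution can run ahead of the sequential one, which is the intuition behind the s1 + p·d space bound stated above.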


Published in
Journal of the ACM, Volume 46, Issue 2 (March 1999)
ISSN: 0004-5411   EISSN: 1557-735X
DOI: 10.1145/301970
Copyright © 1999 ACM

Publisher
Association for Computing Machinery, New York, NY, United States
