ABSTRACT
We study the problem of executing parallel programs, in particular Cilk programs, on a collection of processors of different speeds. We consider a model in which each processor maintains an estimate of its own speed, communication between processors has a cost, and all scheduling must be performed online. This problem has been studied previously in the fields of asynchronous parallel computing and scheduling theory; our model bridges the assumptions of these two fields. We give a new, more accurate analysis of an old scheduling algorithm called the maximum utilization scheduler. Based on this analysis, we generalize this scheduling policy and define the high utilization scheduler. We then focus on the Cilk platform and introduce a new algorithm for scheduling Cilk multithreaded parallel programs on heterogeneous processors. This scheduler is inspired by the high utilization scheduler, adapted to fit the Cilk context. A crucial aspect of our algorithm is that it preserves the spirit of the original Cilk scheduler: when run on homogeneous processors, it exactly mimics the dynamics of the original Cilk scheduler.
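To make the setting concrete, here is a minimal sketch of a standard baseline for scheduling on related (different-speed) machines: greedy online list scheduling, where each arriving job is placed on the processor that would finish it earliest. This is only an illustrative baseline in the spirit of utilization-based policies, not the paper's maximum or high utilization scheduler; the function name and inputs are hypothetical.

```python
def greedy_related_machines(works, speeds):
    """Greedy online list scheduling on machines of different speeds.

    works:  amounts of work for independent jobs, in arrival order.
    speeds: speed of each processor (work units per time unit).
    Returns the makespan (finish time of the last job).
    """
    finish = [0.0] * len(speeds)  # current finish time of each processor
    for w in works:
        # Place the job on the processor that would complete it earliest.
        i = min(range(len(speeds)), key=lambda j: finish[j] + w / speeds[j])
        finish[i] += w / speeds[i]
    return max(finish)


# Example: four unit jobs on two equal-speed processors take 2 time units;
# with one processor twice as fast, work migrates toward the fast machine.
print(greedy_related_machines([1, 1, 1, 1], [1, 1]))  # → 2.0
print(greedy_related_machines([2, 1, 1], [2, 1]))     # → 1.5
```

The classical bounds of Graham and of Liu and Liu concern exactly this kind of greedy policy; the paper's schedulers additionally handle precedence constraints, online speed estimates, and communication costs.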
Index Terms
- Scheduling Cilk multithreaded parallel programs on processors of different speeds