ABSTRACT
We describe a parallel programming tool for scheduling static task graphs and generating the appropriate target code for message passing MIMD architectures. The computational complexity of the system is almost linear to the size of the task graph and preliminary experiments show performance comparable to the “best” hand-written programs.
- 1.S.tt. Bokhari, Assignment Problems in Parallel and Distributed Computing, Kluwer Academic Publisher, 1990. Google ScholarDigital Library
- 2.D. Callahan, K. Kennedy, Compiling programs for distributed-memory multi-processors, The Journal of Supercomputing, (2): 151-169, 1988.Google ScholarCross Ref
- 3.Ph. Chretienne, Task Scheduling over Distributed Memory Machines. Proc. of the International Workshop on Paral}{el and Distributed Algorithms, (North Holland, Ed.), 1989.Google Scholar
- 4.R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis, Distributed Memory Compiler Methods for Irregular Problems: Data Copy Reuse and Runtime Partitioning, ICASE Report No. 91-73, 1991.Google Scholar
- 5.J. J. Dongarra, and D.C. Sorensen, SCHEDULE: Tools for Developing and Analyzing Parallel Fortran Programs, in D.B. Gannon, L.H. Jamieson and tLJ. Douglass, editors, The Characteristics of Parallel Algorithms, pp. 363-394. The MIT Press, Cambridge, Massachusetts, 1987.Google Scholar
- 6.J. J. Dongarra, J. Du Croz, I. Duff and S. Hammarling, A Set of Level 3 Basic Linear Algebra Subprograms, ACM Trans. on Math. Software, 16(1):1-17, Mar. 1990. Google ScholarDigital Library
- 7.T.H. Dunigan, Performance of the 1NTEL iPSC/860 and nCUBE 6~00 Hypercube, ORNL TM-11790, Oak Ridge National Laboratory, Oak Ridge, TN, Nov. 1991.Google Scholar
- 8.H.E1-Rewini and T.G. Lewis, Scheduling Parallel Program Tasks onto Arbitrary Target Machines, J. of parallel and Distributed Computing 9, 138- 153(1990). Google ScholarDigital Library
- 9.A. Gerasoulis and T. Yang, On the Granularity and Clustering of Directed Acyclic Task Graphs, LCSR-TR-153, Dept. of Computer Science, tLutgers Univ., 1990.Google Scholar
- 10.A. Gerasoulis and T. Yang, A Comparison of Clustering Heuristics for Scheduling DAGs on Multiprocessors, LCStL-TR-169, Dept. of Computer Science~ l~utgers Univ.~ 1991.Google Scholar
- 11.A. Gerasoulis and T. Yang, Compile-time scheduling and code generation for message-passing architectures, P~eport.Google Scholar
- 12.A. George, , M.T. Heath, and J. Liu, Parallel Cholesky Factorization on a Shared Memory Processor, Lin. Algebra Appl., 77(1986), pp. 165-187.Google ScholarCross Ref
- 13.S. Hiranandani, K. Kennedy and C.W. Tseng, Compiler Optimizations for Fortran D on MIMD Distributed-Memory Machines, Supercomputing 91, pp. 86-100. Google ScholarDigital Library
- 14.C. Koelbel and P. Mehrotra, Compiling global name-space parallel loops for distributed execution, IEEE Trans. on parallel and Distributed Systems 2(4), 199i. Google ScholarDigital Library
- 15.S.J. Kim and J.C Browne, A General Approach to Mapping of Parallel Computation upon Multiprocessor Architectures, Int'l Confi on Parallel Processing, vol 3, pp. 1-8, 1988.Google Scholar
- 16.J. Li and M. Chen, Compiling Communication- Efficent Programs for Massively Parallel Machines., May 1991, accepted in IEEE Trans. on Distributed and Parallel Computing. Google ScholarDigital Library
- 17.j.M. Ortega, Introduction to Parallel and Vector Solution of Linear Systems, New York:Plenum, 1988. Google ScholarDigital Library
- 18.C. D. Polychronopoulos, M.B. Girkar, M. R.Haghighat, C.L. Lee, B.P. Leung, and D.A. Schouten, The Structure of Parafrase-2: an Advanced Parallelizing Compiler for C and Fortran, in D. Gelernter, A. Nicolau and D. Padua (Eds.), Languages and Compilers/or Parallel Computing, 1990. Google ScholarDigital Library
- 19.A. Rogers, K. Pingali, Process Decomposition Through Locality of References SIGPLAN 1989, pp. 69-80. Google ScholarDigital Library
- 20.Y. Saad, Gaussian Elimination on Hypercubes, Parallel Algorithms and Architectures, Cosnard, M. et al. Eds., Elsevier Science Publishers, North- Holland, 1986. Google ScholarDigital Library
- 21.Y. Saad and M. H. Schultz, Data Communication in Hypercubes, DCS/RR-428, l~esearch l~eport, Yale Uiverstiy, 1985.Google Scholar
- 22.V. Sarkar, Partitioning and Scheduling Parallel Programs for Execution on Multiprocessors, The MIT Press, 1989. Google ScholarDigital Library
- 23.V. Sarkar, Determining Average Program Execution Times and their Variance, SIGPLAN 1989, pp. 298-312. Google ScholarDigital Library
- 24.Min-You Wu and D. Gajski, A Programming Aid for Hypercube Architectures, The Journal of Supercomputing, vol. 2, pp. 349-372, 1988.Google ScholarCross Ref
- 25.T. Yang and A. Gerasoulis, A Fast Scheduling Algorithm for DA Gs on an Unbounded Number of Processors, The Proceeding of IEEE Supercomputing 91, pp. 633-642. Google ScholarDigital Library
- 26.T. Yang and A. Gerasoulis, List Scheduling with and without Communication Delay, l~eport.Google Scholar
Index Terms
- PYRROS: static task scheduling and code generation for message passing multiprocessors
Recommendations
PYRROS: static task scheduling and code generation for message passing multiprocessors
ACM International Conference on Supercomputing 25th Anniversary VolumeWe describe a parallel programming tool for scheduling static task graphs and generating the appropriate target code for message passing MIMD architectures. The computational complexity of the system is almost linear to the size of the task graph and ...
Author retrospective for PYRROS: static task scheduling and code generation for message passing multiprocessors
ACM International Conference on Supercomputing 25th Anniversary VolumeGiven a program with annotated task parallelism represented as a directed acyclic graph (DAG), the PYRROS project was focused on fast DAG scheduling, code generation and runtime execution on distributed memory architectures. PYRROS scheduling goes ...
The performance and scalability of SHMEM and MPI-2 one-sided routines on a SGI Origin 2000 and a Cray T3E-600: Performances
This paper compares the performance and scalability of SHMEM and MPI-2 one-sided routines on different communication patterns for a SGI Origin 2000 and a Cray T3E-600. The communication tests were chosen to represent commonly used communication patterns ...
Comments