Abstract
Parallel programs written in MPI have been widely used for developing high-performance applications on various platforms. Because of a restriction of the MPI computation model, conventional MPI implementations on shared-memory machines map each MPI node to an OS process, which can suffer serious performance degradation in the presence of multiprogramming. This paper studies compile-time and runtime techniques for enhancing the performance portability of MPI code running on multiprogrammed shared-memory machines. The proposed techniques allow MPI nodes to be executed safely and efficiently as threads. The compile-time transformation eliminates global and static variables in C code using node-specific data. The runtime support includes an efficient and provably correct communication protocol that uses lock-free data structures and takes advantage of address-space sharing among threads. Experiments on an SGI Origin 2000 show that our MPI prototype, called TMPI, using the proposed techniques is competitive with SGI's native MPI implementation in a dedicated environment, and that it has significant performance advantages in a multiprogrammed environment.