Abstract
Parallel programs written in MPI have been widely used for developing high-performance applications on various platforms. Because of a restriction of the MPI computation model, conventional MPI implementations on shared-memory machines map each MPI node to an OS process, which can suffer serious performance degradation in the presence of multiprogramming. This paper studies compile-time and runtime techniques for enhancing the performance portability of MPI code running on multiprogrammed shared-memory machines. The proposed techniques allow MPI nodes to be executed safely and efficiently as threads. The compile-time transformation eliminates global and static variables in C code using node-specific data. The runtime support includes an efficient and provably correct communication protocol that uses lock-free data structures and takes advantage of address-space sharing among threads. Experiments on an SGI Origin 2000 show that our MPI prototype, called TMPI, using the proposed techniques is competitive with SGI's native MPI implementation in a dedicated environment, and that it has significant performance advantages in a multiprogrammed environment.