skip to main content
article
Open Access

Program transformation and runtime support for threaded MPI execution on shared-memory machines

Published:01 July 2000Publication History
Skip Abstract Section

Abstract

Parallel programs written in MPI have been widely used for developing high-performance applications on various platforms. Because of a restriction of the MPI computation model, conventional MPI implementations on shared-memory machines map each MPI node to an OS process, which can suffer serious performance degradation in the presence of multiprogramming. This paper studies compile-time and runtime techniques for enhancing performance portability of MPI code running on multiprogrammed shared-memory machines. The proposed techniques allow MPI nodes to be executed safety and efficiently as threads. Compile-time transformation eliminates global and static variables in C code using node-specific data. The runtime support includes an efficient and provably correct communication protocol that uses lock-free data structure and takes advantage of address space sharing among threads. The experiments on SGI Origin 2000 show that our MPI prototype called TMPI using the proposed techniques is competitive with SGI's native MPI implementation in a dedicated environment, and that it has significant performance advantages in a multiprogrammed environment.

References

  1. Anderson, T. E. 1990. The performance of spin lock alternatives for shared-money multiprocessors. IEEE Trans. Parall. Distrib. Syst. 1, 1 (Jan.), 6-16. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Arora, N. S., Blumofe, R. D., and Plaxton, C. G. 1998. Thread scheduling for multiprogrammed multiprocessors. In Proceedings of the 10th Symposium on Paral lel Algorithms and Architectures. Puerto Vallarta, Mexico, 119-29. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Brightwell, R. and Skjellum, A. 1996. MPICH on the T3D: a case study of high performance message passing. Tech. rep., Computer Sci. Dept., Mississippi State Univ.Google ScholarGoogle Scholar
  4. Bruck, J., Dolev, D., Ho, C.-T., Rosu, M.-C., and Strong, R. 1997. Efficient message passing interface (MPI) for parallel computing on clusters of workstations. Journal of Parallel and Distributed Computing 40, 1 (10 Jan.), 19-34. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Cannon, L. E. 1969. A cellular computer to implement the kalman filter algorithm. Ph.D. thesis, Department of Electrical Engineering, Montana State University, Bozeman, MT. Available from UMI, Ann Arbor, MI. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Crovella, M., Das, P., Dubnicki, C., LeBlanc, T., and Markatos, E. 1991. Multiprogramming on multiprocessors. In Proceedings of the 3rd IEEE Symposium on Parallel and Distributed Processing. IEEE, 590-597.Google ScholarGoogle Scholar
  7. Culler, D. E., Singh, J. P., and Gupta, A. 1999. Parallel Computer Architecture A Hardware/Software Approach, 1 ed. Morgan Kaufmann Publishers, San Francisco, CA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Feitelson, D. 1997. Job scheduling in multiprogrammed parallel systems. Tech. Rep. Research Report RC 19790, IBM.Google ScholarGoogle ScholarCross RefCross Ref
  9. Ferrari, A. and Sunderam, V. 1995. TPVM: distributed concurrent computing with lightweight processes. In Proceedings of IEEE High Performance Distributed Computing. IEEE, Washington, D.C., 211-218. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Foster, I., Kesselman, C., and Tuecke, S. 1996. The Nexus approach to integrating multithreading and communication. Journal of Parallel and Distributed Computing 37, 1 (25 Aug.), 70-82. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Gropp, W. and Lusk, E. 1997. A high-performance MPI implementation on a shared-memory vector supercomputer. Parallel Computing 22, 11 (Jan.), 1513-1526. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Gropp, W., Lusk, E., Doss, N., and Skjellum, A. 1996. A high-performance, portable implementation of the MPI message passing interface standard. Parallel Computing 22, 6 (Sept.), 789-828. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Herlihy, M. 1991. Wait-free synchronization. ACM Trans. Program. Lang. Syst. 11, 1 (Jan.), 124-149. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Jiang, D., Shan, H., and Singh, J. P. 1997. Application restructuring and performance portability on shared virtual memory and hardware-coherent multiprocessors. In Proceedings of the 6th ACM Symposium on Principles and Practice of Parallel Programming. ACM, New York, 217-29. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Kernighan, B. W. and Ritchie, D. M. 1988. The C Programming Language , 2ed. Prentice Hall, Inc, Englewood Cliffs, NJ. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Kontothanassis, L. I., Wisniewski, R. W., and Scott, M. L. 1997. Scheduler-conscious synchronization. ACM Trans. Comput. Syst. 15, 1 (Feb.), 3-40. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Leutenegger, S. T. and Vernon, M. K. 1990. The performance of multiprogrammed multiprocessor scheduling algorithms. In Proceedings of ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems. New York, 226. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Lumetta, S. S. and Culler, D. E. 1998. Managing concurrent access for shared memory active messages. In Proceedings of the International Paral lel Processing Symposium. Orlando, Florida, 272-8. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Massalin, H. and Pu, C. 1991. A lock-free multiprocessor OS kernel. Tech. Rep. CUCS-005-91, Computer Science Department, Columbia University. June.Google ScholarGoogle Scholar
  20. MPI-Forum. 1999. MPI Forum. http://www.mpi-forum.org.Google ScholarGoogle Scholar
  21. NCSA. 1999. NCSA note on SGI Origin 2000 IRIX 6.5. http://www.ncsa.uiuc.edu/SCD/ Consulting/Tips/Scheduler.html.Google ScholarGoogle Scholar
  22. NEC. 1999. MPI for NEC Supercomputers. http://www.ccrl-nece.technopark.gmd.de/~mpich/.Google ScholarGoogle Scholar
  23. Nichols, B., Buttlar, D., and Farrell, J. P. 1996. Pthread Programming, 1 ed. O'Reilly & Associates. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Ousterhout, J. 1982. Scheduling techniques for concurrent systems. In Proceedings of the 3rd International Conference of Distributed Computing Systems. IEEE, 22-30.Google ScholarGoogle Scholar
  25. Patterson, D. A. and Hennessy, J. L. 1998. Computer Organization & Design, 2 ed. Morgan Kaufmann Publishers, San Francisco, CA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Prakash, S. and Bagrodia, R. 1998. MPI-SIM: using parallel simulation to evaluate MPI programs. In Proceedings of Winter simulation. Washington, DC., 467-474. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Protopopov, B. and Skjellum, A. 1998. A multi-threaded message passing interface(MPI) architecture: performance and program issues. Tech. rep., Computer Science Department, Mississippi State Univ.Google ScholarGoogle Scholar
  28. Salo, E. 1998. Personal communication.Google ScholarGoogle Scholar
  29. Shen, K., Tang, H., and Yang, T. 1999. Adaptive two-level thread Management for fast MPI execution on shared memory machines. In Proceedings of ACM/IEEE SuperComputing '99. ACM/IEEE, New York. Will be available from www.cs.ucsb.edu/research/tmpi. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Skjellum, A., Protopopov, B., and Hebert, S. 1996. A thread taxonomy for MPI. MPIDC. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Snir, M., Otto, S., Huss-Lederman, S., Walker, D., and Dongarra, J. 1996. MPI: The Complete Reference. MIT Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Tucker, A. and Gupta, A. 1989. Process control and scheduling issues for multiprogrammed shared-memory multiprocessors. In Proceedings of the 12th ACM Symposium on Operating System Principles. ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Yue, K. K. and Lilja, D. J. 1998. Dynamic processor allocation with the Solaris operating system. In Proceedings of the International Paral lel Processing Symposium. Orlando, Florida. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Zahorjan, J. and McCann, C. 1990. Processor scheduling in shared memory multiprocessors. In Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems. ACM, New York, 214-225. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Zhou, H. and Geist, A. 1997. LPVM: a step towards multithread PVM. Concurrency - Practice and Experience.Google ScholarGoogle Scholar

Index Terms

  1. Program transformation and runtime support for threaded MPI execution on shared-memory machines

                  Recommendations

                  Comments

                  Login options

                  Check if you have access through your login credentials or your institution to get full access on this article.

                  Sign in

                  Full Access

                  • Published in

                    cover image ACM Transactions on Programming Languages and Systems
                    ACM Transactions on Programming Languages and Systems  Volume 22, Issue 4
                    July 2000
                    189 pages
                    ISSN:0164-0925
                    EISSN:1558-4593
                    DOI:10.1145/363911
                    Issue’s Table of Contents

                    Copyright © 2000 ACM

                    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                    Publisher

                    Association for Computing Machinery

                    New York, NY, United States

                    Publication History

                    • Published: 1 July 2000
                    Published in toplas Volume 22, Issue 4

                    Permissions

                    Request permissions about this article.

                    Request Permissions

                    Check for updates

                    Qualifiers

                    • article

                  PDF Format

                  View or Download as a PDF file.

                  PDF

                  eReader

                  View online with eReader.

                  eReader