
Performance comparison of MPI and three OpenMP programming styles on shared memory multiprocessors

Published: 7 June 2003

ABSTRACT

When using a shared memory multiprocessor, the programmer must select the portable programming model that will deliver the best performance. Even restricting the choice to the standard programming environments (MPI and OpenMP) leaves a broad range of programming approaches. To help the programmer with this selection, we compare MPI with three OpenMP programming styles (loop level, loop level with large parallel sections, SPMD) using a subset of the NAS benchmarks (CG, MG, FT, LU), two dataset sizes (A and B), and two shared memory multiprocessors (IBM SP3 Nighthawk II, SGI Origin 3800). We also present a path from MPI to OpenMP SPMD that guides programmers starting from an existing MPI code. We present the first SPMD OpenMP version of the NAS benchmarks and compare it with other OpenMP versions from independent sources (PBN, SDSC, and RWCP). Experimental results demonstrate that OpenMP provides competitive performance compared with MPI for a large set of experimental conditions; however, this performance comes at the price of a substantial programming effort in data set adaptation and inter-thread communication. MPI still provides the best performance under some conditions. We present breakdowns of the execution times and measurements of hardware performance counters to explain the performance differences.
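To make the three OpenMP styles concrete, here is a minimal sketch in C (our illustration, not code from the paper or the NAS suite; the toy loop and routine names are hypothetical) applying each style to the same vector update:

/* Sketch contrasting the three OpenMP styles on a toy vector update.
   The paper applies these styles to the NAS CG, MG, FT, and LU
   kernels, not to this loop. Compile with e.g. gcc -fopenmp. */
#include <omp.h>
#include <stdio.h>

#define N 1000000
static double a[N], b[N];

/* Style 1: loop level -- a parallel region is opened per loop. */
void loop_level(void) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * b[i];
}

/* Style 2: loop level inside one large parallel section -- the
   region is opened once and work-sharing directives reuse it. */
void large_section(void) {
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * b[i];
        /* further 'omp for' loops would go here, amortizing
           the cost of creating the team of threads */
    }
}

/* Style 3: SPMD -- each thread computes its own index range
   explicitly, as an MPI process would own a block of the array. */
void spmd(void) {
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int nth = omp_get_num_threads();
        int chunk = (N + nth - 1) / nth;
        int lo = tid * chunk;
        int hi = lo + chunk > N ? N : lo + chunk;
        for (int i = lo; i < hi; i++)
            a[i] = 2.0 * b[i];
    }
}

int main(void) {
    for (int i = 0; i < N; i++) b[i] = (double)i;
    loop_level();
    large_section();
    spmd();
    printf("a[N-1] = %f\n", a[N - 1]);  /* expect 2*(N-1) */
    return 0;
}

The SPMD style mirrors MPI's notion of data ownership: each thread computes its block bounds explicitly, which is what lets an existing MPI domain decomposition be carried over to OpenMP.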



            Reviews

            James Harold Davenport

This article is precisely what its title says: a performance comparison of the message passing interface (MPI) and three OpenMP programming styles on shared memory multiprocessors. The three OpenMP styles are described as loop level, loop level with large parallel sections, and single program multiple data (SPMD). In fact, a total of seven different implementations of each of four elements of the NASA Ames (NAS) benchmark are compared. Each is tested on two dataset sizes, and on both an IBM SP3 Nighthawk II and an SGI Origin 3800, in each case with varying numbers of processors. This is clearly a large amount of data. Nonetheless, the authors draw some fairly clear conclusions, including that "the naive loop level OpenMP is simply not competitive" and that "currently, only SPMD programming with OpenMP [of the OpenMP variants] can provide good performance consistently." It would appear from the paper that writing and tuning the OpenMP SPMD versions required a substantial effort, and the authors list other drawbacks of this style. There is always the issue of (inadvertent) bias in studies like this, since the authors started from MPI versions, but this is an inevitable hazard, and the authors seem to have done what they could to reduce it.

Online Computing Reviews Service
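To illustrate the inter-thread communication effort that both the abstract and the review point to, here is a minimal sketch in C of the kind of transformation an MPI-to-OpenMP-SPMD port implies, assuming a simple 1D block decomposition (our hypothetical example, not code from the paper): an MPI halo exchange becomes a direct, barrier-synchronized copy between the threads' sub-arrays.

/* Hypothetical sketch: each OpenMP thread stands in for an MPI rank
   and owns LOCAL interior points plus two ghost cells. The halo
   exchange reads the neighbour's boundary point directly from
   shared memory instead of passing a message. */
#include <omp.h>
#include <stdio.h>

#define NTH 4      /* threads, standing in for MPI ranks */
#define LOCAL 8    /* interior points per thread */

static double sub[NTH][LOCAL + 2];

void halo_exchange(void) {
    #pragma omp parallel num_threads(NTH)
    {
        int me = omp_get_thread_num();
        /* ensure neighbours have finished writing their interiors */
        #pragma omp barrier
        /* direct reads replace what would be MPI_Sendrecv calls */
        if (me > 0)       sub[me][0]         = sub[me - 1][LOCAL];
        if (me < NTH - 1) sub[me][LOCAL + 1] = sub[me + 1][1];
        #pragma omp barrier
    }
}

int main(void) {
    for (int t = 0; t < NTH; t++)
        for (int i = 1; i <= LOCAL; i++)
            sub[t][i] = t * 100.0 + i;
    halo_exchange();
    /* thread 0's last interior point: expect 8.0 */
    printf("thread 1 left ghost = %f\n", sub[1][0]);
    return 0;
}

In MPI, the two boundary assignments would be message-passing calls; under OpenMP SPMD the neighbour's data is already addressable, so the cost shifts from explicit messages to barrier synchronization and careful data placement.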


Published in

SPAA '03: Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures
June 2003, 374 pages
ISBN: 1581136617
DOI: 10.1145/777412

              Copyright © 2003 ACM


Publisher: Association for Computing Machinery, New York, NY, United States



              Acceptance Rates

SPAA '03 paper acceptance rate: 38 of 106 submissions (36%). Overall acceptance rate: 447 of 1,461 submissions (31%).

