ABSTRACT
When using a shared memory multiprocessor, the programmer faces the selection of the portable programming model that will deliver the best performance. Even restricted to the standard programming environments (MPI and OpenMP), the choice spans a broad range of programming approaches. To help programmers in this selection, we compare MPI with three OpenMP programming styles (loop level, loop level with large parallel sections, SPMD) using a subset of the NAS benchmarks (CG, MG, FT, LU), two dataset sizes (A and B) and two shared memory multiprocessors (IBM SP3 NightHawk II, SGI Origin 3800). We also present a path from MPI to SPMD OpenMP that guides programmers starting from an existing MPI code. We present the first SPMD OpenMP version of the NAS benchmarks and compare it with other OpenMP versions from independent sources (PBN, SDSC and RWCP). Experimental results demonstrate that OpenMP provides competitive performance compared to MPI over a large set of experimental conditions; however, this performance comes at the price of a substantial programming effort in dataset adaptation and inter-thread communication. MPI still provides the best performance under some conditions. We present breakdowns of the execution times and measurements of hardware performance counters to explain the performance differences.
Index Terms
- Performance comparison of MPI and three OpenMP programming styles on shared memory multiprocessors