ABSTRACT
We present compiler techniques for translating OpenMP shared-memory parallel applications into MPI message-passing programs for execution on distributed-memory systems. This translation aims to extend the ease of creating parallel applications with OpenMP to a wider variety of platforms, such as commodity cluster systems. We present key concepts and describe techniques to analyze and efficiently handle both regular and irregular accesses to shared data. We evaluate the performance achieved by our translation scheme on seven representative OpenMP applications, two from SPEC OMPM2001 and five from the NAS Parallel Benchmarks suite, on two different platforms. The average scalability (execution time relative to the serial version) is within 12% of that of corresponding hand-tuned MPI applications. We also compare our programs with versions deployed for a Software Distributed Shared Memory (SDSM) system and find that the direct translation to MPI achieves up to 30% higher scalability. A comparison with High Performance Fortran (HPF) versions of two NAS benchmarks indicates that our translated OpenMP versions achieve 12% to 89% better performance than the HPF versions.
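To make the kind of translation the abstract describes concrete, here is a minimal illustrative sketch, not output of the authors' compiler: a regular OpenMP parallel loop rewritten by hand as an SPMD MPI program. Iterations over a shared array are block-distributed across processes, and the produced array sections are then communicated so every process holds a consistent copy before the next shared read. The names `N`, `a`, and `b`, and the choice of a block distribution with an all-gather, are assumptions made for illustration; a real translator would restrict communication to the array sections actually read by other processes.

```c
/* Hypothetical sketch of an OpenMP-to-MPI translation.
 * Original OpenMP loop being translated:
 *
 *   #pragma omp parallel for
 *   for (i = 0; i < N; i++) a[i] = b[i] + 1.0;
 */
#include <mpi.h>
#include <stdio.h>

#define N 1024            /* illustrative problem size */

static double a[N], b[N]; /* "shared" arrays, replicated per process */

int main(int argc, char **argv) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    for (int i = 0; i < N; i++) b[i] = (double)i;

    /* Block-distribute the iteration space, one contiguous chunk
     * per process, handling a non-divisible N. */
    int chunk = (N + nprocs - 1) / nprocs;
    int lo = rank * chunk;
    int hi = (lo + chunk < N) ? lo + chunk : N;

    for (int i = lo; i < hi; i++)   /* each process computes its block */
        a[i] = b[i] + 1.0;

    /* Restore a consistent view of the shared array on all processes
     * before its next use. Here we conservatively gather every written
     * section; an optimizing translator would communicate less. */
    int counts[nprocs], displs[nprocs];
    for (int p = 0; p < nprocs; p++) {
        int plo = p * chunk;
        int phi = (plo + chunk < N) ? plo + chunk : N;
        counts[p] = (phi > plo) ? phi - plo : 0;
        displs[p] = (plo < N) ? plo : N;
    }
    MPI_Allgatherv(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                   a, counts, displs, MPI_DOUBLE, MPI_COMM_WORLD);

    if (rank == 0)
        printf("a[0] = %f, a[N-1] = %f\n", a[0], a[N - 1]);

    MPI_Finalize();
    return 0;
}
```

Built with `mpicc` and run under `mpirun -np 4`, each process computes only its quarter of the loop, then the all-gather re-establishes the fully replicated array that the shared-memory program took for granted. Irregular accesses, where the sections read and written are not analyzable at compile time, are the harder case the paper's techniques address; this sketch covers only the regular case.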