Towards automatic translation of OpenMP to MPI

Published: 20 June 2005
DOI: 10.1145/1088149.1088174

ABSTRACT

We present compiler techniques for translating OpenMP shared-memory parallel applications into MPI message-passing programs for execution on distributed-memory systems. This translation aims to extend the ease of creating parallel applications with OpenMP to a wider variety of platforms, such as commodity cluster systems. We present key concepts and describe techniques to analyze and efficiently handle both regular and irregular accesses to shared data. We evaluate the performance achieved by our translation scheme on seven representative OpenMP applications, two from SPEC OMPM2001 and five from the NAS Parallel Benchmarks suite, on two different platforms. The average scalability (execution time relative to the serial version) is within 12% of that of the corresponding hand-tuned MPI applications. We also compare our programs with versions deployed on a Software Distributed Shared Memory (SDSM) system and find that the direct translation to MPI achieves up to 30% higher scalability. A comparison with High Performance Fortran (HPF) versions of two NAS benchmarks indicates that our translated OpenMP versions achieve 12% to 89% better performance than the HPF versions.
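To illustrate the kind of transformation such a translator performs, the sketch below converts a regular OpenMP parallel loop into an SPMD MPI program by hand. It is a minimal illustration, not the paper's generated output: the simple block partition and the whole-block broadcasts that re-establish a consistent view of the shared array are assumptions made for brevity, whereas the translation evaluated here analyzes shared-data accesses so that elements can be communicated more selectively.

    /* Hand-written sketch of an OpenMP-to-MPI translation for a regular
     * loop; the partitioning and communication scheme are illustrative
     * assumptions, not this paper's actual compiler output. */
    #include <mpi.h>

    #define N 1024

    double a[N], b[N];   /* "shared" arrays, replicated on every process */

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Original shared-memory form:
         *   #pragma omp parallel for
         *   for (i = 1; i < N-1; i++)
         *       b[i] = 0.5 * (a[i-1] + a[i+1]);
         *
         * SPMD translation: block-partition the iteration space. */
        int lo = 1 + rank * (N - 2) / nprocs;
        int hi = 1 + (rank + 1) * (N - 2) / nprocs;

        for (int i = lo; i < hi; i++)
            b[i] = 0.5 * (a[i - 1] + a[i + 1]);

        /* Restore the shared-memory view: each process broadcasts the
         * block of b it produced, so code after the loop sees the full
         * array, as it would under OpenMP. A real translator would
         * compute produced/consumed array sections and send only the
         * elements that later readers actually use. */
        for (int p = 0; p < nprocs; p++) {
            int plo = 1 + p * (N - 2) / nprocs;
            int phi = 1 + (p + 1) * (N - 2) / nprocs;
            MPI_Bcast(&b[plo], phi - plo, MPI_DOUBLE, p, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }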


Published in

ICS '05: Proceedings of the 19th Annual International Conference on Supercomputing
June 2005, 414 pages
ISBN: 1595931678
DOI: 10.1145/1088149
Copyright © 2005 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 629 of 2,180 submissions, 29%
