ABSTRACT
We present compiler techniques for translating OpenMP shared-memory parallel applications into MPI message-passing programs for execution on distributed-memory systems. This translation aims to extend the ease of creating parallel applications with OpenMP to a wider variety of platforms, such as commodity cluster systems. We present key concepts and describe techniques to analyze and efficiently handle both regular and irregular accesses to shared data. We evaluate the performance achieved by our translation scheme on seven representative OpenMP applications, two from SPEC OMPM2001 and five from the NAS Parallel Benchmarks suite, on two different platforms. The average scalability (execution time relative to the serial version) is within 12% of that of corresponding hand-tuned MPI applications. We also compare our programs with versions deployed for a Software Distributed Shared Memory (SDSM) system and find that the direct translation to MPI achieves up to 30% higher scalability. A comparison with High Performance Fortran (HPF) versions of two NAS benchmarks indicates that our translated OpenMP versions achieve 12% to 89% better performance than the HPF versions.
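To make the kind of translation the abstract describes concrete, here is a minimal illustrative sketch, not output of the authors' compiler: a regular OpenMP parallel loop rewritten by hand as an SPMD MPI program. Iterations over a shared array are block-distributed across processes, and the produced array sections are then communicated so every process holds a consistent copy before the next shared read. The names `N`, `a`, and `b`, and the choice of a block distribution with an all-gather, are assumptions made for illustration; a real translator would restrict communication to the array sections actually read by other processes.

```c
/* Hypothetical sketch of an OpenMP-to-MPI translation.
 * Original OpenMP loop being translated:
 *
 *   #pragma omp parallel for
 *   for (i = 0; i < N; i++) a[i] = b[i] + 1.0;
 */
#include <mpi.h>
#include <stdio.h>

#define N 1024            /* illustrative problem size */

static double a[N], b[N]; /* "shared" arrays, replicated per process */

int main(int argc, char **argv) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    for (int i = 0; i < N; i++) b[i] = (double)i;

    /* Block-distribute the iteration space, one contiguous chunk
     * per process, handling a non-divisible N. */
    int chunk = (N + nprocs - 1) / nprocs;
    int lo = rank * chunk;
    int hi = (lo + chunk < N) ? lo + chunk : N;

    for (int i = lo; i < hi; i++)   /* each process computes its block */
        a[i] = b[i] + 1.0;

    /* Restore a consistent view of the shared array on all processes
     * before its next use. Here we conservatively gather every written
     * section; an optimizing translator would communicate less. */
    int counts[nprocs], displs[nprocs];
    for (int p = 0; p < nprocs; p++) {
        int plo = p * chunk;
        int phi = (plo + chunk < N) ? plo + chunk : N;
        counts[p] = (phi > plo) ? phi - plo : 0;
        displs[p] = (plo < N) ? plo : N;
    }
    MPI_Allgatherv(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                   a, counts, displs, MPI_DOUBLE, MPI_COMM_WORLD);

    if (rank == 0)
        printf("a[0] = %f, a[N-1] = %f\n", a[0], a[N - 1]);

    MPI_Finalize();
    return 0;
}
```

Built with `mpicc` and run under `mpirun -np 4`, each process computes only its quarter of the loop, then the all-gather re-establishes the fully replicated array that the shared-memory program took for granted. Irregular accesses, where the sections read and written are not analyzable at compile time, are the harder case the paper's techniques address; this sketch covers only the regular case.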