ABSTRACT
The Partitioned Global Address Space (PGAS) programming model has become a viable alternative to traditional message passing with MPI. The DASH project provides a PGAS abstraction entirely based on C++11. The underlying DASH RunTime, DART, transparently provides communication and management functionality to the user. To facilitate incremental transitions of existing MPI-parallel codes, the development of DART has focused on creating a PGAS runtime based on the MPI-3 RMA standard. From the perspective of an MPI-RMA user, this paper outlines our recent experiences in the development of DART and presents insights into the issues we faced and how we attempted to solve them, including memory allocation, memory consistency, and communication latencies. We implemented a set of global memory allocation latency benchmarks within the framework of the OSU micro-benchmark suite and present allocation and communication latency measurements for different global memory allocation strategies under three different MPI implementations.
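To make the global memory allocation strategies mentioned above concrete, the sketch below contrasts two common MPI-3 RMA approaches that a PGAS runtime such as DART can build on: collective window allocation and a single dynamic window with per-allocation attach. This is a generic, minimal illustration assuming MPI-3 and C++; the segment size and the address-exchange step are placeholders, and the code does not reflect DART's actual implementation.

    #include <mpi.h>
    #include <cstdlib>

    int main(int argc, char** argv) {
      MPI_Init(&argc, &argv);
      const MPI_Aint nbytes = 1024;   /* hypothetical segment size */

      /* Strategy 1: collective allocation. The MPI library supplies the
         memory, so it can place the segment in registered or shared memory
         and typically offers the fastest RMA access path, at the cost of a
         collective call for every allocation. */
      void* base_alloc = nullptr;
      MPI_Win win_alloc;
      MPI_Win_allocate(nbytes, 1, MPI_INFO_NULL, MPI_COMM_WORLD,
                       &base_alloc, &win_alloc);

      /* Strategy 2: one dynamic window, created once, with local attach
         calls per allocation. This avoids repeated collective window
         creation, but remote base addresses must be exchanged explicitly
         (e.g. with MPI_Allgather) before they can serve as displacements. */
      MPI_Win win_dyn;
      MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &win_dyn);
      void* base_dyn = std::malloc(nbytes);
      MPI_Win_attach(win_dyn, base_dyn, nbytes);
      MPI_Aint remote_addr;
      MPI_Get_address(base_dyn, &remote_addr);

      /* A PGAS runtime would keep a passive-target epoch open on each
         window so that one-sided puts/gets can be issued at any time. */
      MPI_Win_lock_all(0, win_alloc);
      MPI_Win_unlock_all(win_alloc);

      MPI_Win_detach(win_dyn, base_dyn);
      std::free(base_dyn);
      MPI_Win_free(&win_dyn);
      MPI_Win_free(&win_alloc);
      MPI_Finalize();
      return 0;
    }

The benchmarks referred to in the abstract quantify exactly this kind of trade-off: the collective cost of allocating a new window per global allocation versus the address-exchange bookkeeping, and a potentially slower RMA path, of a dynamic window.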