ABSTRACT
The Partitioned Global Address Space (PGAS) programming model has become a viable alternative to traditional message passing with MPI. The DASH project provides a PGAS abstraction entirely based on C++11. The underlying DASH RunTime, DART, transparently provides communication and management functionality to the user. To facilitate incremental transitions of existing MPI-parallel codes, the development of DART has focused on creating a PGAS runtime based on the MPI-3 RMA standard. From the perspective of an MPI-RMA user, this paper outlines our recent experiences in the development of DART and presents insights into the issues we faced and how we attempted to solve them, including memory allocation, memory consistency, and communication latencies. We implemented a set of global memory allocation latency benchmarks within the framework of the OSU micro-benchmark suite and present allocation and communication latency measurements for different global memory allocation strategies under three different MPI implementations.
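To make the global memory allocation strategies mentioned above concrete, the sketch below contrasts two common MPI-3 RMA approaches that a PGAS runtime such as DART can build on: collective window allocation and a single dynamic window with per-allocation attach. This is a generic, minimal illustration assuming MPI-3 and C++; the segment size and the address-exchange step are placeholders, and the code does not reflect DART's actual implementation.

    #include <mpi.h>
    #include <cstdlib>

    int main(int argc, char** argv) {
      MPI_Init(&argc, &argv);
      const MPI_Aint nbytes = 1024;   /* hypothetical segment size */

      /* Strategy 1: collective allocation. The MPI library supplies the
         memory, so it can place the segment in registered or shared memory
         and typically offers the fastest RMA access path, at the cost of a
         collective call for every allocation. */
      void* base_alloc = nullptr;
      MPI_Win win_alloc;
      MPI_Win_allocate(nbytes, 1, MPI_INFO_NULL, MPI_COMM_WORLD,
                       &base_alloc, &win_alloc);

      /* Strategy 2: one dynamic window, created once, with local attach
         calls per allocation. This avoids repeated collective window
         creation, but remote base addresses must be exchanged explicitly
         (e.g. with MPI_Allgather) before they can serve as displacements. */
      MPI_Win win_dyn;
      MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &win_dyn);
      void* base_dyn = std::malloc(nbytes);
      MPI_Win_attach(win_dyn, base_dyn, nbytes);
      MPI_Aint remote_addr;
      MPI_Get_address(base_dyn, &remote_addr);

      /* A PGAS runtime would keep a passive-target epoch open on each
         window so that one-sided puts/gets can be issued at any time. */
      MPI_Win_lock_all(0, win_alloc);
      MPI_Win_unlock_all(win_alloc);

      MPI_Win_detach(win_dyn, base_dyn);
      std::free(base_dyn);
      MPI_Win_free(&win_dyn);
      MPI_Win_free(&win_alloc);
      MPI_Finalize();
      return 0;
    }

The benchmarks referred to in the abstract quantify exactly this kind of trade-off: the collective cost of allocating a new window per global allocation versus the address-exchange bookkeeping, and a potentially slower RMA path, of a dynamic window.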