ABSTRACT
Modern HPC systems are growing in complexity as they move toward deeper memory hierarchies and increasing computational heterogeneity via GPUs and other accelerators. When developing applications for these platforms, programmers face two unappealing choices. On one hand, they can explicitly manage all machine resources, writing programs decorated with low-level primitives from multiple APIs (e.g., hybrid MPI/OpenMP applications). Though seemingly necessary for efficient execution, this is an inherently non-scalable way to write software: without a separation of concerns, only small programs written by expert developers actually achieve this efficiency, and the resulting implementations are rigid, difficult to extend, and not portable. Alternatively, users can adopt higher-level programming environments that abstract away these concerns. Extensibility and portability, however, often come at the cost of performance, because the mapping of the user's application onto the system now occurs without the contextual information that was immediately available in the more tightly coupled approach.
In this paper, we describe a framework for transferring high-level, application-semantic knowledge into lower levels of the software stack at an appropriate level of abstraction. Using the STAPL library, we demonstrate how this information guides important decisions in the runtime system (STAPL-RTS), such as multi-protocol communication coordination and request aggregation. Through examples, we show how generic programming idioms already familiar to C++ programmers can be used to annotate calls and increase performance.
STAPL-RTS: An Application Driven Runtime System