DOI: 10.1145/2751205.2751233
Research article · Public Access

STAPL-RTS: An Application Driven Runtime System

Published: 08 June 2015

ABSTRACT

Modern HPC systems are growing in complexity as they move toward deeper memory hierarchies and increasing computational heterogeneity via GPUs and other accelerators. When developing applications for these platforms, programmers face two unappealing choices. On one hand, they can explicitly manage all machine resources, writing programs decorated with low-level primitives from multiple APIs (e.g., hybrid MPI/OpenMP applications). Though seemingly necessary for efficient execution, this is an inherently non-scalable way to write software: without a separation of concerns, only small programs written by expert developers actually achieve this efficiency. Furthermore, the implementations are rigid, difficult to extend, and not portable. Alternatively, users can adopt higher-level programming environments that abstract away these concerns. Extensibility and portability, however, often come at the cost of performance, because the mapping of a user's application onto the system now occurs without the contextual information that was immediately available in the more tightly coupled approach.

In this paper, we describe a framework for transferring high-level, application-specific semantic knowledge into lower levels of the software stack at an appropriate level of abstraction. Using the STAPL library, we demonstrate how this information guides important decisions in the runtime system (STAPL-RTS), such as multi-protocol communication coordination and request aggregation. Through examples, we show how generic programming idioms already familiar to C++ programmers are used to annotate calls and increase performance.


Published in

ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing
June 2015, 446 pages
ISBN: 9781450335591
DOI: 10.1145/2751205

Copyright © 2015 ACM
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States



Acceptance Rates

ICS '15 paper acceptance rate: 40 of 160 submissions (25%). Overall acceptance rate: 629 of 2,180 submissions (29%).
