Top

Published in:

2016 | OriginalPaper | Chapter

Closing the Performance Gap with Modern C++

Authors : Thomas Heller, Hartmut Kaiser, Patrick Diehl, Dietmar Fey, Marc Alexander Schweitzer

Published in: High Performance Computing

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

On the way to Exascale, programmers face the increasing challenge of having to support multiple hardware architectures from the same code base. At the same time, portability of code and performance are increasingly difficult to achieve as hardware architectures are becoming more and more diverse. Today’s heterogeneous systems often include two or more completely distinct and incompatible hardware execution models, such as GPGPU’s, SIMD vector units, and general purpose cores which conventionally have to be programmed using separate tool chains representing non-overlapping programming models. The recent revival of interest in the industry and the wider community for the C++ language has spurred a remarkable amount of standardization proposals and technical specifications in the arena of concurrency and parallelism. This recently includes an increasing amount of discussion around the need for a uniform, higher-level abstraction and programming model for parallelism in the C++ standard targeting heterogeneous and distributed computing. Such an abstraction should perfectly blend with existing, already standardized language and library features, but should also be generic enough to support future hardware developments. In this paper, we present the results from developing such a higher-level programming abstraction for parallelism in C++ which aims at enabling code and performance portability over a wide range of architectures and for various types of parallelism. We present and compare performance data obtained from running the well-known STREAM benchmark ported to our higher level C++ abstraction with the corresponding results from running it natively. We show that our abstractions enable performance at least as good as the comparable base-line benchmarks while providing a uniform programming API on all compared target architectures.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

previous chapter Behavioral Emulation for Scalable Design-Space Exploration of Algorithms and Architectures

next chapter Energy Efficient Runtime Framework for Exascale Systems

Bolt C++ Template Library. http://developer.amd.com/tools-and-sdks/opencl-zone/bolt-c-template-library/

C++ Single-source Heterogeneous Programming for OpenCL. https://www.khronos.org/sycl

HCC: an open source C++ compiler for heterogeneous devices. https://github.com/RadeonOpenCompute/hcc

OpenACC (Directives for Accelerators). http://www.openacc.org/

OpenMP: a proposed Industry standard API for shared memory programming, October 1997. http://www.openmp.org/mp-documents/paper/paper.ps

CUDA (2013). http://www.nvidia.com/object/cuda_home_new.html

N4406: parallel algorithms need executors. Technical report (2015). http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4406.pdf

Broquedis, F., Clet-Ortega, J., Moreaud, S., Furmento, N., Goglin, B., Mercier, G., Thibault, S., Namyst, R.: hwloc: a generic framework for managing hardware affinities in HPC applications. In: PDP 2010 - The 18th Euromicro International Conference on Parallel, Distributed and Network-Based Computing. IEEE, Pisa, Italy. https://hal.inria.fr/inria-00429889

Deakin, T., McIntosh-Smith, S.: GPU-STREAM: benchmarking the achievable memory bandwidth of graphics processing units. In: IEEE/ACM SuperComputing (2015)

10.

Edwards, H.C., Trott, C.R., Sunderland, D.: Kokkos: enabling manycore performance portability through polymorphic memory access patterns. J. Parallel Distrib. Comput. 74(12), 3202–3216 (2014). Domain-Specific Languages and High-Level Frameworks for High-Performance ComputingCrossRef

11.

Hoberock, J., Bell, N.: Thrust: a parallel template library, vol. 42, p. 43 (2010). http://thrust.googlecode.com

12.

Hornung, R., Keasler, J., et al.: The Raja portability layer: overview andstatus. Lawrence Livermore National Laboratory, Livermore, USA (2014)

13.

Kaiser, H., Adelstein-Lelbach, B., Heller, T., Berg, A., Biddiscombe, J., Bikineev, A., Mercer, G., Schfer, A., Habraken, J., Serio, A., Anderson, M., Stumpf, M., Bourgeois, D., Grubel, P., Brandt, S.R., Copik, M., Amatya, V., Huck, K., Viklund, L., Khatami, Z., Bacharwar, D., Yang, S., Schnetter, E., Bcorde5, Brodowicz, M., Bibek, atrantan, Troska, L., Byerly, Z., Upadhyay, S.: hpx: HPX V0.9.99: a general purpose C++ runtime system for parallel and distributed applications of any scale, July 2016. http://dx.doi.org/10.5281/zenodo.58027

14.

Kaiser, H., Heller, T., Bourgeois, D., Fey, D.: Higher-level parallelization for local and distributed asynchronous task-based programming. In: Proceedings of the First International Workshop on Extreme Scale Programming Models and Middleware, pp. 29–37. ACM (2015)

15.

McCalpin, J.D.: Stream: sustainable memory bandwidth in high performance computers. Technical report, University of Virginia, Charlottesville, Virginia (1991–2007), a continually updated Technical report. http://www.cs.virginia.edu/stream/

16.

McCalpin, J.D.: Memory bandwidth and machine balance in current high performance computers. IEEE Comput. Soc. Tech. Committee Comput. Archit. (TCCA) Newsl. 59, 19–25 (1995)

17.

The C++ Standards Committee: ISO International Standard ISO/IEC 14882: 2014, Programming Language C++. Technical report, Geneva, Switzerland: International Organization for Standardization (ISO) (2014). http://www.open-std.org/jtc1/sc22/wg21

18.

The C++ Standards Committee: N4578: Working Draft, Technical Specification for C++ Extensions for Parallelism Version 2. Technical report (2016). http://open-std.org/JTC1/SC22/WG21/docs/papers/2016/n4578.pdf

19.

The C++ Standards Committee: N4594: Working Draft, Standard for Programming Language C ++. Technical report (2016). http://open-std.org/JTC1/SC22/WG21/docs/papers/2016/n4594.pdf

Title: Closing the Performance Gap with Modern C++
Authors: Thomas Heller
Hartmut Kaiser
Patrick Diehl
Dietmar Fey
Marc Alexander Schweitzer
Publisher: Springer International Publishing
Book: High Performance Computing
Print ISBN: 978-3-319-46078-9

Electronic ISBN: 978-3-319-46079-6

Copyright Year: 2016
DOI: https://doi.org/10.1007/978-3-319-46079-6_2

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"