Published in: International Journal of Parallel Programming 2/2017

11-05-2016

High Level Data Structures for GPGPU Programming in a Statically Typed Language

Authors: Mathias Bourgoin, Emmanuel Chailloux, Jean-Luc Lamotte

Abstract

To increase software performance, it is now common to use hardware accelerators. Currently, GPUs are the most widespread accelerators able to handle general-purpose computations, which requires GPGPU frameworks such as CUDA or OpenCL. Both are very low-level and make the benefits of GPGPU programming difficult to achieve. In particular, they require programs to be written as a combination of two subprograms, and devices and memory transfers to be managed manually. This increases the complexity of the overall software design. The idea we develop in this paper is to guarantee expressiveness and safety for CPU and GPU computations and memory management through high-level data structures and static type checking. We present how statically typed languages, compilers and libraries help harness high-level GPGPU programming. In particular, we show how we added high-level user-defined data structures to a GPGPU programming framework based on a statically typed programming language: OCaml. We describe the introduction of records and tagged unions shared between the host program and GPGPU kernels, expressed via a domain-specific language, as well as a simple pattern-matching control structure to manage them. Examples, practical tests and comparisons with state-of-the-art tools show that our solutions improve code design, productivity and safety while providing a high level of performance.
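To make the constructs concrete, here is a minimal sketch in plain OCaml of the kind of user-defined data structures the abstract refers to: a record, a tagged union, and pattern matching over it. This is ordinary host-side OCaml, not the paper's actual kernel DSL; the type and function names (`point`, `shape`, `area`) are illustrative assumptions, but the paper's approach lets analogous definitions and an analogous match construct be shared with GPGPU kernels under static type checking.

```ocaml
(* A record type, of the kind shared between host code and kernels. *)
type point = { x : float; y : float }

(* A tagged union (variant type) describing a geometric shape. *)
type shape =
  | Circle of point * float        (* centre and radius *)
  | Rectangle of point * point     (* two opposite corners *)

(* Pattern matching over the tagged union; the paper introduces a
   similar match construct usable inside GPGPU kernels. *)
let area = function
  | Circle (_, r) -> Float.pi *. r *. r
  | Rectangle (a, b) -> Float.abs ((b.x -. a.x) *. (b.y -. a.y))

let () =
  let shapes =
    [ Circle ({ x = 0.; y = 0. }, 1.);
      Rectangle ({ x = 0.; y = 0. }, { x = 2.; y = 3. }) ]
  in
  List.iter (fun s -> Printf.printf "%.4f\n" (area s)) shapes
```

The static type checker rejects, at compile time, any match that omits a constructor or mixes up field types, which is the safety property the paper carries over to kernel code.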


Metadata
Title
High Level Data Structures for GPGPU Programming in a Statically Typed Language
Authors
Mathias Bourgoin
Emmanuel Chailloux
Jean-Luc Lamotte
Publication date
11-05-2016
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 2/2017
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-016-0424-7
