Published in: International Journal of Parallel Programming, Issue 2/2017

11.05.2016

High Level Data Structures for GPGPU Programming in a Statically Typed Language

By: Mathias Bourgoin, Emmanuel Chailloux, Jean-Luc Lamotte



Abstract

To increase software performance, it is now common to use hardware accelerators. Currently, GPUs are the most widespread accelerators able to handle general-purpose computations, which requires GPGPU frameworks such as Cuda or OpenCL. Both are very low-level and make the benefits of GPGPU programming difficult to achieve. In particular, they require writing programs as a combination of two subprograms and manually managing devices and memory transfers, which increases the complexity of the overall software design. The idea we develop in this paper is to guarantee expressiveness and safety for CPU and GPU computations and memory management through high-level data structures and static type checking. We present how statically typed languages, compilers, and libraries help harness high-level GPGPU programming. In particular, we show how we added high-level user-defined data structures to a GPGPU programming framework based on a statically typed programming language: OCaml. We describe the introduction of records and tagged unions shared between the host program and GPGPU kernels expressed through a domain-specific language, as well as a simple pattern-matching control structure to manage them. Examples, practical tests, and comparisons with state-of-the-art tools show that our solutions improve code design, productivity, and safety while providing a high level of performance.
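To make the abstract's proposal concrete, the following is a minimal sketch in plain OCaml of the kind of user-defined data structures and pattern matching the paper proposes to share between host code and GPU kernels. This is illustrative only: the actual framework's kernel DSL has its own syntax, and all names and types here are hypothetical, not the framework's API.

```ocaml
(* Illustrative sketch: a record and a tagged union of the kind the
   paper proposes to share between the host program and GPU kernels.
   All names here are hypothetical. *)
type point = { x : float; y : float }

type shape =
  | Circle of point * float   (* center and radius *)
  | Rect   of point * point   (* two opposite corners *)

(* A per-element computation expressed with pattern matching; in the
   paper's approach, this style of code would be written in a kernel
   DSL and run on the GPU. *)
let area = function
  | Circle (_, r) -> Float.pi *. r *. r
  | Rect (a, b)   -> Float.abs ((b.x -. a.x) *. (b.y -. a.y))

let () =
  let shapes =
    [| Circle ({ x = 0.; y = 0. }, 1.);
       Rect ({ x = 0.; y = 0. }, { x = 2.; y = 3. }) |]
  in
  Array.iter (fun s -> Printf.printf "%.2f\n" (area s)) shapes
```

On the host, the type checker guarantees that every constructor of `shape` is handled; the paper's contribution is preserving this static safety when such values cross the host/GPU boundary.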


Metadata
Title
High Level Data Structures for GPGPU Programming in a Statically Typed Language
Authors
Mathias Bourgoin
Emmanuel Chailloux
Jean-Luc Lamotte
Publication date
11.05.2016
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 2/2017
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-016-0424-7
