Published in: International Journal of Parallel Programming 1/2018

8 May 2017

Multi-dimensional Homomorphisms and Their Implementation in OpenCL

Authors: Ari Rasch, Sergei Gorlatch

Abstract

Homomorphisms (traditionally defined on lists) are functions that can be parallelized by the divide-and-conquer paradigm. In this paper, we introduce multi-dimensional homomorphisms (MDHs), an extension of the traditional homomorphism concept that captures parallelism on multi-dimensional arrays. We propose md_hom, a new parallel pattern (a.k.a. algorithmic skeleton) based on the MDH concept, to simplify parallel programming for a broad class of applications. The md_hom pattern is general enough to subsume common parallel patterns such as map and reduce, as well as more complex functions built by composing and nesting several patterns. We present a generic implementation schema for md_hom in the form of efficient, correct-by-construction OpenCL pseudocode that targets various parallel architectures, such as multi-core CPUs and graphics processing units (GPUs). The pseudocode schema is parametrized in tuning parameters, which allow the code to be optimized for different devices and input sizes by an automated search of the parameter space. We evaluate the schematically generated, executable OpenCL code on general matrix–vector multiplication (GEMV), an important linear algebra routine that has recently gained attention due to its use in deep learning, on two parallel architectures: an Intel CPU and an NVIDIA GPU. Our performance results are competitive with, and in some cases better than, the hand-tuned GEMV implementations provided by the state-of-the-art libraries Intel MKL and NVIDIA cuBLAS, as well as the auto-tunable OpenCL BLAS library CLBlast.
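
To make the divide-and-conquer idea concrete, the following is a minimal Python sketch of GEMV split along both array dimensions. It is an illustration only and not the paper's md_hom pattern or its OpenCL schema. The function names `gemv_direct` and `gemv_dc` are hypothetical, and the choice of combine operators (concatenation across rows, element-wise addition across columns) is an assumption about how GEMV decomposes, not something stated in the abstract.

```python
from typing import List

Matrix = List[List[int]]
Vector = List[int]


def gemv_direct(a: Matrix, x: Vector) -> Vector:
    """Reference GEMV: y[i] = sum over j of a[i][j] * x[j]."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def gemv_dc(a: Matrix, x: Vector) -> Vector:
    """Divide-and-conquer GEMV over both dimensions (illustrative sketch).

    Assumed combine operators: concatenation in the row dimension,
    element-wise addition in the column dimension.
    """
    rows, cols = len(a), len(x)
    if rows == 0:
        return []
    if rows > 1:
        mid = rows // 2
        # Row split: the two halves produce disjoint parts of the result.
        return gemv_dc(a[:mid], x) + gemv_dc(a[mid:], x)
    if cols > 1:
        mid = cols // 2
        left = gemv_dc([a[0][:mid]], x[:mid])
        right = gemv_dc([a[0][mid:]], x[mid:])
        # Column split: both halves contribute to the same result entry.
        return [l + r for l, r in zip(left, right)]
    # Base case: a single matrix element (or an empty row).
    return [a[0][0] * x[0]] if cols == 1 else [0]


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    x = [1, 0, -1]
    print(gemv_dc(a, x) == gemv_direct(a, x))
```

Because the two halves of every split are independent, they could in principle be evaluated in parallel; the sketch runs them sequentially to keep it short.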

Metadata
Title
Multi-dimensional Homomorphisms and Their Implementation in OpenCL
Authors
Ari Rasch
Sergei Gorlatch
Publication date
8 May 2017
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 1/2018
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-017-0508-z