Skip to main content

2020 | OriginalPaper | Buchkapitel

A Portable SIMD Primitive Using Kokkos for Heterogeneous Architectures

verfasst von : Damodar Sahasrabudhe, Eric T. Phipps, Sivasankaran Rajamanickam, Martin Berzins

Erschienen in: Accelerator Programming Using Directives

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

As computer architectures are rapidly evolving (e.g. those designed for exascale), multiple portability frameworks have been developed to avoid new architecture-specific development and tuning. However, portability frameworks depend on compilers for auto-vectorization and may lack support for explicit vectorization on heterogeneous platforms. Alternatively, programmers can use intrinsics-based primitives to achieve more efficient vectorization, but the lack of a gpu back-end for these primitives makes such code non-portable. A unified, portable, Single Instruction Multiple Data (simd) primitive proposed in this work, allows intrinsics-based vectorization on cpus and many-core architectures such as Intel Knights Landing (knl), and also facilitates Single Instruction Multiple Threads (simt) based execution on gpus. This unified primitive, coupled with the Kokkos portability ecosystem, makes it possible to develop explicitly vectorized code, which is portable across heterogeneous platforms. The new simd primitive is used on different architectures to test the performance boost against hard-to-auto-vectorize baseline, to measure the overhead against efficiently vectroized baseline, and to evaluate the new feature called the “logical vector length” (lvl). The simd primitive provides portability across cpus and gpus without any performance degradation being observed experimentally.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Adamczyk, W., et al.: Application of LES-CFD for predicting pulverized-coal working conditions after installation of NOx control system. Energy 160, 693–709 (2018)CrossRef Adamczyk, W., et al.: Application of LES-CFD for predicting pulverized-coal working conditions after installation of NOx control system. Energy 160, 693–709 (2018)CrossRef
3.
Zurück zum Zitat Carr, S.: Combining optimization for cache and instruction-level parallelism. In: Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique, pp. 238–247. IEEE (1996) Carr, S.: Combining optimization for cache and instruction-level parallelism. In: Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique, pp. 238–247. IEEE (1996)
4.
Zurück zum Zitat Cope, B., et al.: Implementation of 2D Convolution on FPGA, GPU and CPU. Imperial College Report, pp. 2–5 (2006) Cope, B., et al.: Implementation of 2D Convolution on FPGA, GPU and CPU. Imperial College Report, pp. 2–5 (2006)
5.
Zurück zum Zitat Edwards, H., Trott, C., Sunderland, D.: Kokkos: enabling manycore performance portability through polymorphic memory access patterns. J. Parallel Distrib. Comput. 74(12), 3202–3216 (2014)CrossRef Edwards, H., Trott, C., Sunderland, D.: Kokkos: enabling manycore performance portability through polymorphic memory access patterns. J. Parallel Distrib. Comput. 74(12), 3202–3216 (2014)CrossRef
7.
Zurück zum Zitat Espasa, R., Valero, M.: Exploiting instruction-and data-level parallelism. IEEE Micro 17(5), 20–27 (1997)CrossRef Espasa, R., Valero, M.: Exploiting instruction-and data-level parallelism. IEEE Micro 17(5), 20–27 (1997)CrossRef
9.
Zurück zum Zitat Holewinski, J., et al.: Dynamic trace-based analysis of vectorization potential of applications. ACM SIGPLAN Not. 47(6), 371–382 (2012)CrossRef Holewinski, J., et al.: Dynamic trace-based analysis of vectorization potential of applications. ACM SIGPLAN Not. 47(6), 371–382 (2012)CrossRef
10.
Zurück zum Zitat Holmen, J.: Private communication (2018) Holmen, J.: Private communication (2018)
12.
Zurück zum Zitat Hornung, R., Keasler, J.: The RAJA portability layer: overview and status. Technical report, Lawrence Livermore National Laboratories (LLNL), Livermore, CA, United States (2014) Hornung, R., Keasler, J.: The RAJA portability layer: overview and status. Technical report, Lawrence Livermore National Laboratories (LLNL), Livermore, CA, United States (2014)
13.
Zurück zum Zitat Howard, M., et al.: Employing multiple levels of parallelism for CFD at large scales on next generation high-performance computing platforms. In: 2018 Proceedings of the Tenth International Conference on Computational Fluid Dynamics (ICCFD 10), Barcelona, 9–13 July 2018 Howard, M., et al.: Employing multiple levels of parallelism for CFD at large scales on next generation high-performance computing platforms. In: 2018 Proceedings of the Tenth International Conference on Computational Fluid Dynamics (ICCFD 10), Barcelona, 9–13 July 2018
15.
Zurück zum Zitat Jacob, A., et al.: Towards performance portable GPU programming with RAJA. In: Workshop on Portability Among HPC Architectures for Scientific Applications (2015) Jacob, A., et al.: Towards performance portable GPU programming with RAJA. In: Workshop on Portability Among HPC Architectures for Scientific Applications (2015)
16.
Zurück zum Zitat Jeffers, J., Reinders, J., Sodani, A.: Intel Xeon Phi Processor High Performance Programming: Knights Landing Edition. Morgan Kaufmann, Burlington (2016) Jeffers, J., Reinders, J., Sodani, A.: Intel Xeon Phi Processor High Performance Programming: Knights Landing Edition. Morgan Kaufmann, Burlington (2016)
17.
Zurück zum Zitat Karpiński, P., McDonald, J.: A high-performance portable abstract interface for explicit SIMD vectorization. In: Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores. ACM (2017) Karpiński, P., McDonald, J.: A high-performance portable abstract interface for explicit SIMD vectorization. In: Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores. ACM (2017)
18.
Zurück zum Zitat Kim, K., et al.: Designing vector-friendly compact BLAS and LAPACK kernels. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, p. 55. ACM (2017) Kim, K., et al.: Designing vector-friendly compact BLAS and LAPACK kernels. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, p. 55. ACM (2017)
20.
Zurück zum Zitat Kretz, M., Lindenstruth, V.: Vc: a C++ library for explicit vectorization. Softw. Pract. Exp. 42(11), 1409–1430 (2012)CrossRef Kretz, M., Lindenstruth, V.: Vc: a C++ library for explicit vectorization. Softw. Pract. Exp. 42(11), 1409–1430 (2012)CrossRef
21.
Zurück zum Zitat Leißa, R., Hack, S., Wald, I.: Extending a C-like language for portable SIMD programming. ACM SIGPLAN Not. 47(8), 65–74 (2012)CrossRef Leißa, R., Hack, S., Wald, I.: Extending a C-like language for portable SIMD programming. ACM SIGPLAN Not. 47(8), 65–74 (2012)CrossRef
22.
Zurück zum Zitat Medina, D., St-Cyr, A., Warburton, T.: OCCA: A unified approach to multi-threading languages. arXiv preprint arXiv:1403.0968 (2014) Medina, D., St-Cyr, A., Warburton, T.: OCCA: A unified approach to multi-threading languages. arXiv preprint arXiv:​1403.​0968 (2014)
26.
Zurück zum Zitat Pai, S., Govindarajan, R., Thazhuthaveetil, M.: PLASMA: portable programming for SIMD heterogeneous accelerators. In: Workshop on Language, Compiler, and Architecture Support for GPGPU, held in conjunction with HPCA/PPoPP (2010) Pai, S., Govindarajan, R., Thazhuthaveetil, M.: PLASMA: portable programming for SIMD heterogeneous accelerators. In: Workshop on Language, Compiler, and Architecture Support for GPGPU, held in conjunction with HPCA/PPoPP (2010)
27.
Zurück zum Zitat Phipps, E., D’Elia, M., Edwards, H., Hoemmen, M., Hu, J., Rajamanickam, S.: Embedded ensemble propagation for improving performance, portability, and scalability of uncertainty quantification on emerging computational architectures. SIAM J. Sci. Comput. 39(2), C162–C193 (2017)MathSciNetCrossRef Phipps, E., D’Elia, M., Edwards, H., Hoemmen, M., Hu, J., Rajamanickam, S.: Embedded ensemble propagation for improving performance, portability, and scalability of uncertainty quantification on emerging computational architectures. SIAM J. Sci. Comput. 39(2), C162–C193 (2017)MathSciNetCrossRef
28.
Zurück zum Zitat Phipps, E., Tuminaro, R., Miller, C.: Stokhos: trilinos tools for embedded stochastic-galerkin uncertainty quantification methods. Technical report, Sandia National Laboratories (SNL-NM), Albuquerque, NM, United States (2008) Phipps, E., Tuminaro, R., Miller, C.: Stokhos: trilinos tools for embedded stochastic-galerkin uncertainty quantification methods. Technical report, Sandia National Laboratories (SNL-NM), Albuquerque, NM, United States (2008)
29.
Zurück zum Zitat Stephens, N., et al.: The ARM scalable vector extension. IEEE Micro 37(2), 26–39 (2017)CrossRef Stephens, N., et al.: The ARM scalable vector extension. IEEE Micro 37(2), 26–39 (2017)CrossRef
30.
Zurück zum Zitat Tian, X., et al.: LLVM compiler implementation for explicit parallelization and SIMD vectorization. In: Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC, p. 4. ACM (2017) Tian, X., et al.: LLVM compiler implementation for explicit parallelization and SIMD vectorization. In: Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC, p. 4. ACM (2017)
31.
Zurück zum Zitat Trott, C.R.: Kokkos: the C++ performance portability programming model. Technical report, Sandia National Laboratories (SNL-NM), Albuquerque, NM, United States (2017) Trott, C.R.: Kokkos: the C++ performance portability programming model. Technical report, Sandia National Laboratories (SNL-NM), Albuquerque, NM, United States (2017)
32.
Zurück zum Zitat Wang, H., Wu, P., Tanase, I., Serrano, M., Moreira, J.: Simple, portable and fast SIMD intrinsic programming: generic simd library. In: Proceedings of the 2014 Workshop on Programming Models for SIMD/Vector Processing. ACM (2014) Wang, H., Wu, P., Tanase, I., Serrano, M., Moreira, J.: Simple, portable and fast SIMD intrinsic programming: generic simd library. In: Proceedings of the 2014 Workshop on Programming Models for SIMD/Vector Processing. ACM (2014)
33.
Zurück zum Zitat Zenker, E., et al.: Alpaka-an abstraction library for parallel kernel acceleration. In: 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 631–640. IEEE (2016) Zenker, E., et al.: Alpaka-an abstraction library for parallel kernel acceleration. In: 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 631–640. IEEE (2016)
Metadaten
Titel
A Portable SIMD Primitive Using Kokkos for Heterogeneous Architectures
verfasst von
Damodar Sahasrabudhe
Eric T. Phipps
Sivasankaran Rajamanickam
Martin Berzins
Copyright-Jahr
2020
DOI
https://doi.org/10.1007/978-3-030-49943-3_7