Skip to main content
Erschienen in: The Journal of Supercomputing 12/2019

26.11.2018

Auto-tuning GEMM kernels on the Intel KNL and Intel Skylake-SP processors

verfasst von: Roktaek Lim, Yeongha Lee, Raehyun Kim, Jaeyoung Choi, Myungho Lee

Erschienen in: The Journal of Supercomputing | Ausgabe 12/2019

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The general matrix–matrix multiplication is a core building block for implementing Basic Linear Algebra Subprograms. This paper presents a methodology for automatically producing the matrix–matrix multiplication kernels tuned for the Intel Xeon Phi Processor code-named Knights Landing and the Intel Skylake-SP processors with AVX-512 intrinsic functions. The architecture of the latest manycore processors has been complicated in the levels of parallelism and cache hierarchies; it is not easy to find the best combination of optimization techniques for a given application. Our approach produces matrix multiplication kernels through a process of heuristic auto-tuning based on generating multiple kernels and selecting the fastest ones through performance tests. The tuning parameters include the size of block matrices for registers and caches, prefetch distances, and loop unrolling depth. Parameters for multithreaded execution, such as identifying loops to parallelize and the optimal number of threads for such loops are also investigated. We also present a method to reduce the parameter search space based on our previous research results.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Bilmes J, Asanovic K, Chin CW, Demmel J (2014) Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology. In: ACM International Conference on Supercomputing 25th Anniversary Volume. ACM, pp 253–260 Bilmes J, Asanovic K, Chin CW, Demmel J (2014) Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology. In: ACM International Conference on Supercomputing 25th Anniversary Volume. ACM, pp 253–260
2.
Zurück zum Zitat Goto K, Geijn RA (2008) Anatomy of high-performance matrix multiplication. ACM Trans Math Softw (TOMS) 34(3):12MathSciNetCrossRef Goto K, Geijn RA (2008) Anatomy of high-performance matrix multiplication. ACM Trans Math Softw (TOMS) 34(3):12MathSciNetCrossRef
3.
Zurück zum Zitat Gunnels JA, Henry GM, Van De Geijn RA (2001) A family of high-performance matrix multiplication algorithms. In: International Conference on Computational Science. Springer, pp 51–60 Gunnels JA, Henry GM, Van De Geijn RA (2001) A family of high-performance matrix multiplication algorithms. In: International Conference on Computational Science. Springer, pp 51–60
4.
Zurück zum Zitat Heinecke A, Vaidyanathan K, Smelyanskiy M, Kobotov A, Dubtsov R, Henry G, Shet AG, Chrysos G, Dubey P (2013) Design and implementation of the linpack benchmark for single and multi-node systems based on Intel® Xeon Phi Coprocessor. In: IEEE 27th International Symposium on Parallel & Distributed Processing (IPDPS), 2013. IEEE, pp 126–137 Heinecke A, Vaidyanathan K, Smelyanskiy M, Kobotov A, Dubtsov R, Henry G, Shet AG, Chrysos G, Dubey P (2013) Design and implementation of the linpack benchmark for single and multi-node systems based on Intel® Xeon Phi Coprocessor. In: IEEE 27th International Symposium on Parallel & Distributed Processing (IPDPS), 2013. IEEE, pp 126–137
6.
Zurück zum Zitat Jeffers J, Reinders J, Sodani A (2016) Intel Xeon Phi processor high performance programming: knights, landing edn. Morgan Kaufmann, Burlington Jeffers J, Reinders J, Sodani A (2016) Intel Xeon Phi processor high performance programming: knights, landing edn. Morgan Kaufmann, Burlington
7.
Zurück zum Zitat Lim R, Lee Y, Kim R, Choi J (2018) OpenMP-based parallel implementation of matrix-matrix multiplication on the Intel Knights Landing. In: HPC Asia 2018, pp 63–66 Lim R, Lee Y, Kim R, Choi J (2018) OpenMP-based parallel implementation of matrix-matrix multiplication on the Intel Knights Landing. In: HPC Asia 2018, pp 63–66
8.
Zurück zum Zitat Lim R, Lee Y, Kim R, Choi J (2018) An implementation of matrix-matrix multiplication on the Intel KNL processor with AVX-512. Cluster Comput 21(4):1785–1795CrossRef Lim R, Lee Y, Kim R, Choi J (2018) An implementation of matrix-matrix multiplication on the Intel KNL processor with AVX-512. Cluster Comput 21(4):1785–1795CrossRef
9.
Zurück zum Zitat Low TM, Igual FD, Smith TM, Quintana-Orti ES (2016) Analytical modeling is enough for high-performance blis. ACM Trans Math Softw (TOMS) 43(2):12MathSciNetCrossRef Low TM, Igual FD, Smith TM, Quintana-Orti ES (2016) Analytical modeling is enough for high-performance blis. ACM Trans Math Softw (TOMS) 43(2):12MathSciNetCrossRef
10.
Zurück zum Zitat Smith TM, Van De Geijn RA, Smelyanskiy M, Hammond JR, Van Zee FG (2014) Anatomy of high-performance many-threaded matrix multiplication. In: Parallel and Distributed Processing Symposium, 2014 IEEE 28th International. IEEE, pp 1049–1059 Smith TM, Van De Geijn RA, Smelyanskiy M, Hammond JR, Van Zee FG (2014) Anatomy of high-performance many-threaded matrix multiplication. In: Parallel and Distributed Processing Symposium, 2014 IEEE 28th International. IEEE, pp 1049–1059
11.
Zurück zum Zitat Van Zee FG, Van De Geijn RA (2015) BLIS: a framework for rapidly instantiating BLAS functionality. ACM Trans Math Softw (TOMS) 41(3):14MathSciNetCrossRef Van Zee FG, Van De Geijn RA (2015) BLIS: a framework for rapidly instantiating BLAS functionality. ACM Trans Math Softw (TOMS) 41(3):14MathSciNetCrossRef
12.
Zurück zum Zitat Whaley RC, Dongarra JJ (1998) Automatically tuned linear algebra software. In: Proceedings of the 1998 ACM/IEEE Conference on Supercomputing. IEEE Computer Society, pp 1–27 Whaley RC, Dongarra JJ (1998) Automatically tuned linear algebra software. In: Proceedings of the 1998 ACM/IEEE Conference on Supercomputing. IEEE Computer Society, pp 1–27
13.
Zurück zum Zitat Whaley RC, Petitet A, Dongarra JJ (2001) Automated empirical optimizations of software and the atlas project. Parallel Comput 27(1–2):3–35CrossRef Whaley RC, Petitet A, Dongarra JJ (2001) Automated empirical optimizations of software and the atlas project. Parallel Comput 27(1–2):3–35CrossRef
14.
Zurück zum Zitat Van Zee FG, Smith TM, Marker B, Low TM, Van De Geign RA, Igual FD, Smelyanskiy M, Zhang X, Kistler M, Austel V, Gunnels JA, Killough L (2016) The BLIS framework: experiments in portability. ACM Trans Math Softw (TOMS) 42(2):12:1–12:19CrossRef Van Zee FG, Smith TM, Marker B, Low TM, Van De Geign RA, Igual FD, Smelyanskiy M, Zhang X, Kistler M, Austel V, Gunnels JA, Killough L (2016) The BLIS framework: experiments in portability. ACM Trans Math Softw (TOMS) 42(2):12:1–12:19CrossRef
Metadaten
Titel
Auto-tuning GEMM kernels on the Intel KNL and Intel Skylake-SP processors
verfasst von
Roktaek Lim
Yeongha Lee
Raehyun Kim
Jaeyoung Choi
Myungho Lee
Publikationsdatum
26.11.2018
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 12/2019
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-018-2702-1

Weitere Artikel der Ausgabe 12/2019

The Journal of Supercomputing 12/2019 Zur Ausgabe

Premium Partner