The Journal of Supercomputing

Published: 26 November 2018

Auto-tuning GEMM kernels on the Intel KNL and Intel Skylake-SP processors

Authors: Roktaek Lim, Yeongha Lee, Raehyun Kim, Jaeyoung Choi, Myungho Lee

Published in: The Journal of Supercomputing | Issue 12/2019

Abstract

General matrix–matrix multiplication (GEMM) is a core building block for implementing the Basic Linear Algebra Subprograms. This paper presents a methodology for automatically producing matrix–matrix multiplication kernels tuned for the Intel Xeon Phi processor code-named Knights Landing and for the Intel Skylake-SP processors, using AVX-512 intrinsic functions. The architecture of the latest manycore processors has become complex in its levels of parallelism and its cache hierarchies, so it is not easy to find the best combination of optimization techniques for a given application. Our approach produces matrix multiplication kernels through a process of heuristic auto-tuning: it generates multiple kernels and selects the fastest ones through performance tests. The tuning parameters include the size of the block matrices for registers and caches, the prefetch distances, and the loop unrolling depth. Parameters for multithreaded execution, such as which loops to parallelize and the optimal number of threads for those loops, are also investigated. We also present a method to reduce the parameter search space based on our previous research results.
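The generate-and-test idea described in the abstract can be illustrated with a small sketch. This is not the authors' generator, which targets AVX-512 intrinsics and also tunes prefetch distances, unrolling depth, and threading; it is a pure-Python illustration under simplifying assumptions. The names `make_blocked_gemm`, `time_kernel`, and `autotune`, as well as the candidate block sizes, are hypothetical and chosen only for this example. Only the cache-block sizes are searched here.

```python
import itertools
import time


def make_blocked_gemm(mb, nb, kb):
    """Return a kernel computing C += A * B with block sizes (mb, nb, kb).

    Hypothetical stand-in for one generated kernel variant.
    """
    def kernel(A, B, C):
        m, k, n = len(A), len(B), len(B[0])
        for i0 in range(0, m, mb):              # block over rows of A and C
            for p0 in range(0, k, kb):          # block over the shared dimension
                for j0 in range(0, n, nb):      # block over columns of B and C
                    for i in range(i0, min(i0 + mb, m)):
                        row_a, row_c = A[i], C[i]
                        for p in range(p0, min(p0 + kb, k)):
                            a, row_b = row_a[p], B[p]
                            for j in range(j0, min(j0 + nb, n)):
                                row_c[j] += a * row_b[j]
        return C
    return kernel


def time_kernel(kernel, A, B, repeats=3):
    """Return the best wall-clock time of `kernel` over a few runs."""
    m, n = len(A), len(B[0])
    best = float("inf")
    for _ in range(repeats):
        C = [[0.0] * n for _ in range(m)]
        start = time.perf_counter()
        kernel(A, B, C)
        best = min(best, time.perf_counter() - start)
    return best


def autotune(A, B, block_sizes=(4, 8, 16)):
    """Time every (mb, nb, kb) combination and return the fastest one."""
    results = {}
    for params in itertools.product(block_sizes, repeat=3):
        results[params] = time_kernel(make_blocked_gemm(*params), A, B)
    best = min(results, key=results.get)
    return best, results
```

Calling `autotune(A, B)` on two list-of-lists matrices returns the fastest parameter triple together with all measured timings. The exhaustive search used here grows quickly with the number of parameters, which is why the abstract stresses reducing the search space.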


Title: Auto-tuning GEMM kernels on the Intel KNL and Intel Skylake-SP processors
Authors: Roktaek Lim, Yeongha Lee, Raehyun Kim, Jaeyoung Choi, Myungho Lee
Publication date: 26 November 2018
Publisher: Springer US
Published in: The Journal of Supercomputing / Issue 12/2019
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-018-2702-1
