Published in: The Journal of Supercomputing 2/2014

1 November 2014

Performance evaluation of kernel fusion BLAS routines on the GPU: iterative solvers as case study

Authors: S. Tabik, G. Ortega, E. M. Garzón


Abstract

Programmers usually implement iterative methods for solving partial differential equations as a sequence of basic kernels from libraries optimized for the graphics processing unit (GPU). The overall runtime of the resulting combination is often penalized by the smallest and least efficient vector operations. To improve GPU utilization, we identify and analyze the kernels that can potentially be fused according to data dependences, data type and size, and GPU resources. This paper provides an extensive analysis of the impact of fusing vector operations [level 1 of the Basic Linear Algebra Subprograms (BLAS)] on GPU performance. The experimental evaluation shows that this optimization provides a noticeable improvement, especially for kernels with lower memory requirements and on more modern GPUs. Fused BLAS operations can help programmers efficiently code iterative methods for solving large linear systems of equations on the GPU. The biconjugate gradient method (BCG) is one example of an iterative method that can benefit from this optimization strategy: kernel fusion of vector routines makes the most efficient GPU implementation of BCG run between \(1.09\times\) and \(1.27\times\) faster on three GPUs with different characteristics.
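To make the idea of fusing level-1 BLAS operations concrete, the following is a minimal CPU-side sketch in Python, not the authors' GPU kernels. It assumes a hypothetical pair of dependent vector operations (an AXPY update followed by a dot product of the result with itself); the abstract does not state which operations the paper actually fuses, and the function names are illustrative only. The unfused version traverses the data twice, as two separate library calls would, while the fused version produces both results in a single pass.

```python
def axpy(alpha, x, y):
    """Unfused step 1: return z = alpha*x + y (first pass over the data)."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def dot(x, y):
    """Unfused step 2: inner product of x and y (second pass over the data)."""
    return sum(xi * yi for xi, yi in zip(x, y))


def fused_axpy_dot(alpha, x, y):
    """Fused sketch (hypothetical pairing): compute z = alpha*x + y and
    the dot product z.z in one pass, so each element of z is consumed
    right after it is produced instead of being re-read later."""
    z = []
    acc = 0.0
    for xi, yi in zip(x, y):
        zi = alpha * xi + yi   # AXPY element
        z.append(zi)
        acc += zi * zi         # dot-product contribution, reusing zi
    return z, acc


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    y = [4.0, 5.0, 6.0]
    z = axpy(2.0, x, y)                    # separate "kernel" 1
    rho = dot(z, z)                        # separate "kernel" 2
    z_f, rho_f = fused_axpy_dot(2.0, x, y)  # single fused "kernel"
    print(z == z_f, rho == rho_f)
```

On a GPU the analogous benefit would come from launching one kernel instead of two and from not re-reading the intermediate vector from global memory; how much this helps in practice depends on the data dependences, vector sizes, and GPU resources that the abstract names as selection criteria.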


Metadata
Title
Performance evaluation of kernel fusion BLAS routines on the GPU: iterative solvers as case study
Authors
S. Tabik
G. Ortega
E. M. Garzón
Publication date
1 November 2014
Publisher
Springer US
Published in
The Journal of Supercomputing, Issue 2/2014
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-014-1102-4
