Published in: Cluster Computing 4/2015

1 December 2015

Unleashing GPU acceleration for symmetric band linear algebra kernels and model reduction

Authors: Peter Benner, Ernesto Dufrechou, Pablo Ezzatti, Enrique S. Quintana-Ortí, Alfredo Remón


Abstract

Linear algebra operations arise in a myriad of scientific and engineering applications and, therefore, their optimization is targeted by a significant number of high performance computing research efforts. In particular, the matrix multiplication and the solution of linear systems are two key problems with efficient implementations (or kernels) for a variety of high performance parallel architectures. For these specific problems, leveraging the structure of the associated matrices often leads to remarkable time and memory savings, as is the case, e.g., for symmetric band problems. In this work, we exploit the ample hardware concurrency of many-core graphics processors (GPUs) to accelerate the solution of symmetric positive definite band linear systems, introducing highly tuned versions of the corresponding LAPACK routines. The experimental results with the new GPU kernels reveal important reductions of the execution time when compared with tuned implementations of the same operations provided in Intel’s MKL. In addition, we evaluate the performance of the GPU kernels when applied to the solution of model order reduction problems and the associated matrix equations.
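To make the target operation concrete, the following is a minimal sketch of solving a symmetric positive definite band system via a Cholesky factorization \(A=LL^T\) computed in band storage. It is a sequential, unblocked CPU reference in plain Python, not the tuned GPU kernels described in the abstract. The function names `band_cholesky` and `band_cholesky_solve` are hypothetical, and the storage layout (`ab[d][j]` holds \(A_{j+d,j}\), in the spirit of LAPACK's lower band storage) is an assumption made for illustration; LAPACK's `DPBTRF` and `DPBTRS` are, as far as we can tell, the routines of this kind for the double-precision case.

```python
import math


def band_cholesky(ab, kd):
    """Overwrite lower band storage ab with the Cholesky factor L (A = L L^T).

    ab[d][j] holds A[j+d][j] for d = 0..kd (assumed layout); entries that
    fall outside the matrix are ignored.
    """
    n = len(ab[0])
    for j in range(n):
        if ab[0][j] <= 0.0:
            raise ValueError("matrix is not positive definite")
        ljj = math.sqrt(ab[0][j])
        ab[0][j] = ljj
        m = min(kd, n - 1 - j)  # number of sub-diagonal entries in column j
        for d in range(1, m + 1):
            ab[d][j] /= ljj
        # Symmetric rank-1 update of the trailing block, restricted to the band.
        for c in range(1, m + 1):
            for r in range(c, m + 1):
                ab[r - c][j + c] -= ab[r][j] * ab[c][j]
    return ab


def band_cholesky_solve(ab, kd, b):
    """Solve A x = b given the band Cholesky factor produced by band_cholesky."""
    n = len(ab[0])
    x = list(b)
    # Forward substitution: L y = b.
    for j in range(n):
        x[j] /= ab[0][j]
        for d in range(1, min(kd, n - 1 - j) + 1):
            x[j + d] -= ab[d][j] * x[j]
    # Backward substitution: L^T x = y.
    for j in range(n - 1, -1, -1):
        s = x[j]
        for d in range(1, min(kd, n - 1 - j) + 1):
            s -= ab[d][j] * x[j + d]
        x[j] = s / ab[0][j]
    return x


if __name__ == "__main__":
    # Tridiagonal SPD example: 2 on the diagonal, -1 on the off-diagonals.
    n, kd = 5, 1
    ab = [[2.0] * n, [-1.0] * (n - 1) + [0.0]]
    band_cholesky(ab, kd)
    # The right-hand side is A times (1, 2, 3, 4, 5).
    print(band_cholesky_solve(ab, kd, [0.0, 0.0, 0.0, 0.0, 6.0]))
```

For bandwidth `kd` much smaller than the dimension `n`, this touches only `(kd + 1) * n` stored entries and performs on the order of `n * kd**2` operations in the factorization, which is the kind of time and memory saving the abstract attributes to exploiting band structure.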


Footnotes
1
Alternatively, one can decompose the matrix as \(A=U^TU\), where \(U=L^T\) is upper triangular.
 
Metadata
Title
Unleashing GPU acceleration for symmetric band linear algebra kernels and model reduction
Authors
Peter Benner
Ernesto Dufrechou
Pablo Ezzatti
Enrique S. Quintana-Ortí
Alfredo Remón
Publication date
1 December 2015
Publisher
Springer US
Published in
Cluster Computing, Issue 4/2015
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-015-0489-x
