Unleashing GPU acceleration for symmetric band linear algebra kernels and model reduction

Authors: Peter Benner, Ernesto Dufrechou, Pablo Ezzatti, Enrique S. Quintana-Ortí, Alfredo Remón

Published in: Cluster Computing | Issue 4/2015 (1 December 2015)

Abstract

Linear algebra operations arise in a myriad of scientific and engineering applications and, therefore, their optimization is targeted by a significant number of high performance computing research efforts. In particular, the matrix multiplication and the solution of linear systems are two key problems with efficient implementations (or kernels) for a variety of high performance parallel architectures. For these specific problems, leveraging the structure of the associated matrices often leads to remarkable time and memory savings, as is the case, e.g., for symmetric band problems. In this work, we exploit the ample hardware concurrency of many-core graphics processors (GPUs) to accelerate the solution of symmetric positive definite band linear systems, introducing highly tuned versions of the corresponding LAPACK routines. The experimental results with the new GPU kernels reveal important reductions of the execution time when compared with tuned implementations of the same operations provided in Intel’s MKL. In addition, we evaluate the performance of the GPU kernels when applied to the solution of model order reduction problems and the associated matrix equations.
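To make the operation concrete, the following is a minimal pure-Python sketch of a band Cholesky factorization \(A=LL^T\) followed by the two triangular solves. It is not the authors' GPU implementation; it only illustrates the arithmetic that LAPACK's band Cholesky routines (for example DPBTRF and DPBTRS) carry out, and why touching only the band saves time and memory. The function names `band_cholesky` and `band_solve`, the 0-based lower band layout `ab[d][j] = A[j+d][j]`, and the small test matrix are assumptions chosen for illustration.

```python
import math


def band_cholesky(ab, n, k):
    """Factor an SPD band matrix A = L L^T using lower band storage.

    ab[d][j] holds A[j + d][j] for d = 0..k (hypothetical 0-based layout;
    slots that would fall below the last row are ignored).
    Returns L in the same layout.
    """
    L = [[0.0] * n for _ in range(k + 1)]
    for j in range(n):
        # Diagonal entry: subtract the squares of row j that lie inside the band.
        s = ab[0][j] - sum(L[j - p][p] ** 2 for p in range(max(0, j - k), j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[0][j] = math.sqrt(s)
        # Sub-diagonal entries of column j, restricted to the k rows below.
        for i in range(j + 1, min(n, j + k + 1)):
            s = ab[i - j][j] - sum(
                L[i - p][p] * L[j - p][p] for p in range(max(0, i - k), j)
            )
            L[i - j][j] = s / L[0][j]
    return L


def band_solve(L, n, k, b):
    """Solve A x = b given the band Cholesky factor L of A."""
    x = list(b)
    # Forward substitution: L y = b.
    for i in range(n):
        for p in range(max(0, i - k), i):
            x[i] -= L[i - p][p] * x[p]
        x[i] /= L[0][i]
    # Back substitution: L^T x = y.
    for i in range(n - 1, -1, -1):
        for p in range(i + 1, min(n, i + k + 1)):
            x[i] -= L[p - i][i] * x[p]
        x[i] /= L[0][i]
    return x


# Illustrative tridiagonal SPD matrix (bandwidth k = 1):
#     [4 2 0]
# A = [2 5 2]
#     [0 2 5]
n, k = 3, 1
ab = [[4.0, 5.0, 5.0],   # main diagonal
      [2.0, 2.0, 0.0]]   # first sub-diagonal (last slot unused)
b = [6.0, 9.0, 7.0]      # equals A applied to [1, 1, 1]

L = band_cholesky(ab, n, k)
x = band_solve(L, n, k, b)
print(x)  # [1.0, 1.0, 1.0]
```

Storage is \((k+1)\,n\) values instead of \(n^2\), and the factorization performs on the order of \(n k^2\) operations rather than \(n^3\), which is the saving the abstract refers to for band problems.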

Alternatively, one can decompose the matrix as \(A=U^TU\), where \(U=L^T\) is upper triangular.
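As a small worked check of this equivalence (the tridiagonal matrix below is an illustrative choice, not taken from the article), both forms reproduce the same \(A\), and both factors keep the bandwidth of \(A\):

\[
A = \begin{pmatrix} 4 & 2 & 0 \\ 2 & 5 & 2 \\ 0 & 2 & 5 \end{pmatrix}, \qquad
L = \begin{pmatrix} 2 & 0 & 0 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \end{pmatrix}, \qquad
U = L^T = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}, \qquad
U^T U = L L^T = A .
\]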

Title: Unleashing GPU acceleration for symmetric band linear algebra kernels and model reduction
Authors: Peter Benner, Ernesto Dufrechou, Pablo Ezzatti, Enrique S. Quintana-Ortí, Alfredo Remón
Publication date: 1 December 2015
Publisher: Springer US
Published in: Cluster Computing, Issue 4/2015
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-015-0489-x
