Skip to main content
Erschienen in: The Journal of Supercomputing 2/2013

01.02.2013

GPU-accelerated preconditioned iterative linear solvers

verfasst von: Ruipeng Li, Yousef Saad

Erschienen in: The Journal of Supercomputing | Ausgabe 2/2013

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

This work is an overview of our preliminary experience in developing a high-performance iterative linear solver accelerated by GPU coprocessors. Our goal is to illustrate the advantages and difficulties encountered when deploying GPU technology to perform sparse linear algebra computations. Techniques for speeding up sparse matrix-vector product (SpMV) kernels and finding suitable preconditioning methods are discussed. Our experiments with an NVIDIA TESLA M2070 show that for unstructured matrices SpMV kernels can be up to 8 times faster on the GPU than the Intel MKL on the host Intel Xeon X5675 Processor. Overall performance of the GPU-accelerated Incomplete Cholesky (IC) factorization preconditioned CG method can outperform its CPU counterpart by a smaller factor, up to 3, and GPU-accelerated The incomplete LU (ILU) factorization preconditioned GMRES method can achieve a speed-up nearing 4. However, with better suited preconditioning techniques for GPUs, this performance can be further improved.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Agarwal A, Levy M (2007) The kill rule for multicore. In: DAC’07: proceedings of the 44th annual design automation conference, New York, NY, USA. ACM, New York, pp 750–753 CrossRef Agarwal A, Levy M (2007) The kill rule for multicore. In: DAC’07: proceedings of the 44th annual design automation conference, New York, NY, USA. ACM, New York, pp 750–753 CrossRef
2.
Zurück zum Zitat Agullo E, Demmel J, Dongarra J, Hadri B, Kurzak J, Langou J, Ltaief H, Luszczek P, Tomov S (2009) Numerical linear algebra on emerging architectures: the PLASMA and MAGMA projects. J Phys Conf Ser 180(1):012037 CrossRef Agullo E, Demmel J, Dongarra J, Hadri B, Kurzak J, Langou J, Ltaief H, Luszczek P, Tomov S (2009) Numerical linear algebra on emerging architectures: the PLASMA and MAGMA projects. J Phys Conf Ser 180(1):012037 CrossRef
3.
Zurück zum Zitat Ament M, Knittel G, Weiskopf D, Strasser W (2010) A parallel preconditioned conjugate gradient solver for the Poisson problem on a Multi-GPU platform. In: PDP’10: proceedings of the 2010 18th euromicro conference on parallel, distributed and network-based processing, Washington, DC, USA. IEEE Comput. Soc., Los Alamitos, pp 583–592 Ament M, Knittel G, Weiskopf D, Strasser W (2010) A parallel preconditioned conjugate gradient solver for the Poisson problem on a Multi-GPU platform. In: PDP’10: proceedings of the 2010 18th euromicro conference on parallel, distributed and network-based processing, Washington, DC, USA. IEEE Comput. Soc., Los Alamitos, pp 583–592
4.
Zurück zum Zitat Baskaran MM, Bordawekar R (2008) Optimizing sparse matrix-vector multiplication on GPUs. Tech report, IBM Research Baskaran MM, Bordawekar R (2008) Optimizing sparse matrix-vector multiplication on GPUs. Tech report, IBM Research
5.
Zurück zum Zitat Bell N, Garland M (2009) Implementing sparse matrix-vector multiplication on throughput-oriented processors. In: SC’09: proceedings of the conference on high performance computing networking, storage and analysis, New York, NY, USA. ACM, New York, pp 1–11 CrossRef Bell N, Garland M (2009) Implementing sparse matrix-vector multiplication on throughput-oriented processors. In: SC’09: proceedings of the conference on high performance computing networking, storage and analysis, New York, NY, USA. ACM, New York, pp 1–11 CrossRef
6.
Zurück zum Zitat Bell N, Garland M (2010) Cusp: generic parallel algorithms for sparse matrix and graph computations. Version 0.1.0 Bell N, Garland M (2010) Cusp: generic parallel algorithms for sparse matrix and graph computations. Version 0.1.0
7.
Zurück zum Zitat Bolz J, Farmer I, Grinspun E, Schröoder P (2003) Sparse matrix solvers on the GPU: conjugate gradients and multigrid. ACM Trans Graph 22(3):917–924 CrossRef Bolz J, Farmer I, Grinspun E, Schröoder P (2003) Sparse matrix solvers on the GPU: conjugate gradients and multigrid. ACM Trans Graph 22(3):917–924 CrossRef
8.
Zurück zum Zitat Choi JW, Singh A, Vuduc RW (2010) Model-driven autotuning of sparse matrix-vector multiply on GPUs. ACM SIGPLAN Not 45:115–126 CrossRef Choi JW, Singh A, Vuduc RW (2010) Model-driven autotuning of sparse matrix-vector multiply on GPUs. ACM SIGPLAN Not 45:115–126 CrossRef
9.
Zurück zum Zitat Davis PJ (1963) Interpolation and approximation. Blaisdell, Waltham MATH Davis PJ (1963) Interpolation and approximation. Blaisdell, Waltham MATH
10.
Zurück zum Zitat Davis TA (1994) University of Florinda sparse matrix collection, na digest Davis TA (1994) University of Florinda sparse matrix collection, na digest
11.
Zurück zum Zitat Erhel J, Guyomarc’H F, Saad Y (2001) Least-squares polynomial filters for ill-conditioned linear systems. Tech report umsi-2001-32, Minnesota Supercomputer Institute, University of Minnesota, Minneapolis, MN Erhel J, Guyomarc’H F, Saad Y (2001) Least-squares polynomial filters for ill-conditioned linear systems. Tech report umsi-2001-32, Minnesota Supercomputer Institute, University of Minnesota, Minneapolis, MN
13.
Zurück zum Zitat Georgescu S, Okuda H (2007) Conjugate gradients on graphic hardware: performance & feasibility Georgescu S, Okuda H (2007) Conjugate gradients on graphic hardware: performance & feasibility
14.
Zurück zum Zitat Gupta R (2009) A GPU implementation of a bubbly flow solver. Master’s thesis, Delft Institute of Applied Mathematics, Delft University of Technology, 2628 BL, Delft, The Netherlands Gupta R (2009) A GPU implementation of a bubbly flow solver. Master’s thesis, Delft Institute of Applied Mathematics, Delft University of Technology, 2628 BL, Delft, The Netherlands
15.
Zurück zum Zitat Karypis G, Kumar V (1998) Metis—a software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices, version 4.0. Tech report, University of Minnesota, Department of Computer Science/Army HPC Research Center Karypis G, Kumar V (1998) Metis—a software package for partitioning unstructured graphs, partitioning meshes, and computing fill-reducing orderings of sparse matrices, version 4.0. Tech report, University of Minnesota, Department of Computer Science/Army HPC Research Center
16.
Zurück zum Zitat Lanczos C (1950) An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J Res Natl Bur Stand 45:255–282 MathSciNetCrossRef Lanczos C (1950) An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J Res Natl Bur Stand 45:255–282 MathSciNetCrossRef
17.
Zurück zum Zitat Monakov A, Avetisyan A (2009) Implementing blocked sparse matrix-vector multiplication on nvidia GPUs. In: Bertels K, Dimopoulos N, Silvano C, Wong S (eds) Embedded computer systems: architectures, modeling, and simulation. Lecture notes in computer science, vol 5657. Springer, Berlin, pp 289–297 CrossRef Monakov A, Avetisyan A (2009) Implementing blocked sparse matrix-vector multiplication on nvidia GPUs. In: Bertels K, Dimopoulos N, Silvano C, Wong S (eds) Embedded computer systems: architectures, modeling, and simulation. Lecture notes in computer science, vol 5657. Springer, Berlin, pp 289–297 CrossRef
18.
Zurück zum Zitat Monakov A, Lokhmotov A, Avetisyan A (2010) Automatically tuning sparse matrix-vector multiplication for GPU architectures. In: Patt Y, Foglia P, Duesterwald E, Faraboschi P, Martorell X (eds) High performance embedded architectures and compilers. Lecture notes in computer science, vol 5952. Springer, Berlin, pp 111–125 CrossRef Monakov A, Lokhmotov A, Avetisyan A (2010) Automatically tuning sparse matrix-vector multiplication for GPU architectures. In: Patt Y, Foglia P, Duesterwald E, Faraboschi P, Martorell X (eds) High performance embedded architectures and compilers. Lecture notes in computer science, vol 5952. Springer, Berlin, pp 111–125 CrossRef
19.
Zurück zum Zitat NVIDIA (2012) CUBLAS library user guide 4.2 NVIDIA (2012) CUBLAS library user guide 4.2
20.
21.
Zurück zum Zitat NVIDIA (2012) NVIDIA CUDA C programming guide 4.2 NVIDIA (2012) NVIDIA CUDA C programming guide 4.2
22.
Zurück zum Zitat Oberhuber T, Suzuki A, Vacata J (2010) New row-grouped csr format for storing the sparse matrices on GPU with implementation in CUDA. CoRR abs/1012.2270 Oberhuber T, Suzuki A, Vacata J (2010) New row-grouped csr format for storing the sparse matrices on GPU with implementation in CUDA. CoRR abs/1012.2270
23.
24.
Zurück zum Zitat Saad Y (1990) SPARSKIT: A basic tool kit for sparse matrix computations. Tech report RIACS-90-20, Research Institute for Advanced Computer Science, NASA Ames Research Center, Moffett Field, CA Saad Y (1990) SPARSKIT: A basic tool kit for sparse matrix computations. Tech report RIACS-90-20, Research Institute for Advanced Computer Science, NASA Ames Research Center, Moffett Field, CA
26.
Zurück zum Zitat Saad Y (2003) Iterative methods for sparse linear systems, 2nd edn. SIAM, Philadelphia MATHCrossRef Saad Y (2003) Iterative methods for sparse linear systems, 2nd edn. SIAM, Philadelphia MATHCrossRef
27.
Zurück zum Zitat Sengupta S, Harris M, Zhang Y, Owens JD (2007) Scan primitives for GPU computing. Graphics hardware 2007. ACM, New York, pp 97–106 Sengupta S, Harris M, Zhang Y, Owens JD (2007) Scan primitives for GPU computing. Graphics hardware 2007. ACM, New York, pp 97–106
28.
Zurück zum Zitat Sudan H, Klie H, Li R, Saad Y (2010) High performance manycore solvers for reservoir simulation. In: 12th European conference on the mathematics of oil recovery Sudan H, Klie H, Li R, Saad Y (2010) High performance manycore solvers for reservoir simulation. In: 12th European conference on the mathematics of oil recovery
29.
Zurück zum Zitat Vázquez F, Garzon EM, Martinez JA, Fernandez JJ (2009) The sparse matrix vector product on GPUs. Tech report, Department of Computer Architecture and Electronics, University of Almeria Vázquez F, Garzon EM, Martinez JA, Fernandez JJ (2009) The sparse matrix vector product on GPUs. Tech report, Department of Computer Architecture and Electronics, University of Almeria
30.
Zurück zum Zitat Volkov V, Demmel J (2008) LU, QR and Cholesky factorizations using vector capabilities of GPUs. Tech report, Computer Science Division University of California at Berkeley Volkov V, Demmel J (2008) LU, QR and Cholesky factorizations using vector capabilities of GPUs. Tech report, Computer Science Division University of California at Berkeley
31.
Zurück zum Zitat Wang M, Klie H, Parashar M, Sudan H (2009) Solving sparse linear systems on nvidia tesla GPUs. In: ICCS’09: proceedings of the 9th international conference on computational science. Springer, Berlin, pp 864–873 Wang M, Klie H, Parashar M, Sudan H (2009) Solving sparse linear systems on nvidia tesla GPUs. In: ICCS’09: proceedings of the 9th international conference on computational science. Springer, Berlin, pp 864–873
32.
Zurück zum Zitat Williams S, Bell N, Choi JW, Garland M, Oliker L, Vuduc R (2010) Scientific computing with multicore and accelerators. CRC Press, Boca Raton, pp 83–109. Chap 5 CrossRef Williams S, Bell N, Choi JW, Garland M, Oliker L, Vuduc R (2010) Scientific computing with multicore and accelerators. CRC Press, Boca Raton, pp 83–109. Chap 5 CrossRef
33.
Zurück zum Zitat Zhou Y, Saad Y, Tiago ML, Chelikowsky JR (2006) Parallel self-consistent-field calculations via Chebyshev-filtered subspace acceleration. Phys Rev E 74:066704 CrossRef Zhou Y, Saad Y, Tiago ML, Chelikowsky JR (2006) Parallel self-consistent-field calculations via Chebyshev-filtered subspace acceleration. Phys Rev E 74:066704 CrossRef
Metadaten
Titel
GPU-accelerated preconditioned iterative linear solvers
verfasst von
Ruipeng Li
Yousef Saad
Publikationsdatum
01.02.2013
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 2/2013
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-012-0825-3

Weitere Artikel der Ausgabe 2/2013

The Journal of Supercomputing 2/2013 Zur Ausgabe

Premium Partner