2015 | OriginalPaper | Chapter

Asynchronous Iterative Algorithm for Computing Incomplete Factorizations on GPUs

Authors: Edmond Chow, Hartwig Anzt, Jack Dongarra

Published in: High Performance Computing

Publisher: Springer International Publishing

Abstract

This paper presents a GPU implementation of an asynchronous iterative algorithm for computing incomplete factorizations. Asynchronous algorithms, with their ability to tolerate memory latency, form an important class of algorithms for modern computer architectures. Our GPU implementation considers several non-traditional techniques that can be important for asynchronous algorithms to optimize convergence and data locality. These techniques include controlling the order in which variables are updated by controlling the order of execution of thread blocks, taking advantage of cache reuse between thread blocks, and managing the amount of parallelism to control the convergence of the algorithm.
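
The abstract does not spell out the update rule. As a rough illustration only, the sketch below assumes the fixed-point formulation of ILU(0) described in Chow and Patel's "Fine-grained parallel incomplete LU factorization", in which every nonzero position (i, j) of A owns one unknown that is recomputed from the current values of the others. The names `ilu0_sweeps` and `tridiag`, the dense NumPy storage, the initial guess, and the `order` argument are assumptions made for this example. The code runs sequentially on the CPU, so it can mimic the effect of update order but not GPU thread-block scheduling or cache behavior.

```python
import numpy as np

def tridiag(n):
    """Hypothetical test matrix: tridiagonal with 2 on the diagonal, -1 beside it."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def ilu0_sweeps(A, sweeps, order=None):
    """Approximate ILU(0) factors of A with in-place fixed-point sweeps.

    `order` is the sequence of (i, j) positions to update in each sweep;
    it defaults to row-major order over the nonzeros of A.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # Sparsity pattern: one unknown per nonzero of A.
    pattern = [(i, j) for i in range(n) for j in range(n) if A[i, j] != 0.0]
    if order is None:
        order = pattern
    # Assumed initial guess: unit lower triangle of A for L, upper triangle for U.
    L = np.tril(A, -1) + np.eye(n)
    U = np.triu(A)
    for _ in range(sweeps):
        for i, j in order:
            m = min(i, j)
            # Uses whatever values of L and U are currently stored,
            # whether or not they were already updated in this sweep.
            s = A[i, j] - L[i, :m] @ U[:m, j]
            if i > j:
                L[i, j] = s / U[j, j]
            else:
                U[i, j] = s
    return L, U

if __name__ == "__main__":
    A = tridiag(4)
    reverse = [(i, j) for i in range(3, -1, -1)
               for j in range(3, -1, -1) if A[i, j] != 0.0]
    for name, order, sweeps in [("row-major", None, 1), ("reversed", reverse, 1),
                                ("reversed", reverse, 10)]:
        L, U = ilu0_sweeps(A, sweeps, order)
        print(name, sweeps, np.linalg.norm(L @ U - A))
```

In row-major order every value an update reads has already been refreshed, so a single sweep reproduces the factorization for this tridiagonal matrix. The reversed order needs several sweeps to reach the same result, which hints at why the order in which thread blocks execute can affect convergence.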

Metadata

Title: Asynchronous Iterative Algorithm for Computing Incomplete Factorizations on GPUs
Authors: Edmond Chow, Hartwig Anzt, Jack Dongarra
Copyright Year: 2015
DOI: https://doi.org/10.1007/978-3-319-20119-1_1