Skip to main content
Top

2016 | OriginalPaper | Chapter

Domain Overlap for Iterative Sparse Triangular Solves on GPUs

Authors : Hartwig Anzt, Edmond Chow, Daniel B. Szyld, Jack Dongarra

Published in: Software for Exascale Computing - SPPEXA 2013-2015

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Iterative methods for solving sparse triangular systems are an attractive alternative to exact forward and backward substitution if an approximation of the solution is acceptable. On modern hardware, performance benefits are available as iterative methods allow for better parallelization. In this paper, we investigate how block-iterative triangular solves can benefit from using overlap. Because the matrices are triangular, we use “directed” overlap, depending on whether the matrix is upper or lower triangular. We enhance a GPU implementation of the block-asynchronous Jacobi method with directed overlap. For GPUs and other cases where the problem must be overdecomposed, i.e., more subdomains and threads than cores, there is a preference in processing or scheduling the subdomains in a specific order, following the dependencies specified by the sparse triangular matrix. For sparse triangular factors from incomplete factorizations, we demonstrate that moderate directed overlap with subdomain scheduling can improve convergence and time-to-solution.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
2.
go back to reference Anderson, E.C., Saad, Y.: Solving sparse triangular systems on parallel computers. Int. J. High Speed Comput. 1, 73–96 (1989)CrossRefMATH Anderson, E.C., Saad, Y.: Solving sparse triangular systems on parallel computers. Int. J. High Speed Comput. 1, 73–96 (1989)CrossRefMATH
3.
go back to reference Anzt, H.: Asynchronous and multiprecision linear solvers – scalable and fault-tolerant numerics for energy efficient high performance computing. Ph.D. thesis, Karlsruhe Institute of Technology, Institute for Applied and Numerical Mathematics (2012) Anzt, H.: Asynchronous and multiprecision linear solvers – scalable and fault-tolerant numerics for energy efficient high performance computing. Ph.D. thesis, Karlsruhe Institute of Technology, Institute for Applied and Numerical Mathematics (2012)
4.
go back to reference Anzt, H., Luszczek, P., Dongarra, J., Heuveline, V.: GPU-accelerated asynchronous error correction for mixed precision iterative refinement. In: Euro-Par 2012 Parallel Processing. Lecture Notes in Computer Science, pp. 908–919. Springer, Berlin/New York (2012) Anzt, H., Luszczek, P., Dongarra, J., Heuveline, V.: GPU-accelerated asynchronous error correction for mixed precision iterative refinement. In: Euro-Par 2012 Parallel Processing. Lecture Notes in Computer Science, pp. 908–919. Springer, Berlin/New York (2012)
5.
go back to reference Anzt, H., Tomov, S., Gates, M., Dongarra, J., Heuveline, V.: Block-asynchronous multigrid smoothers for GPU-accelerated systems. In: Ali, H.H., Shi, Y., Khazanchi, D., Lees, M., van Albada, G.D., Dongarra, J., Sloot, P.M.A. (eds.) ICCS. Procedia Computer Science, vol. 9, pp. 7–16. Elsevier, Amsterdam (2012) Anzt, H., Tomov, S., Gates, M., Dongarra, J., Heuveline, V.: Block-asynchronous multigrid smoothers for GPU-accelerated systems. In: Ali, H.H., Shi, Y., Khazanchi, D., Lees, M., van Albada, G.D., Dongarra, J., Sloot, P.M.A. (eds.) ICCS. Procedia Computer Science, vol. 9, pp. 7–16. Elsevier, Amsterdam (2012)
6.
go back to reference Anzt, H., Tomov, S., Dongarra, J., Heuveline, V.: A block-asynchronous relaxation method for graphics processing units. J. Parallel Distrib. Comput. 73 (12), 1613–1626 (2013)CrossRef Anzt, H., Tomov, S., Dongarra, J., Heuveline, V.: A block-asynchronous relaxation method for graphics processing units. J. Parallel Distrib. Comput. 73 (12), 1613–1626 (2013)CrossRef
7.
go back to reference Anzt, H., Chow, E., Dongarra, J.: Iterative sparse triangular solves for preconditioning. In: Träff, J.L., Hunold, S., Versaci, F. (eds.) Euro-Par 2015: Parallel Processing. Lecture Notes in Computer Science, vol. 9233, pp. 650–661. Springer, Berlin/Heidelberg (2015)CrossRef Anzt, H., Chow, E., Dongarra, J.: Iterative sparse triangular solves for preconditioning. In: Träff, J.L., Hunold, S., Versaci, F. (eds.) Euro-Par 2015: Parallel Processing. Lecture Notes in Computer Science, vol. 9233, pp. 650–661. Springer, Berlin/Heidelberg (2015)CrossRef
8.
go back to reference Anzt, H., Dongarra, J., Quintana-Ortí, E.S.: Tuning stationary iterative solvers for fault resilience. In: Proceedings of the 6th workshop on latest advances in scalable algorithms for large-scale systems, ScalA’15, pp. 1:1–1:8. ACM, New York (2015) Anzt, H., Dongarra, J., Quintana-Ortí, E.S.: Tuning stationary iterative solvers for fault resilience. In: Proceedings of the 6th workshop on latest advances in scalable algorithms for large-scale systems, ScalA’15, pp. 1:1–1:8. ACM, New York (2015)
9.
10.
go back to reference Benzi, M., Szyld, D.B., van Duin, A.: Orderings for incomplete factorization preconditionings of nonsymmetric problems. SIAM J. Sci. Comput. 20, 1652–1670 (1999)MathSciNetCrossRefMATH Benzi, M., Szyld, D.B., van Duin, A.: Orderings for incomplete factorization preconditionings of nonsymmetric problems. SIAM J. Sci. Comput. 20, 1652–1670 (1999)MathSciNetCrossRefMATH
11.
go back to reference Benzi, M., Frommer, A., Nabben, R., Szyld, D.B.: Algebraic theory of multiplicative Schwarz methods. Numer. Math. 89, 605–639 (2001)MathSciNetCrossRefMATH Benzi, M., Frommer, A., Nabben, R., Szyld, D.B.: Algebraic theory of multiplicative Schwarz methods. Numer. Math. 89, 605–639 (2001)MathSciNetCrossRefMATH
12.
go back to reference Cai, X.C., Sarkis, M.: A restricted additive Schwarz preconditioner for general sparse linear systems. SIAM J. Sci. Comput. 21, 792–797 (1999)MathSciNetCrossRefMATH Cai, X.C., Sarkis, M.: A restricted additive Schwarz preconditioner for general sparse linear systems. SIAM J. Sci. Comput. 21, 792–797 (1999)MathSciNetCrossRefMATH
15.
go back to reference Chow, E., Anzt, H., Dongarra, J.: Asynchronous iterative algorithm for computing incomplete factorizations on GPUs. In: Kunkel, J., Ludwig, T. (eds.) Proceedings of 30th International Conference, ISC High Performance 2015. Lecture Notes in Computer Science, vol. 9137, pp. 1–16. Springer, Cham (2015) Chow, E., Anzt, H., Dongarra, J.: Asynchronous iterative algorithm for computing incomplete factorizations on GPUs. In: Kunkel, J., Ludwig, T. (eds.) Proceedings of 30th International Conference, ISC High Performance 2015. Lecture Notes in Computer Science, vol. 9137, pp. 1–16. Springer, Cham (2015)
17.
go back to reference Duin, A.C.N.V.: Scalable parallel preconditioning with the sparse approximate inverse of triangular matrices. SIAM J. Matrix Anal. Appl. 20, 987–1006 (1996)MathSciNetCrossRefMATH Duin, A.C.N.V.: Scalable parallel preconditioning with the sparse approximate inverse of triangular matrices. SIAM J. Matrix Anal. Appl. 20, 987–1006 (1996)MathSciNetCrossRefMATH
19.
go back to reference Frommer, A., Szyld, D.B.: An algebraic convergence theory for restricted additive Schwarz methods using weighted max norms. SIAM J. Numer. Anal. 39, 463–479 (2001)MathSciNetCrossRefMATH Frommer, A., Szyld, D.B.: An algebraic convergence theory for restricted additive Schwarz methods using weighted max norms. SIAM J. Numer. Anal. 39, 463–479 (2001)MathSciNetCrossRefMATH
20.
go back to reference Frommer, A., Schwandt, H., Szyld, D.B.: Asynchronous weighted additive Schwarz methods. Electron. Trans. Numer. Anal. 5, 48–61 (1997)MathSciNetMATH Frommer, A., Schwandt, H., Szyld, D.B.: Asynchronous weighted additive Schwarz methods. Electron. Trans. Numer. Anal. 5, 48–61 (1997)MathSciNetMATH
21.
go back to reference Hammond, S.W., Schreiber, R.: Efficient ICCG on a shared memory multiprocessor. Int. J. High Speed Comput. 4, 1–21 (1992)CrossRef Hammond, S.W., Schreiber, R.: Efficient ICCG on a shared memory multiprocessor. Int. J. High Speed Comput. 4, 1–21 (1992)CrossRef
22.
23.
go back to reference Naumov, M.: Parallel solution of sparse triangular linear systems in the preconditioned iterative methods on the GPU. Technical Report, NVR-2011-001, NVIDIA (2011) Naumov, M.: Parallel solution of sparse triangular linear systems in the preconditioned iterative methods on the GPU. Technical Report, NVR-2011-001, NVIDIA (2011)
25.
go back to reference NVIDIA Corporation: NVIDIA CUDA Compute Unified Device Architecture Programming Guide, 2.3.1 edn. (2009) NVIDIA Corporation: NVIDIA CUDA Compute Unified Device Architecture Programming Guide, 2.3.1 edn. (2009)
26.
go back to reference NVIDIA Corporation: NVIDIA CUDA TOOLKIT V7.0 (2015) NVIDIA Corporation: NVIDIA CUDA TOOLKIT V7.0 (2015)
27.
go back to reference Pothen, A., Alvarado, F.: A fast reordering algorithm for parallel sparse triangular solution. SIAM J. Sci. Stat. Comput. 13, 645–653 (1992)MathSciNetCrossRefMATH Pothen, A., Alvarado, F.: A fast reordering algorithm for parallel sparse triangular solution. SIAM J. Sci. Stat. Comput. 13, 645–653 (1992)MathSciNetCrossRefMATH
28.
29.
go back to reference Saad, Y., Zhang, J.: Bilum: block versions of multi-elimination and multi-level ILU preconditioner for general sparse linear systems. SIAM J. Sci. Comput. 20, 2103–2121 (1997)MathSciNetCrossRefMATH Saad, Y., Zhang, J.: Bilum: block versions of multi-elimination and multi-level ILU preconditioner for general sparse linear systems. SIAM J. Sci. Comput. 20, 2103–2121 (1997)MathSciNetCrossRefMATH
30.
go back to reference Saltz, J.H.: Aggregation methods for solving sparse triangular systems on multiprocessors. SIAM J. Sci. Stat. Comput. 11, 123–144 (1990)MathSciNetCrossRefMATH Saltz, J.H.: Aggregation methods for solving sparse triangular systems on multiprocessors. SIAM J. Sci. Stat. Comput. 11, 123–144 (1990)MathSciNetCrossRefMATH
31.
go back to reference Shang, Y.: A parallel finite element variational multiscale method based on fully overlapping domain decomposition for incompressible flows. Numer. Methods Partial Differ. Equ. 31, 856–875 (2015)MathSciNetCrossRefMATH Shang, Y.: A parallel finite element variational multiscale method based on fully overlapping domain decomposition for incompressible flows. Numer. Methods Partial Differ. Equ. 31, 856–875 (2015)MathSciNetCrossRefMATH
32.
go back to reference Smith, B.F., Bjørstad, P.E., Gropp, W.D.: Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations. Cambridge University Press, New York (1996)MATH Smith, B.F., Bjørstad, P.E., Gropp, W.D.: Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations. Cambridge University Press, New York (1996)MATH
33.
go back to reference Szyld, D.B.: Different models of parallel asynchronous iterations with overlapping blocks. Comput. Appl. Math. 17, 101–115 (1998)MathSciNetMATH Szyld, D.B.: Different models of parallel asynchronous iterations with overlapping blocks. Comput. Appl. Math. 17, 101–115 (1998)MathSciNetMATH
34.
go back to reference Toselli, A., Widlund, O.B.: Domain Decomposition Methods – Algorithms and Theory. Springer Series in Computational Mathematics, vol. 34. Springer, Berlin/Heidelberg (2005) Toselli, A., Widlund, O.B.: Domain Decomposition Methods – Algorithms and Theory. Springer Series in Computational Mathematics, vol. 34. Springer, Berlin/Heidelberg (2005)
Metadata
Title
Domain Overlap for Iterative Sparse Triangular Solves on GPUs
Authors
Hartwig Anzt
Edmond Chow
Daniel B. Szyld
Jack Dongarra
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-40528-5_24

Premium Partner