Skip to main content
Erschienen in: Soft Computing 24/2018

12.08.2017 | Methodologies and Application

Parallel ILU preconditioners in GPU computation

verfasst von: Yan Chen, Xuhong Tian, Hui Liu, Zhangxin Chen, Bo Yang, Wenyuan Liao, Peng Zhang, Ruijian He, Min Yang

Erschienen in: Soft Computing | Ausgabe 24/2018

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Accelerating large-scale linear solvers is always crucial for scientific research and industrial applications. In this regard, preconditioners play a key role in improving the performance of iterative linear solvers. This paper presents a summary and review of our work about the development of parallel ILU preconditioners on GPUs. The mechanisms of ILU(0), ILU(k), ILUT, enhanced ILUT, and block-wise ILU(k) are reviewed and analyzed, which give a clear guidance in the development of iterative linear solvers. ILU(0) is the most commonly used preconditioner, and the nonzero pattern of its matrix is exactly the same as the original matrix to be solved. ILU(k) uses k levels to control the pattern of its preconditioner matrix. ILUT selects entries for its preconditioner matrix by setting thresholds without considering its original matrix pattern. In addition to point-wise ILU preconditioners, a block-wise ILU(k) preconditioner is designed delicately in support of block-wise matrices. In implementation, the RAS (Restricted Additive Schwarz) method is adopted to optimize the parallel structure of a preconditioner matrix. Coupling with the configuration parameters of ILU preconditioners, a complex situation appears in the parallel solution process, so decoupled algorithms are adopted. These algorithms are implemented and tested on NVIDIA GPUs. The experiment results show that a single-GPU implementation can speed up an ILU preconditioner by a factor of 10, compared to traditional CPU implementation. The results also show that the ILU(0) has better speedup than ILU(k) but slower convergence than ILU(k). Level k of ILU(k) and threshold (pt) of ILUT are effective adjustment factors for controlling the equilibrium point between acceleration and convergence for ILU(k) and ILUT, respectively. All these ILU preconditioners are characterized and compared in this work, which shows a clear picture and numerical insights for practitioners in the ILU family.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Barrett R, Berry M, Chan TF, Demmel J, Donato J, Dongarra J, Eijkhout V, Pozo R, Romine C, Vander VH (1994) Templates for the solution of linear systems: building blocks for iterative methods, 2nd edn. SIAM, PhiladelphiaCrossRef Barrett R, Berry M, Chan TF, Demmel J, Donato J, Dongarra J, Eijkhout V, Pozo R, Romine C, Vander VH (1994) Templates for the solution of linear systems: building blocks for iterative methods, 2nd edn. SIAM, PhiladelphiaCrossRef
Zurück zum Zitat Bell N, Garland M (2008) Efficient sparse matrix-vector multiplication on CUDA, NVIDIA Technical Report, NVR-2008-004, NVIDIA Corporation Bell N, Garland M (2008) Efficient sparse matrix-vector multiplication on CUDA, NVIDIA Technical Report, NVR-2008-004, NVIDIA Corporation
Zurück zum Zitat Bell N, Garland M (2009) Implementing sparse matrix-vector multiplication on throughput-oriented processors. In: Proceedings of the supercomputing Bell N, Garland M (2009) Implementing sparse matrix-vector multiplication on throughput-oriented processors. In: Proceedings of the supercomputing
Zurück zum Zitat Bell N, Dalton S, Olson L (2011) Exposing Fine-Grained Parallelism in Algebraic Multigrid Methods, NVIDIA Technical Report NVR-2011-002 Bell N, Dalton S, Olson L (2011) Exposing Fine-Grained Parallelism in Algebraic Multigrid Methods, NVIDIA Technical Report NVR-2011-002
Zurück zum Zitat Cai X-C, Sarkis M (1999) A restricted additive Schwarz preconditioner for general sparse linear systems. SIAM J Sci Comput 21:792–797MathSciNetCrossRef Cai X-C, Sarkis M (1999) A restricted additive Schwarz preconditioner for general sparse linear systems. SIAM J Sci Comput 21:792–797MathSciNetCrossRef
Zurück zum Zitat Cao H, Tchelepi HA, Wallis JR, Yardumian HE (2005) Parallel scalable unstructured CPR-type linear solver for reservoir simulation. In: SPE annual technical conference and exhibition Cao H, Tchelepi HA, Wallis JR, Yardumian HE (2005) Parallel scalable unstructured CPR-type linear solver for reservoir simulation. In: SPE annual technical conference and exhibition
Zurück zum Zitat Chen Z, Zhang Y (2008) Development, analysis and numerical tests of a compositional reservoir simulator. Int J Numer Anal Model 4:86–100MathSciNetMATH Chen Z, Zhang Y (2008) Development, analysis and numerical tests of a compositional reservoir simulator. Int J Numer Anal Model 4:86–100MathSciNetMATH
Zurück zum Zitat Chen Z, Ewing RE, Lazarov RD, Maliassov S, Kuznetsov YA (1996) Multilevel preconditioners for mixed methods for second order elliptic problems. Numer Linear Algebra Appl 3(5):427–453MathSciNetCrossRef Chen Z, Ewing RE, Lazarov RD, Maliassov S, Kuznetsov YA (1996) Multilevel preconditioners for mixed methods for second order elliptic problems. Numer Linear Algebra Appl 3(5):427–453MathSciNetCrossRef
Zurück zum Zitat Chen Z, Huan G, Ma Y (2006) Computational methods for multiphase flows in porous media. In: The computational science and engineering series, vol 2. SIAM, Philadelphia Chen Z, Huan G, Ma Y (2006) Computational methods for multiphase flows in porous media. In: The computational science and engineering series, vol 2. SIAM, Philadelphia
Zurück zum Zitat Chen Z, Liu H, Yang B (2013a) Parallel triangular solvers on GPU. In: Proceedings of international workshop on data-intensive scientific discovery (DISD), Shanghai University, Shanghai, China Chen Z, Liu H, Yang B (2013a) Parallel triangular solvers on GPU. In: Proceedings of international workshop on data-intensive scientific discovery (DISD), Shanghai University, Shanghai, China
Zurück zum Zitat Chen Z, Liu H, Yu S (2013b) Development of algebraic multigrid solvers using GPUs, SPE-163661-MS. In: SPE reservoir simulation symposium, 18–20 February. The Woodlands, TX, USA Chen Z, Liu H, Yu S (2013b) Development of algebraic multigrid solvers using GPUs, SPE-163661-MS. In: SPE reservoir simulation symposium, 18–20 February. The Woodlands, TX, USA
Zurück zum Zitat Chen Z, Liu H, Yang B (2015) Accelerating iterative linear solvers using multiple graphical processing units. Int J Comput Math 97:1422–1438MathSciNetCrossRef Chen Z, Liu H, Yang B (2015) Accelerating iterative linear solvers using multiple graphical processing units. Int J Comput Math 97:1422–1438MathSciNetCrossRef
Zurück zum Zitat Chen Y, Liu H, Wang K, Chen Z, Zhang P (2016) Large-scale reservoir simulations on parallel computers. In: Proceedings of the 2nd IEEE international conference on high performance and smart computing (HPSC 2016), New York, NY, April 9–10. doi:10.1109/BigDataSecurity-HPSC-IDS.2016.20 Chen Y, Liu H, Wang K, Chen Z, Zhang P (2016) Large-scale reservoir simulations on parallel computers. In: Proceedings of the 2nd IEEE international conference on high performance and smart computing (HPSC 2016), New York, NY, April 9–10. doi:10.​1109/​BigDataSecurity-HPSC-IDS.​2016.​20
Zurück zum Zitat Davis TA (1994) University of Florida sparse matrix collection, NA digest Davis TA (1994) University of Florida sparse matrix collection, NA digest
Zurück zum Zitat Deng W, Zhao H, Liu J, Yan X, Li Y, Yin L, Ding C (2015) An improved CACO algorithm based on adaptive method and multi-variant strategies. Soft Comput 19(3):701–713CrossRef Deng W, Zhao H, Liu J, Yan X, Li Y, Yin L, Ding C (2015) An improved CACO algorithm based on adaptive method and multi-variant strategies. Soft Comput 19(3):701–713CrossRef
Zurück zum Zitat Fu Z, Sun X, Liu Q, Zhou L, Shu J (2015) Achieving efficient cloud search services: multi-keyword ranked search over encrypted cloud data supporting parallel computing. IEICE Trans Commun E98–B(1):190–200. doi:10.1587/transcom.E98.B.190 CrossRef Fu Z, Sun X, Liu Q, Zhou L, Shu J (2015) Achieving efficient cloud search services: multi-keyword ranked search over encrypted cloud data supporting parallel computing. IEICE Trans Commun E98–B(1):190–200. doi:10.​1587/​transcom.​E98.​B.​190 CrossRef
Zurück zum Zitat Haase G, Liebmann M, Douglas CC, Plank G (2010) A parallel algebraic multigrid solver on graphics processing units, high performance computing and applications, pp 38–47 Haase G, Liebmann M, Douglas CC, Plank G (2010) A parallel algebraic multigrid solver on graphics processing units, high performance computing and applications, pp 38–47
Zurück zum Zitat Heuveline et al. V (2011) Enhanced parallel ILU(p)-based preconditioners for multi-core CPUs and GPUs, The Power(q)-pattern Method, EMCL Preprint 2011-08 Heuveline et al. V (2011) Enhanced parallel ILU(p)-based preconditioners for multi-core CPUs and GPUs, The Power(q)-pattern Method, EMCL Preprint 2011-08
Zurück zum Zitat Hu X, Liu W, Qin G, Xu J, Yan Y, Zhang C (2011) Development of a fast auxiliary subspace pre-conditioner for numerical reservoir simulators. In: SPE reservoir characterisation and simulation conference and exhibition, 9C11 October, Abu Dhabi, UAE, SPE-148388-MS Hu X, Liu W, Qin G, Xu J, Yan Y, Zhang C (2011) Development of a fast auxiliary subspace pre-conditioner for numerical reservoir simulators. In: SPE reservoir characterisation and simulation conference and exhibition, 9C11 October, Abu Dhabi, UAE, SPE-148388-MS
Zurück zum Zitat Kirk DB, Hwu WW (2010) Programming massively parallel processors: a hands-on approach, ISBN: 978-0-12-381472-2 Kirk DB, Hwu WW (2010) Programming massively parallel processors: a hands-on approach, ISBN: 978-0-12-381472-2
Zurück zum Zitat Klie H, Sudan H, Li R, Saad Y (2011) Exploiting capabilities of many core platforms in reservoir simulation. In: SPE RSS reservoir simulation symposium, 21–23 February Klie H, Sudan H, Li R, Saad Y (2011) Exploiting capabilities of many core platforms in reservoir simulation. In: SPE RSS reservoir simulation symposium, 21–23 February
Zurück zum Zitat Li R, Saad Y (2010) GPU-accelerated preconditioned iterative linear solvers, Technical Report umsi-2010-112. University of Minnesota, Minneapolis, MN, Minnesota Supercomputer Institute Li R, Saad Y (2010) GPU-accelerated preconditioned iterative linear solvers, Technical Report umsi-2010-112. University of Minnesota, Minneapolis, MN, Minnesota Supercomputer Institute
Zurück zum Zitat Liu H, Yu S, Chen Z, Hsieh B, Shao L (2012) Sparse matrix-vector multiplication on NVIDIA GPU. Int J Numer Anal Model Ser B 3(2):185–191MathSciNetMATH Liu H, Yu S, Chen Z, Hsieh B, Shao L (2012) Sparse matrix-vector multiplication on NVIDIA GPU. Int J Numer Anal Model Ser B 3(2):185–191MathSciNetMATH
Zurück zum Zitat Liu H, Chen Z, Yu S, Hsieh B, Shao L (2014) Development of a restricted additive Schwarz preconditioner for sparse linear systems on NVIDIA GPU. Int J Numer Anal Model Ser B Comput Inf 5(1–2):13–20MathSciNetMATH Liu H, Chen Z, Yu S, Hsieh B, Shao L (2014) Development of a restricted additive Schwarz preconditioner for sparse linear systems on NVIDIA GPU. Int J Numer Anal Model Ser B Comput Inf 5(1–2):13–20MathSciNetMATH
Zurück zum Zitat Liu H, Wang K, Chen Z (2016a) A family of constrained pressure residual preconditioners for parallel reservoir simulations. Numer Linear Algebra Appl 23(1):120–146MathSciNetCrossRef Liu H, Wang K, Chen Z (2016a) A family of constrained pressure residual preconditioners for parallel reservoir simulations. Numer Linear Algebra Appl 23(1):120–146MathSciNetCrossRef
Zurück zum Zitat Liu Q, Cai W, Shen J, Fu Z, Liu X, Linge N (2016b) A speculative approach to spatial-temporal efficiency with multi-objective optimization in a heterogeneous cloud environment. Secur Commun Netw 9(17):4002–4012. doi:10.1002/sec.1582 CrossRef Liu Q, Cai W, Shen J, Fu Z, Liu X, Linge N (2016b) A speculative approach to spatial-temporal efficiency with multi-objective optimization in a heterogeneous cloud environment. Secur Commun Netw 9(17):4002–4012. doi:10.​1002/​sec.​1582 CrossRef
Zurück zum Zitat Liu H, Zhang P, Wang K, Yang B, Chen Z (2016c) Performance and scalability analysis for parallel reservoir simulations on three supercomputer architectures. In: Proceedings of the 2016 XSEDE conference: diversity, big data, & science at scale, Miami, FL, USA. doi:10.1145/2949550.2949577 Liu H, Zhang P, Wang K, Yang B, Chen Z (2016c) Performance and scalability analysis for parallel reservoir simulations on three supercomputer architectures. In: Proceedings of the 2016 XSEDE conference: diversity, big data, & science at scale, Miami, FL, USA. doi:10.​1145/​2949550.​2949577
Zurück zum Zitat Lukarski D, Anzt H, Tomov S, Dongarra J (2014) Multi-elimination ILU preconditioners on GPUs, Technical report UT-CS-14-723. University of Tennessee, Innovative Computing Laboratory Lukarski D, Anzt H, Tomov S, Dongarra J (2014) Multi-elimination ILU preconditioners on GPUs, Technical report UT-CS-14-723. University of Tennessee, Innovative Computing Laboratory
Zurück zum Zitat NVIDIA Corporation (2010) Nvidia CUDA programming guide (version 3.2) NVIDIA Corporation (2010) Nvidia CUDA programming guide (version 3.2)
Zurück zum Zitat Qu Z, Keeney J, Robitzsch S, Zaman F, Wang X (2016) Multilevel pattern mining architecture for automatic network monitoring in heterogeneous wireless communication networks. China Commun 13(7):108–116. doi:10.1109/CC.2016.7559082 CrossRef Qu Z, Keeney J, Robitzsch S, Zaman F, Wang X (2016) Multilevel pattern mining architecture for automatic network monitoring in heterogeneous wireless communication networks. China Commun 13(7):108–116. doi:10.​1109/​CC.​2016.​7559082 CrossRef
Zurück zum Zitat Saad Y (2003) Iterative methods for sparse linear systems, 2nd edn. SIAM, PhiladelphiaCrossRef Saad Y (2003) Iterative methods for sparse linear systems, 2nd edn. SIAM, PhiladelphiaCrossRef
Zurück zum Zitat Tian Q, Chen S (2017) Cross-heterogeneous-database age estimation through correlation representation learning. Neurocomputing 238:286–295CrossRef Tian Q, Chen S (2017) Cross-heterogeneous-database age estimation through correlation representation learning. Neurocomputing 238:286–295CrossRef
Zurück zum Zitat Vinsome PKW (1976) An iterative method for solving sparse sets of simultaneous linear equations. In: SPE symposium on numerical simulation of reservoir performance, Los Angeles, CA Vinsome PKW (1976) An iterative method for solving sparse sets of simultaneous linear equations. In: SPE symposium on numerical simulation of reservoir performance, Los Angeles, CA
Zurück zum Zitat Yang B, Liu H, Chen Z (2016) GPU-accelerated preconditioned GMRES solver. In: The 2nd IEEE international conference on high performance and smart computing, IEEE HPSC 2016, 8–10 April, Columbia University, New York, USA Yang B, Liu H, Chen Z (2016) GPU-accelerated preconditioned GMRES solver. In: The 2nd IEEE international conference on high performance and smart computing, IEEE HPSC 2016, 8–10 April, Columbia University, New York, USA
Zurück zum Zitat Yuan C, Xia Z, Sun X (2017) Coverless image steganography based on SIFT and BOF. J Internet Technol 18(2):209–216 Yuan C, Xia Z, Sun X (2017) Coverless image steganography based on SIFT and BOF. J Internet Technol 18(2):209–216
Zurück zum Zitat Zhang P, Gao Y (2015) Matrix multiplication on high-density multi-GPU architectures: theoretical and experimental investigations. Lect Notes Comput Sci 9137(20):17–30CrossRef Zhang P, Gao Y (2015) Matrix multiplication on high-density multi-GPU architectures: theoretical and experimental investigations. Lect Notes Comput Sci 9137(20):17–30CrossRef
Metadaten
Titel
Parallel ILU preconditioners in GPU computation
verfasst von
Yan Chen
Xuhong Tian
Hui Liu
Zhangxin Chen
Bo Yang
Wenyuan Liao
Peng Zhang
Ruijian He
Min Yang
Publikationsdatum
12.08.2017
Verlag
Springer Berlin Heidelberg
Erschienen in
Soft Computing / Ausgabe 24/2018
Print ISSN: 1432-7643
Elektronische ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-017-2764-7

Weitere Artikel der Ausgabe 24/2018

Soft Computing 24/2018 Zur Ausgabe