Published in: Computing 9/2020

24.06.2020 | Regular Paper

GPU-based matrix-free finite element solver exploiting symmetry of elemental matrices

Authors: Utpal Kiran, Sachin Singh Gautam, Deepak Sharma

Abstract

Matrix-free solvers for the finite element method (FEM) avoid assembling the elemental matrices and replace the sparse matrix-vector multiplication required by iterative solution methods with element-level dense matrix-vector products. This paper proposes a novel matrix-free strategy for FEM that computes the element-level matrix-vector product using only the symmetric part of the elemental matrices. The strategy is designed to take advantage of the massive parallelism of the Graphics Processing Unit (GPU). A unique data structure is also introduced that ensures localized and coalesced memory access suitable for a GPU while storing only the symmetric part of the elemental matrices. In addition, the strategy emphasizes efficient use of the register cache, uniform workload distribution, reduced thread synchronization, and sufficient granularity to make the best use of GPU resources. Its performance is evaluated by solving elasticity and heat conduction problems using a 4-noded quadrilateral element with two degrees of freedom (DOFs) per node and one DOF per node, respectively, and is compared with matrix-free solver strategies on GPU from the literature. A maximum speedup of 4.9\(\times\) is obtained for the elasticity problem and a maximum speedup of 3.2\(\times\) for the heat conduction problem. Further, the proposed strategy requires the least GPU memory among the strategies compared.
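
To make the idea concrete, here is a minimal serial sketch in plain Python of an element-level matrix-vector product that reads only the upper triangle of each symmetric elemental matrix and scatters the result into the global vector without ever assembling a global matrix. It is not the authors' GPU kernel or data structure: the function names (`pack_upper`, `sym_matvec`, `matrix_free_product`), the row-wise packed layout, and the two-element toy mesh are assumptions made purely for illustration.

```python
def pack_upper(K):
    """Keep only the upper triangle (diagonal included) of a symmetric
    matrix, stored row by row in a flat list (assumed layout)."""
    n = len(K)
    return [K[i][j] for i in range(n) for j in range(i, n)]


def sym_matvec(packed, x):
    """Compute y = K x using only the packed upper triangle of K."""
    n = len(x)
    y = [0.0] * n
    k = 0
    for i in range(n):
        y[i] += packed[k] * x[i]      # diagonal entry K[i][i]
        k += 1
        for j in range(i + 1, n):
            a = packed[k]             # K[i][j] == K[j][i]
            y[i] += a * x[j]          # contribution of the upper entry
            y[j] += a * x[i]          # mirrored lower entry, never stored
            k += 1
    return y


def matrix_free_product(packed_elems, dof_maps, x):
    """Global y = K x as a sum of elemental products (no assembly)."""
    y = [0.0] * len(x)
    for packed, dofs in zip(packed_elems, dof_maps):
        x_e = [x[d] for d in dofs]    # gather local DOF values
        y_e = sym_matvec(packed, x_e)
        for d, v in zip(dofs, y_e):   # scatter-add into the global vector
            y[d] += v
    return y


# Hypothetical toy mesh: two 2-DOF elements sharing global DOF 1.
K_e = [[1.0, -1.0], [-1.0, 1.0]]
packed_elems = [pack_upper(K_e), pack_upper(K_e)]
dof_maps = [(0, 1), (1, 2)]
y = matrix_free_product(packed_elems, dof_maps, [1.0, 2.0, 4.0])
```

For the 4-noded quadrilateral with two DOFs per node, the elemental matrix is 8 × 8, so this packed form holds 36 values instead of 64; with one DOF per node it holds 10 instead of 16. In a parallel version the scatter-add would typically need atomic updates or an element colouring to avoid write conflicts on shared DOFs, which this serial sketch sidesteps.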

Metadata
Title
GPU-based matrix-free finite element solver exploiting symmetry of elemental matrices
Authors
Utpal Kiran
Sachin Singh Gautam
Deepak Sharma
Publication date
24.06.2020
Publisher
Springer Vienna
Published in
Computing / Issue 9/2020
Print ISSN: 0010-485X
Electronic ISSN: 1436-5057
DOI
https://doi.org/10.1007/s00607-020-00827-4