Computing

Published: 24.06.2020 | Regular Paper

GPU-based matrix-free finite element solver exploiting symmetry of elemental matrices

Authors: Utpal Kiran, Sachin Singh Gautam, Deepak Sharma

Published in: Computing | Issue 9/2020

Abstract

Matrix-free solvers for the finite element method (FEM) avoid assembling the elemental matrices into a global sparse matrix and replace the sparse matrix-vector multiplication required by the iterative solution method with element-level dense matrix-vector products. In this paper, a novel matrix-free strategy for FEM is proposed that computes the element-level matrix-vector product using only the symmetric part of the elemental matrices. The strategy is designed to exploit the massive parallelism of the Graphics Processing Unit (GPU). A unique data structure is also introduced that ensures localized and coalesced memory access on the GPU while storing only the symmetric part of the elemental matrices. In addition, the proposed strategy emphasizes efficient use of the register cache, uniform workload distribution, reduced thread synchronization, and sufficient granularity to make the best use of GPU resources. The performance of the proposed strategy is evaluated by solving elasticity and heat conduction problems using the 4-noded quadrilateral element with two degrees of freedom (DOFs) per node and one DOF per node, respectively. The performance is compared with matrix-free GPU solver strategies from the literature. A maximum speedup of 4.9× is obtained for the elasticity problem and a maximum speedup of 3.2× for the heat conduction problem. Furthermore, the proposed strategy requires the least GPU memory of the strategies compared.
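The basic idea of computing an element-level product from only the symmetric part of the elemental matrix can be illustrated with a small sequential sketch. It assumes packed, row-major upper-triangular storage (n(n+1)/2 values per element: 10 instead of 16 for the 1-DOF quadrilateral, 36 instead of 64 for the 2-DOF one) and a hypothetical two-element mesh of 4-noded quadrilaterals for heat conduction with unit conductivity on unit squares. All function and variable names are illustrative; this is not the authors' GPU data structure or kernel, only a CPU demonstration that each off-diagonal entry can be read once and applied twice.

```python
from typing import List, Sequence

# Hypothetical helpers illustrating symmetric (packed upper-triangle) storage;
# a CPU sketch, not the paper's GPU implementation.

def packed_index(i: int, j: int, n: int) -> int:
    """Index of entry (i, j) in a row-major packed upper triangle of an n x n matrix."""
    if i > j:
        i, j = j, i  # symmetry: K_ij == K_ji
    return i * n - i * (i - 1) // 2 + (j - i)

def pack_upper(k: Sequence[Sequence[float]]) -> List[float]:
    """Keep only the upper triangle (diagonal included): n(n+1)/2 values."""
    n = len(k)
    return [k[i][j] for i in range(n) for j in range(i, n)]

def element_matvec_sym(kp: Sequence[float], xe: Sequence[float], n: int) -> List[float]:
    """y_e = K_e x_e using only the packed upper triangle of the symmetric K_e."""
    ye = [0.0] * n
    for i in range(n):
        ye[i] += kp[packed_index(i, i, n)] * xe[i]
        for j in range(i + 1, n):
            kij = kp[packed_index(i, j, n)]  # read once ...
            ye[i] += kij * xe[j]             # ... applied as K_ij
            ye[j] += kij * xe[i]             # ... and as the mirrored K_ji
    return ye

def matfree_matvec(elems, kps, x, n_dofs):
    """Global y = K x without assembling K: gather, element product, scatter-add."""
    y = [0.0] * n_dofs
    for conn, kp in zip(elems, kps):
        xe = [x[d] for d in conn]                  # gather element DOFs
        ye = element_matvec_sym(kp, xe, len(conn))
        for a, d in enumerate(conn):
            y[d] += ye[a]                          # scatter-add into global vector
    return y

def dense_reference(elems, ke, x, n_dofs):
    """Assembled dense K x, used only to check the matrix-free result."""
    kg = [[0.0] * n_dofs for _ in range(n_dofs)]
    for conn in elems:
        for a, da in enumerate(conn):
            for b, db in enumerate(conn):
                kg[da][db] += ke[a][b]
    return [sum(kg[r][c] * x[c] for c in range(n_dofs)) for r in range(n_dofs)]

# Bilinear quad, Laplace operator, unit square, unit conductivity (nodes counter-clockwise).
KE = [[v / 6.0 for v in row] for row in
      ([4, -1, -2, -1], [-1, 4, -1, -2], [-2, -1, 4, -1], [-1, -2, -1, 4])]
# Hypothetical 2 x 1 mesh: bottom nodes 0,1,2 and top nodes 3,4,5.
ELEMS = [[0, 1, 4, 3], [1, 2, 5, 4]]

kp = pack_upper(KE)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = matfree_matvec(ELEMS, [kp, kp], x, 6)
print(len(kp), y)
```

The sequential loop hides the main GPU difficulty: elements that share a node write to the same entries of `y`, so a parallel version would typically need atomic additions or element coloring, and the packed layout would have to be arranged for coalesced access, which is what the data structure described in the abstract addresses.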

Title: GPU-based matrix-free finite element solver exploiting symmetry of elemental matrices
Authors: Utpal Kiran, Sachin Singh Gautam, Deepak Sharma
Publication date: 24.06.2020
Publisher: Springer Vienna
Published in: Computing / Issue 9/2020
Print ISSN: 0010-485X
Electronic ISSN: 1436-5057
DOI: https://doi.org/10.1007/s00607-020-00827-4
