Published: 31 March 2018

High performance iterative elemental product strategy in assembly-free FEM on GPU with improved occupancy

Authors: Nileshchandra K. Pikle, Shailesh R. Sathe, Arvind Y. Vyavahare

Published in: Computing | Issue 12/2018


Abstract

In the assembly-free finite element method (FEM), matrix-vector products (MvPs) are computed at either the element level or the degree-of-freedom (DoF) level. On a GPU, these MvPs are mapped to threads on a per-element or per-DoF basis. Both strategies exploit the computing power of the GPU and improve performance considerably. However, both suffer from poor global memory load/store efficiency. This paper proposes an efficient implementation of the DoF-based MvP strategy that stores the elemental matrices in faster on-chip shared memory. Because shared memory on the GPU is small, the MvPs are carried out iteratively in chunks to alleviate the resulting poor occupancy. The iterative method gains performance from two factors: coalesced access to global memory and improved occupancy. Numerical experiments show that the proposed iterative method outperforms the DoF-based strategy by a factor of approximately 3.
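
The DoF-level product described in the abstract can be illustrated with a small serial Python sketch. It is an illustration under stated assumptions, not the authors' CUDA implementation: the 1D mesh, the elemental matrix `ke` shared by all elements, the function names, and the `chunk` parameter are all hypothetical. The outer chunk loop only mimics staging part of the elemental matrix in shared memory, and the loop over DoFs stands in for one GPU thread per DoF.

```python
# Sketch only: serial Python stand-in for a DoF-level assembly-free MvP.
# Mesh, elemental matrix, and chunk size are hypothetical illustration values.

def dof_mvp_chunked(conn, ke, x, n_dofs, chunk):
    """Compute y = K x without assembling K.

    conn  : list of per-element global DoF index lists
    ke    : dense elemental matrix shared by all elements (list of rows)
    chunk : number of elemental-matrix columns handled per pass
    """
    # Inverse map: for each DoF, the (element, local row) pairs touching it.
    touching = [[] for _ in range(n_dofs)]
    for e, dofs in enumerate(conn):
        for local, g in enumerate(dofs):
            touching[g].append((e, local))

    n_local = len(ke)
    y = [0.0] * n_dofs
    # Outer loop over column chunks: stands in for staging one piece of ke
    # in on-chip shared memory at a time.
    for start in range(0, n_local, chunk):
        stop = min(start + chunk, n_local)
        # Each iteration over g would be one GPU thread (one DoF), so every
        # y[g] has a single writer and no atomic updates are needed.
        for g in range(n_dofs):
            acc = 0.0
            for e, row in touching[g]:
                for col in range(start, stop):
                    acc += ke[row][col] * x[conn[e][col]]
            y[g] += acc
    return y


def assembled_mvp(conn, ke, x, n_dofs):
    """Reference: assemble the dense global matrix, then multiply."""
    K = [[0.0] * n_dofs for _ in range(n_dofs)]
    for dofs in conn:
        for i, gi in enumerate(dofs):
            for j, gj in enumerate(dofs):
                K[gi][gj] += ke[i][j]
    return [sum(K[i][j] * x[j] for j in range(n_dofs)) for i in range(n_dofs)]


# Hypothetical 1D mesh: 4 two-node elements, 5 DoFs, unit stiffness.
conn = [[0, 1], [1, 2], [2, 3], [3, 4]]
ke = [[1.0, -1.0], [-1.0, 1.0]]
x = [1.0, 2.0, 4.0, 7.0, 11.0]

y_chunked = dof_mvp_chunked(conn, ke, x, n_dofs=5, chunk=1)
y_ref = assembled_mvp(conn, ke, x, n_dofs=5)
```

In this sketch the chunked result matches the assembled reference for any positive `chunk`, up to floating-point rounding, because chunking only regroups the partial sums that each DoF accumulates.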

Title: High performance iterative elemental product strategy in assembly-free FEM on GPU with improved occupancy
Authors: Nileshchandra K. Pikle, Shailesh R. Sathe, Arvind Y. Vyavahare
Publication date: 31 March 2018
Publisher: Springer Vienna
Published in: Computing, Issue 12/2018
Print ISSN: 0010-485X
Electronic ISSN: 1436-5057
DOI: https://doi.org/10.1007/s00607-018-0613-x
