Published in: Computing 12/2018

31.03.2018

High performance iterative elemental product strategy in assembly-free FEM on GPU with improved occupancy

Authors: Nileshchandra K. Pikle, Shailesh R. Sathe, Arvind Y. Vyavahare

Abstract

In the assembly-free finite element method (FEM), matrix-vector products (MvPs) are computed either at the element level or at the degree-of-freedom (DoF) level. On a GPU, either strategy maps one element or one DoF to each thread. Both strategies exploit the computing power of the GPU and improve performance considerably. However, they suffer from poor global memory load/store efficiency. This paper proposes an efficient implementation of the DoF-based MvP strategy that stores elemental matrices in the faster on-chip shared memory of the GPU. Because shared memory on a GPU is small, the MvPs are carried out iteratively in chunks to alleviate the poor occupancy issue. The iterative method gains performance from two factors: coalesced access to global memory and improved occupancy. Numerical experiments show that the proposed iterative method outperforms the DoF-based strategy by a factor of approximately 3.
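
The DoF-level product described in the abstract can be illustrated on the CPU with a minimal sketch. It is not the paper's GPU kernel: the function names, the 1D bar mesh, and the `chunk` parameter are hypothetical, and the small `staged` slice only stands in for a tile of an elemental matrix held in shared memory. The sketch shows that each DoF accumulates its own output entry from the elemental matrices of its adjacent elements, so the global matrix is never assembled, and that processing each row in chunks leaves the result unchanged.

```python
def build_adjacency(elem_dofs, n_dofs):
    """Map each DoF to the (element, local row) pairs that touch it."""
    adjacency = [[] for _ in range(n_dofs)]
    for e, dofs in enumerate(elem_dofs):
        for local_row, d in enumerate(dofs):
            adjacency[d].append((e, local_row))
    return adjacency


def dof_based_matvec(elem_dofs, elem_mats, x, n_dofs, chunk=2):
    """Assembly-free y = K x with one output entry per DoF.

    Assumption: `chunk` entries of an elemental row are staged at a time,
    imitating a small on-chip buffer. The real GPU scheme may tile differently.
    """
    adjacency = build_adjacency(elem_dofs, n_dofs)
    y = [0.0] * n_dofs
    for d in range(n_dofs):              # on a GPU: one thread per DoF
        acc = 0.0
        for e, r in adjacency[d]:        # elements sharing this DoF
            dofs = elem_dofs[e]
            row = elem_mats[e][r]        # row r of the elemental matrix
            for start in range(0, len(dofs), chunk):
                staged = row[start:start + chunk]   # stand-in for a shared-memory tile
                for k, a in enumerate(staged):
                    acc += a * x[dofs[start + k]]
        y[d] = acc                       # single writer per entry, no atomics
    return y


def assembled_matvec(elem_dofs, elem_mats, x, n_dofs):
    """Reference: assemble the global matrix, then multiply."""
    K = [[0.0] * n_dofs for _ in range(n_dofs)]
    for dofs, Ke in zip(elem_dofs, elem_mats):
        for i, gi in enumerate(dofs):
            for j, gj in enumerate(dofs):
                K[gi][gj] += Ke[i][j]
    return [sum(K[i][j] * x[j] for j in range(n_dofs)) for i in range(n_dofs)]


# Hypothetical test mesh: 1D bar, 4 two-node elements, unit stiffness.
elem_dofs = [[0, 1], [1, 2], [2, 3], [3, 4]]
elem_mats = [[[1.0, -1.0], [-1.0, 1.0]] for _ in elem_dofs]
x = [0.0, 1.0, 4.0, 9.0, 16.0]

y_free = dof_based_matvec(elem_dofs, elem_mats, x, n_dofs=5, chunk=1)
y_ref = assembled_matvec(elem_dofs, elem_mats, x, n_dofs=5)
```

Because each DoF owns exactly one entry of `y`, the loop over `d` has no write conflicts, which is the property that makes a per-thread DoF mapping attractive. The chunk size changes only how much of an elemental row is staged at once, not the arithmetic.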

Metadata
Title
High performance iterative elemental product strategy in assembly-free FEM on GPU with improved occupancy
Authors
Nileshchandra K. Pikle
Shailesh R. Sathe
Arvind Y. Vyavahare
Publication date
31.03.2018
Publisher
Springer Vienna
Published in
Computing / Issue 12/2018
Print ISSN: 0010-485X
Electronic ISSN: 1436-5057
DOI
https://doi.org/10.1007/s00607-018-0613-x
