Optimization and acceleration of flow simulations for CFD on CPU/GPU architecture

Lei, Jiang; Li, Da-li; Zhou, Yun-long; Liu, Wei

doi:10.1007/s40430-019-1793-9

Technical Paper
Published: 17 June 2019

Volume 41, article number 290, (2019)
Journal of the Brazilian Society of Mechanical Sciences and Engineering

Jiang Lei¹ (ORCID: orcid.org/0000-0003-0009-6308), Da-li Li¹, Yun-long Zhou¹ & Wei Liu¹

847 Accesses
12 Citations

Abstract

With the growing demand for computing power in computational fluid dynamics (CFD), graphics processing units (GPUs), with their strong floating-point computing capability, are playing an increasingly important role. This work ports an Euler solver from central processing units (CPUs) to three different CPU/GPU heterogeneous hardware platforms using the MUSCL and NND schemes, and investigates the computational acceleration of two test problems: the one-dimensional (1D) Riemann problem and the two-dimensional (2D) flow past a forward-facing step. The paper first introduces how heterogeneous systems work, in terms of hardware structures, memory models and programming methods. It then presents in detail the three heterogeneous methods employed in the study; of these, porting all parts of the solver loop to the GPU performed best. Several optimization strategies suited to the solver were adopted to achieve substantial execution speedups, including the use of GPU shared memory, which has been relatively rarely reported in the CFD literature. Finally, the simulation of the 1D Riemann problem verified the reliability of the modified GPU codes and demonstrated that both schemes capture discontinuities well. The two cases, with their 1D computational domains discretized into 10,000 cells, both achieved a speedup exceeding 25 relative to execution on a single-core CPU. In the simulation of the 2D step flow, the highest speedups were 260 for the MUSCL scheme on an 800 × 400 mesh and 144 for the NND scheme on a 400 × 200 mesh.
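
To make the MUSCL scheme named in the abstract concrete, the following is a minimal sketch of second-order MUSCL reconstruction on a 1D array of cell averages. It is an illustration only: the abstract does not say which slope limiter or flux function the authors used, so the minmod limiter here is an assumption, and the function names `minmod` and `muscl_interface_states` are hypothetical. The sketch is serial Python; in a GPU port each interface would typically be handled by one thread, but that mapping is not shown.

```python
def minmod(a, b):
    """Return the argument of smaller magnitude if both share a sign, else 0."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b


def muscl_interface_states(u):
    """Reconstruct left/right states at the n-1 interior cell interfaces.

    u is a list of n cell averages. Interface i lies between cells i and i+1.
    Boundary cells keep a zero slope (first-order there), an assumption made
    to keep the sketch short.
    """
    n = len(u)
    slopes = [0.0] * n
    for i in range(1, n - 1):
        # Limited slope from the backward and forward differences.
        slopes[i] = minmod(u[i] - u[i - 1], u[i + 1] - u[i])
    # Extrapolate each cell average half a cell towards the interface.
    left = [u[i] + 0.5 * slopes[i] for i in range(n - 1)]
    right = [u[i + 1] - 0.5 * slopes[i + 1] for i in range(n - 1)]
    return left, right


if __name__ == "__main__":
    # Across a step the limiter zeroes the slopes, so no new extrema appear.
    print(muscl_interface_states([1.0, 1.0, 1.0, 0.0, 0.0]))
```

On smooth (here linear) data the limited slopes equal the true gradient and the interior interface states match on both sides; at a discontinuity the limiter falls back to piecewise-constant states, which is what allows such schemes to capture shocks without spurious oscillations.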

Acknowledgements

This research was supported by the National University of Defense Technology research program ZDYYJCYJ 20140101.

Author information

Authors and Affiliations

College of Aerospace Science and Engineering, National University of Defense Technology, Changsha, 410073, Hunan, People’s Republic of China
Jiang Lei, Da-li Li, Yun-long Zhou & Wei Liu

Corresponding author

Correspondence to Jiang Lei.

Additional information

Technical Editor: Erick de Moraes Franklin, Ph.D.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lei, J., Li, Dl., Zhou, Yl. et al. Optimization and acceleration of flow simulations for CFD on CPU/GPU architecture. J Braz. Soc. Mech. Sci. Eng. 41, 290 (2019). https://doi.org/10.1007/s40430-019-1793-9

Received: 17 February 2019
Accepted: 10 June 2019
Published: 17 June 2019
DOI: https://doi.org/10.1007/s40430-019-1793-9
