2021 | Original Paper | Chapter

Memory Access Optimization of High-Order CFD Stencil Computations on GPU

Authors: Shengxiang Wang, Zhuoqian Li, Yonggang Che

Published in: Parallel and Distributed Computing, Applications and Technologies

Publisher: Springer International Publishing

Abstract

Stencil computations are a class of computations commonly found in scientific and engineering applications. They have relatively low arithmetic intensity, so their performance is strongly affected by memory access. This paper studies memory access optimization for the key stencil computations of a high-order CFD program on NVIDIA GPUs. Two methods are used to improve performance. First, we use registers to cache the data used by the stencil computations in the kernel: the CUDA warp shuffle functions exchange data between neighboring grid points, and the thread computation granularity is adjusted to increase data reuse. Second, we use shared memory to buffer the grid data used by the stencil computations in the kernel, and apply loop tiling to reduce redundant accesses to global memory. Performance is evaluated on an NVIDIA Tesla K80 GPU. Compared to the original implementation, which uses only global memory, the register-based implementation achieves maximum speedups of 2.59 and 2.79 for the 15M and 60M grids, respectively, and the shared-memory implementation achieves maximum speedups of 3.51 and 3.36 for the 15M and 60M grids, respectively.
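
The register-caching idea can be illustrated with a small CPU emulation. The sketch below is not the authors' kernel; it is a pure-Python model, under stated assumptions, of 32-lane warps computing a 1D stencil. The hypothetical helpers `shfl_up` and `shfl_down` mimic the semantics of CUDA's `__shfl_up_sync` and `__shfl_down_sync` (no wrap-around at the warp edges), each lane caches `p` consecutive points in "registers" (the computation granularity), and a lane obtains its halo from its neighbors' registers instead of re-reading global memory. The function names, the parameter `p`, and the example coefficients are illustrative assumptions.

```python
# CPU emulation (pure Python) of register caching with warp shuffles for a 1D stencil.
# Names, the granularity p, and the coefficients below are illustrative assumptions.
WARP = 32  # threads per CUDA warp


def shfl_up(lanes, delta):
    """Model __shfl_up_sync: lane i receives lanes[i - delta]; the lowest `delta` lanes keep their own value."""
    return [lanes[i - delta] if i >= delta else lanes[i] for i in range(WARP)]


def shfl_down(lanes, delta):
    """Model __shfl_down_sync: lane i receives lanes[i + delta]; the highest `delta` lanes keep their own value."""
    return [lanes[i + delta] if i + delta < WARP else lanes[i] for i in range(WARP)]


def stencil_naive(a, w):
    """Reference: out[i] = sum_k w[k] * a[i + k - r] on interior points; every tap is a global read."""
    r, n = len(w) // 2, len(a)
    out = [0.0] * n
    for i in range(r, n - r):
        acc = 0.0
        for k in range(2 * r + 1):
            acc += w[k] * a[i + k - r]
        out[i] = acc
    return out


def stencil_register_cached(a, w, p=4):
    """Each emulated lane caches p consecutive points in 'registers' and takes its halo from its neighbours."""
    r, n = len(w) // 2, len(a)
    if r > p:
        raise ValueError("the halo (r points) must fit in one neighbour's p registers")
    out = [0.0] * n

    def load(g):  # one global-memory read; out-of-range points are padded with 0.0
        return a[g] if 0 <= g < n else 0.0

    s = -p  # first point loaded by lane 0 of the current warp
    while s + p < n:
        # reg[j][lane] holds point s + lane * p + j: each point is read from global memory once per warp
        reg = [[load(s + lane * p + j) for lane in range(WARP)] for j in range(p)]
        # Halo exchange: last r slots of lane - 1 and first r slots of lane + 1
        left = [shfl_up(reg[p - r + h], 1) for h in range(r)]
        right = [shfl_down(reg[h], 1) for h in range(r)]
        for lane in range(1, WARP - 1):  # lanes 0 and 31 lack a neighbour on one side
            win = ([left[h][lane] for h in range(r)]
                   + [reg[j][lane] for j in range(p)]
                   + [right[h][lane] for h in range(r)])
            for j in range(p):
                i = s + lane * p + j
                if r <= i < n - r:  # skip domain-boundary points, as the reference does
                    acc = 0.0
                    for k in range(2 * r + 1):
                        acc += w[k] * win[j + k]
                    out[i] = acc
        s += (WARP - 2) * p  # the next warp's first output directly follows this warp's last one
    return out


if __name__ == "__main__":
    a = [float((7 * i) % 11) for i in range(500)]
    w = [1.0, -8.0, 0.0, 8.0, -1.0]  # illustrative 4th-order first-derivative numerator
    print(stencil_register_cached(a, w) == stencil_naive(a, w))  # prints True
```

In this model each warp reads 32·p points to produce 30·p outputs, roughly 1.07 global loads per output regardless of the stencil radius, whereas the naive loop reads 2r + 1 values per output. The check r ≤ p reflects that a lane's halo must fit in a single neighbor's registers; a larger `p` raises reuse but, on a real GPU, also raises register pressure.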

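Shared-memory buffering with loop tiling can be sketched the same way. In the pure-Python emulation below (an illustration under assumptions, not the paper's code), the interior of a 2D grid is split into `ty × tx` tiles; for each tile, a hypothetical thread block first copies the tile plus an r-wide halo along x and y into a `shared` buffer, reading each grid value from global memory once, and then every output in the tile is computed from that buffer. A star-shaped stencil applied along both axes stands in for the CFD program's high-order stencils; the function names, tile shape, and coefficients are assumptions, and the load counters model global-memory reads only.

```python
# CPU emulation (pure Python) of shared-memory buffering with loop tiling for a 2D star stencil.
# Names, tile shape, and coefficients are illustrative assumptions; counters model global reads only.


def star_naive(a, w):
    """Reference: out[y][x] = sum_k w[k] * (a[y][x+k-r] + a[y+k-r][x]) on interior points."""
    r, ny, nx = len(w) // 2, len(a), len(a[0])
    out = [[0.0] * nx for _ in range(ny)]
    loads = 0
    for y in range(r, ny - r):
        for x in range(r, nx - r):
            acc = 0.0
            for k in range(2 * r + 1):
                acc += w[k] * (a[y][x + k - r] + a[y + k - r][x])
            out[y][x] = acc
            loads += 4 * r + 1  # distinct global addresses touched by this output point
    return out, loads


def star_tiled(a, w, tile=(8, 8)):
    """Each emulated thread block stages its tile plus an r-wide x/y halo in 'shared memory' once."""
    r, ny, nx = len(w) // 2, len(a), len(a[0])
    ty, tx = tile
    out = [[0.0] * nx for _ in range(ny)]
    loads = 0
    for y0 in range(r, ny - r, ty):  # loop tiling over the interior
        for x0 in range(r, nx - r, tx):
            y1, x1 = min(y0 + ty, ny - r), min(x0 + tx, nx - r)
            shared = {}  # stands in for a (ty + 2r) x (tx + 2r) __shared__ array
            for y in range(y0 - r, y1 + r):
                for x in range(x0 - r, x1 + r):
                    row_in, col_in = y0 <= y < y1, x0 <= x < x1
                    if row_in or col_in:  # corners are never used by a star stencil
                        shared[(y, x)] = a[y][x]
                        loads += 1
            # (on a GPU, a __syncthreads() barrier would separate the load and compute phases)
            for y in range(y0, y1):
                for x in range(x0, x1):
                    acc = 0.0
                    for k in range(2 * r + 1):
                        acc += w[k] * (shared[(y, x + k - r)] + shared[(y + k - r, x)])
                    out[y][x] = acc
    return out, loads


if __name__ == "__main__":
    a = [[float((3 * y + 5 * x) % 7) for x in range(40)] for y in range(36)]
    w = [1.0, -8.0, 0.0, 8.0, -1.0]
    ref, naive_loads = star_naive(a, w)
    out, tiled_loads = star_tiled(a, w)
    print(out == ref, naive_loads, tiled_loads)  # prints: True 10368 2368
```

For a full 8 × 8 tile with r = 2, the tile costs 8·8 + 2·2·8 + 2·2·8 = 128 global loads, i.e. 2 per output point, compared with 4r + 1 = 9 distinct values per point in the naive loop; with 1 × 1 tiles the two counts coincide. Halo cells are still loaded again by adjacent tiles, so larger tiles amortize them better, limited on a GPU by the shared memory available per block.
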
Print ISBN: 978-3-030-69243-8
Electronic ISBN: 978-3-030-69244-5
Copyright Year: 2021
DOI: https://doi.org/10.1007/978-3-030-69244-5_4