2021 | OriginalPaper | Chapter

Memory Access Optimization of High-Order CFD Stencil Computations on GPU

Authors: Shengxiang Wang, Zhuoqian Li, Yonggang Che

Published in: Parallel and Distributed Computing, Applications and Technologies

Publisher: Springer International Publishing

Abstract

Stencil computations are a class of computations commonly found in scientific and engineering applications. They have relatively low arithmetic intensity, so their performance is strongly affected by memory access. This paper studies memory access optimization for the key stencil computations of a high-order CFD program on an NVIDIA GPU. Two methods are used to optimize performance. First, we use registers to cache the data used by the stencil computations in the kernel. We use the CUDA warp shuffle functions to exchange data between neighboring grid points, and adjust the thread computation granularity to increase data reuse. Second, we use shared memory to buffer the grid data used by the stencil computations in the kernel, and apply loop tiling to reduce redundant accesses to global memory. Performance is evaluated on an NVIDIA Tesla K80 GPU. The results show that, compared to the original implementation that uses only global memory, the optimized implementation that uses registers achieves maximum speedups of 2.59 and 2.79 for the 15M and 60M grids, respectively, and the optimized implementation that uses shared memory achieves maximum speedups of 3.51 and 3.36 for the 15M and 60M grids, respectively.
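
The second method in the abstract (buffering grid data and tiling the loop to cut redundant global-memory accesses) can be illustrated with a small host-side model. The sketch below is not the authors' kernel and is not GPU code: it is a plain Python analogue in which a local list stands in for shared memory and a counting wrapper stands in for global memory. The stencil radius, the weights, the tile size, and the names `CountingArray`, `stencil_naive`, and `stencil_tiled` are all assumptions made for illustration.

```python
# Host-side model only: counts reads of the "global" array; it is not GPU code.
RADIUS = 3  # assumed stencil half-width (7-point 1D stencil); not from the paper
COEFFS = [1.0, -6.0, 15.0, -20.0, 15.0, -6.0, 1.0]  # illustrative weights only


class CountingArray:
    """Wraps a list and counts element reads, standing in for global memory."""

    def __init__(self, data):
        self.data = list(data)
        self.loads = 0

    def __getitem__(self, i):
        self.loads += 1
        return self.data[i]

    def __len__(self):
        return len(self.data)


def stencil_naive(u):
    """Every output point reads all 2*RADIUS + 1 inputs from 'global' memory."""
    n = len(u)
    out = [0.0] * n
    for i in range(RADIUS, n - RADIUS):
        out[i] = sum(COEFFS[k] * u[i - RADIUS + k] for k in range(2 * RADIUS + 1))
    return out


def stencil_tiled(u, tile=32):
    """Each tile copies its points plus halo into a local buffer once, then
    computes from that buffer (the role shared memory plays for a thread block)."""
    n = len(u)
    out = [0.0] * n
    for start in range(RADIUS, n - RADIUS, tile):
        stop = min(start + tile, n - RADIUS)
        # One 'global' read per buffered element, halo included.
        buf = [u[j] for j in range(start - RADIUS, stop + RADIUS)]
        for i in range(start, stop):
            base = i - start  # position of the leftmost input inside buf
            out[i] = sum(COEFFS[k] * buf[base + k] for k in range(2 * RADIUS + 1))
    return out
```

In this model the naive version issues 2*RADIUS + 1 reads per output point, while the tiled version issues tile + 2*RADIUS reads per tile, so the saving grows with the stencil width. On a real GPU, caches and coalescing complicate the picture, so the read counts here indicate the direction of the effect rather than a predicted speedup.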

Metadata
Title
Memory Access Optimization of High-Order CFD Stencil Computations on GPU
Authors
Shengxiang Wang
Zhuoqian Li
Yonggang Che
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-69244-5_4
