Published in: The Journal of Supercomputing 5/2024

07-10-2023

Improving CUDA performance of an unstructured high-order CFD application under OP2 framework

Authors: Kangjin Huang, Yonggang Che, Chuanfu Xu, Zhe Dai, Jian Zhang


Abstract

OP2 is a programming framework for unstructured mesh applications, built on a domain-specific language. It supports automatic code generation for multiple parallel programming models, including CUDA. However, using OP2 to generate efficient CUDA code for real-world applications is challenging. This paper reports our efforts to optimize CUDA performance while refactoring an unstructured high-order CFD application, HOUR2D, on top of OP2. A series of novel methods are implemented, including choosing appropriate execution strategies, using local arrays, and optimizing the OP2 data transfer function. Performance evaluation shows that these optimizations significantly improve the performance of the generated CUDA code: the overall performance of the optimized OP2-CUDA code is 13.2 times higher than that of the unoptimized OP2-CUDA code and 2.4 times higher than that of the manual CUDA code. These optimizations do not affect the portability of HOUR2D as an OP2 application.
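
The two reported figures can be combined into an implied ratio between the manual CUDA code and the unoptimized OP2-CUDA code. The following is a rough sketch, not a result stated by the authors. It assumes that both figures refer to the same workload and that "times higher" denotes a plain performance ratio; the symbols $P_{\mathrm{opt}}$, $P_{\mathrm{unopt}}$ and $P_{\mathrm{manual}}$ are introduced here only for illustration.

```latex
% Assumed notation (not from the paper):
%   P_opt    : performance of the optimized OP2-CUDA code
%   P_unopt  : performance of the unoptimized OP2-CUDA code
%   P_manual : performance of the manual CUDA code
\begin{align*}
  P_{\mathrm{opt}} &= 13.2\, P_{\mathrm{unopt}}, &
  P_{\mathrm{opt}} &= 2.4\, P_{\mathrm{manual}} \\[4pt]
  \frac{P_{\mathrm{manual}}}{P_{\mathrm{unopt}}}
    &= \frac{13.2}{2.4} = 5.5
\end{align*}
```

Under these assumptions, the unoptimized OP2-generated code starts out roughly 5.5 times slower than the hand-written CUDA version, and the optimizations both close that gap and go beyond it.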


Metadata

Title: Improving CUDA performance of an unstructured high-order CFD application under OP2 framework
Authors: Kangjin Huang, Yonggang Che, Chuanfu Xu, Zhe Dai, Jian Zhang
Publication date: 07-10-2023
Publisher: Springer US
Published in: The Journal of Supercomputing, Issue 5/2024
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-023-05679-1
