OpenMP Parallelization Strategies for a Discontinuous Galerkin Solver

Authors: Andrea Crivellini, Matteo Franciolini, Alessandro Colombo, Francesco Bassi

Published in: International Journal of Parallel Programming | Issue 5-6/2019

Published: 30 July 2018

Abstract

This paper reports on the open multi-processing (OpenMP) parallel implementation of a fully unstructured high-order discontinuous Galerkin (DG) solver for computational fluid dynamics and computational aeroacoustics applications. Although the OpenMP paradigm is confined to shared-memory systems, it has some advantages over the message passing interface (MPI) library, and making the most of this approach can improve the parallel efficiency of codes running on clusters of multi-core nodes. While with MPI the use of a domain decomposition algorithm is almost unavoidable, the OpenMP shared-memory context offers several options. Three strategies, here optimised for a DG solver, are presented and compared: the first is a customization of a colouring approach, the second mimics an MPI implementation in the OpenMP context, and the third lies halfway between the previous two. Numerical tests on both inviscid and viscous test cases indicate that, thanks to the compactness of the DG discretization, all the code versions perform quite satisfactorily. In particular, the domain decomposition algorithm reaches the highest parallel efficiency at low computational loads, while the colouring approach excels at larger computational loads and can be easily implemented within an existing MPI code. Moreover, colouring is very well suited to hardware accelerators, an opportunity offered by the OpenMP 4.0 standard. Finally, the performance gain obtained with a hybrid MPI/OpenMP version of the DG code on high performance computing facilities is demonstrated.
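To make the colouring idea concrete, the following is a minimal sketch in C of a generic face-colouring scheme. It is not the customised variant developed in the paper. It assumes interior faces only, at most 64 colours, and one scalar residual per element. The names `Face`, `colour_faces` and `accumulate_face_fluxes` are hypothetical. Faces that share an element receive different colours, so all faces of one colour can update the residuals of their two neighbouring elements in parallel without write conflicts.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical face record: indices of the two elements sharing the face. */
typedef struct { int left, right; } Face;

/* Greedy colouring: two faces that touch a common element never share a
   colour. Fills colour[f] and returns the number of colours used.
   Assumption: no more than 64 colours are needed (one bit per colour). */
int colour_faces(const Face *faces, int n_faces, int n_elems, int *colour)
{
    if (n_faces <= 0 || n_elems <= 0) return 0;
    unsigned long long *used = calloc((size_t)n_elems, sizeof *used);
    int n_colours = 0;
    assert(used != NULL);
    for (int f = 0; f < n_faces; ++f) {
        /* Colours already taken by faces of either neighbouring element. */
        unsigned long long taken = used[faces[f].left] | used[faces[f].right];
        int c = 0;
        while (c < 64 && ((taken >> c) & 1ULL)) ++c;
        assert(c < 64);
        colour[f] = c;
        used[faces[f].left]  |= 1ULL << c;
        used[faces[f].right] |= 1ULL << c;
        if (c + 1 > n_colours) n_colours = c + 1;
    }
    free(used);
    return n_colours;
}

/* Adds each face flux to its two neighbouring elements, one colour at a time.
   Within a colour, every element is written by at most one face, so the
   parallel loop needs no atomics or locks. A production code would store the
   faces grouped by colour instead of filtering inside the loop. */
void accumulate_face_fluxes(const Face *faces, const int *colour, int n_faces,
                            int n_colours, const double *flux, double *residual)
{
    for (int c = 0; c < n_colours; ++c) {
        #pragma omp parallel for schedule(static)
        for (int f = 0; f < n_faces; ++f) {
            if (colour[f] != c) continue;
            residual[faces[f].left]  -= flux[f];
            residual[faces[f].right] += flux[f];
        }
    }
}
```

Compiled without OpenMP support, the pragma is ignored and the loops run serially with the same result. The implicit barrier at the end of each parallel loop is what separates one colour from the next.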

The auto option was introduced in the OpenMP standard only with release 3.0; it delegates the scheduling decision to the compiler and/or the runtime system.
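As a small illustration of this clause, the sketch below applies `schedule(auto)` to a simple loop in C. The function `scale_in_place` is a hypothetical example and is not taken from the solver. An OpenMP 3.0 or later compiler is assumed.

```c
/* Multiplies every entry of x by a. With schedule(auto) the mapping of
   iterations to threads is left to the implementation, instead of being
   fixed by the programmer as with static, dynamic or guided. */
void scale_in_place(double *x, int n, double a)
{
    #pragma omp parallel for schedule(auto)
    for (int i = 0; i < n; ++i) {
        x[i] *= a;
    }
}
```

Because each iteration writes only its own entry, the result does not depend on which schedule the implementation picks.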

Title: OpenMP Parallelization Strategies for a Discontinuous Galerkin Solver
Authors: Andrea Crivellini, Matteo Franciolini, Alessandro Colombo, Francesco Bassi
Publication date: 30 July 2018
Publisher: Springer US
Published in: International Journal of Parallel Programming / Issue 5-6/2019
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI: https://doi.org/10.1007/s10766-018-0589-3
