2019 | Original Paper | Book Chapter

Fine-Grained MPI+OpenMP Plasma Simulations: Communication Overlap with Dependent Tasks

Written by: Jérôme Richard, Guillaume Latu, Julien Bigot, Thierry Gautier

Published in: Euro-Par 2019: Parallel Processing

Publisher: Springer International Publishing

Abstract

This paper demonstrates, through a case study conducted on multi-core and many-core architectures, how OpenMP 4.5 tasks can be used to efficiently overlap computations and MPI communications. It focuses on task granularity, dependencies and priorities, and also identifies some limitations of OpenMP. Results on 64 Skylake nodes show that although 64% of the wall-clock time is spent in MPI communications, 60% of the cores are busy with computations, which indicates a good overlap. The chosen dataset is small enough to be a challenging case in terms of overlap, and is thus useful for assessing worst-case scenarios in future simulations.
Two key features were identified: using task priorities improved performance by 5.7% (mainly due to a better overlap), and using recursive tasks shortened the execution time by 9.7%. We also illustrate the need for task tracing and task visualization tools. These tools enabled a detailed understanding of this task-based OpenMP+MPI code and helped improve its performance.
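
The combination of task dependencies and priorities described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `compute_block`, `exchange_block`, `overlap_step`, `NBLOCKS` and `BLOCK` are hypothetical names, and the exchange is a plain copy standing in for the place where a real code would issue MPI calls.

```c
#include <stdio.h>

#define NBLOCKS 4     /* hypothetical number of data blocks */
#define BLOCK   1024  /* hypothetical block size */

/* Hypothetical compute kernel applied to one block. */
static void compute_block(double *blk, int n)
{
    for (int i = 0; i < n; ++i)
        blk[i] = blk[i] * 2.0 + 1.0;
}

/* Hypothetical stand-in for a communication step: a real code would
 * call MPI here; this sketch only copies one value into a halo slot. */
static void exchange_block(const double *blk, double *halo)
{
    *halo = blk[0];
}

/* One step: each block is computed by a task, then exchanged by a
 * dependent task. The exchange of block b can run while other blocks
 * are still being computed, which is where the overlap comes from. */
double overlap_step(double data[NBLOCKS][BLOCK], double halo[NBLOCKS])
{
    #pragma omp parallel
    #pragma omp single
    {
        for (int b = 0; b < NBLOCKS; ++b) {
            /* data[b][0] serves as the dependency handle of block b. */
            #pragma omp task depend(inout: data[b][0])
            compute_block(data[b], BLOCK);

            /* Higher priority is a hint to start communication early. */
            #pragma omp task depend(in: data[b][0]) depend(out: halo[b]) priority(10)
            exchange_block(data[b], &halo[b]);
        }
    }   /* the barrier ending the region waits for all tasks */

    double sum = 0.0;
    for (int b = 0; b < NBLOCKS; ++b)
        sum += halo[b];
    return sum;
}
```

Compiled with OpenMP enabled (for example `-fopenmp` with GCC), each exchange task starts only after the compute task of its block has finished. The `priority` clause is only a hint, and runtimes generally ignore it unless the `OMP_MAX_TASK_PRIORITY` environment variable is set to a large enough value.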

Footnotes
1. See [10] for an advanced tutorial about these points.
2. In practice, the Poisson-Ampere equations [7] are solved instead of the Poisson equation. For the sake of clarity, Poisson-Ampere is not detailed here, as the algorithm and performance are very close.
3. This construct specifies that the iterations of one or more loops are executed in parallel using (independent) tasks. Unless specified by the user, it lets the runtime choose the best granularity and performs a final synchronization.
5. The mode cannot be configured by the user on the selected computing machines.
6. The latest versions available on the computing machines during the experiments.
7. The idle time includes periods where threads are busy waiting for ready tasks to be executed and thread synchronization periods; the runtime overhead includes scheduling and task submission costs.
8. This time could be shortened if the task graph could be stored and resubmitted from one timestep to the next, as in [2].
9. The cost of managing dependencies could also be lowered by the runtime if dedicated studies were carried out along this line.
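
The construct described in footnote 3 matches the behavior of OpenMP's `taskloop` directive. Under that assumption, a minimal sketch could look like the following; the function name `scale_taskloop` and the grain size are hypothetical and chosen only for illustration.

```c
#include <stdio.h>

/* Scales x[0..n-1] by a, letting the runtime split the loop into tasks.
 * Hypothetical example, not taken from the paper. */
void scale_taskloop(double *x, long n, double a)
{
    #pragma omp parallel
    #pragma omp single
    {
        /* grainsize is optional: without it the runtime picks the
         * task granularity itself. */
        #pragma omp taskloop grainsize(256)
        for (long i = 0; i < n; ++i)
            x[i] *= a;
        /* taskloop waits for its tasks here (implicit taskgroup),
         * unless the nogroup clause is given. */
    }
}
```

Omitting `grainsize` leaves the granularity to the runtime, and the implicit synchronization at the end of the loop corresponds to the final synchronization mentioned in the footnote.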
 
References
1.
3. Bouzat, N., Rozar, F., Latu, G., Roman, J.: A new parallelization scheme for the Hermite interpolation based gyroaverage operator. In: 2017 16th ISPDC (2017)
4. Bouzat, N., et al.: Targeting realistic geometry in Tokamak code Gysela. ESAIM Proc. Surv. 63, 179–207 (2018)
5.
7. Crouseilles, N., Latu, G., Sonnendrücker, E.: Hermite spline interpolation on patches for parallelly solving the Vlasov-Poisson equation. IJAMCS 17(3), 335–349 (2007)
8. Diaz, J., Muñoz-Caro, C., Niño, A.: A survey of parallel programming models and tools in the multi and many-core era. IEEE TPDS 23(8), 1369–1386 (2012)
14. Perez, J.M., Beltran, V., Labarta, J., Ayguadé, E.: Improving the integration of task nesting and dependencies in OpenMP. In: IPDPS 2017. IEEE (2017)
15. Sala, K., et al.: Improving the interoperability between MPI and task-based programming models. In: Proceedings of EuroMPI 2018, pp. 6:1–6:11. ACM (2018)
16. Song, F., YarKhan, A., Dongarra, J.: Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems. In: Proceedings of the Conference on HPC Networking, Storage and Analysis, SC 2009. ACM (2009)
17. Sonnendrücker, E., et al.: The semi-Lagrangian method for the numerical resolution of the Vlasov equation. J. Comput. Phys. 149(2), 201–220 (1999)
Metadata
Title
Fine-Grained MPI+OpenMP Plasma Simulations: Communication Overlap with Dependent Tasks
Written by
Jérôme Richard
Guillaume Latu
Julien Bigot
Thierry Gautier
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-29400-7_30