2017 | OriginalPaper | Book chapter

Porting VASP from MPI to MPI+OpenMP [SIMD]

Optimization Strategies, Insights and Feature Proposals

Authors: Florian Wende, Martijn Marsman, Zhengji Zhao, Jeongnim Kim

Published in: Scaling OpenMP for Exascale Performance and Portability

Publisher: Springer International Publishing

Abstract

We describe the transition of the VASP application (a widely used electronic structure code written in Fortran) from an MPI-only to a hybrid code base that exploits the three levels of parallelism relevant to efficient execution on modern computer platforms: multiprocessing, multithreading, and SIMD vectorization. To achieve code portability, we draw on MPI parallelization together with OpenMP threading and SIMD constructs. Combining the latter two can be challenging in complex code bases. Our optimization targets are the combination of multithreading and vectorization in different calling contexts, as well as whole-function vectorization. In addition to outlining the design decisions made throughout the code transformation process, we demonstrate the effectiveness of the code adaptations using different compilers (GNU, Intel) and target platforms (CPU, Intel Xeon Phi (KNL)).

Footnotes
1
Benchmarks were run on Cori, a Cray XC40 system at NERSC. It has over 9300 Intel Xeon Phi 7250 (KNL) nodes, each with 68 CPU cores (272 threads) @1.4 GHz and 96 GB DDR4 main memory. In addition, Cori has over 2000 dual-socket 16-core Intel Xeon E5-2698v3 (“Haswell”) nodes, each with 32 CPU cores (64 threads) @2.3 GHz, a 256-bit wide vector unit per CPU core, and 128 GB DDR4 memory. Cori’s nodes are interconnected by Cray’s Aries network with Dragonfly topology. A comprehensive study of the parameters and options available when building and running VASP on Cori is given in [8].
 
2
At the time of writing, we used the GNU compiler gfortran-6.3. This version does not fully support OpenMP 4.5 for Fortran (the same seems to be true for gfortran-7.1, which we tested on a local workstation). For remarks on that, see the text below.
 
3
gfortran-6.3 found fault with the !$omp declare simd (foo) directive for subroutine definitions within Fortran modules (not so for functions): it states that foo has been host associated already. Working around by moving subroutines outside the module causes conflicts with variable scoping. We did not implement that workaround, as subroutine vectorization fails only with the GNU compiler, and only in the module context.
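
The situation described in footnote 3 can be sketched as follows. This is a minimal illustration, not code from VASP: the module name `kernels`, the subroutine `scale_add`, and the driver program `demo` are hypothetical. It shows a subroutine defined inside a Fortran module and marked with `!$omp declare simd`, called from a loop that combines threading and vectorization. Compilers that accept the directive in this context may generate a vector version of the subroutine; without OpenMP enabled, the directives are treated as comments and the program runs serially.

```fortran
! Hypothetical example: all names below are illustrative, not taken from VASP.
module kernels
  implicit none
contains
  ! Subroutine defined inside a module and marked for SIMD vectorization.
  ! Footnote 3 reports that gfortran-6.3 rejects the directive in this
  ! context (subroutine within a module), but not for functions.
  subroutine scale_add(x, a, y)
    !$omp declare simd(scale_add) uniform(a)
    real(8), intent(in)    :: x, a
    real(8), intent(inout) :: y
    y = y + a * x
  end subroutine scale_add
end module kernels

program demo
  use kernels
  implicit none
  integer, parameter :: n = 1024
  real(8) :: x(n), y(n)
  integer :: i

  x = 1.0d0
  y = 2.0d0

  ! Threads split the iteration space; each thread's chunk is vectorized,
  ! calling the SIMD version of scale_add where the compiler provides one.
  !$omp parallel do simd
  do i = 1, n
    call scale_add(x(i), 0.5d0, y(i))
  end do
  !$omp end parallel do simd

  print *, y(1), y(n)
end program demo
```

Keeping the subroutine inside the module preserves access to module-scoped variables, which is the reason the footnote gives for not moving such subroutines outside the module as a workaround.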
 
References
1. Kresse, G., Furthmüller, J.: Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B 54, 11169–11186 (1996)
2. Kresse, G., Furthmüller, J.: Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Comput. Mater. Sci. 6(1), 15–50 (1996)
3. Marsman, M., Paier, J., Stroppa, A., Kresse, G.: Hybrid functionals applied to extended systems. J. Phys. Condens. Matter 20(6), 064201 (2008)
4. Kaltak, M., Klimeš, J., Kresse, G.: Cubic scaling algorithm for the random phase approximation: self-interstitials and vacancies in Si. Phys. Rev. B Condens. Matter Mater. Phys. 90(5), 054115 (2014)
5. Liu, P., Kaltak, M., Klimeš, J., Kresse, G.: Cubic scaling \(GW\): towards fast quasiparticle calculations. Phys. Rev. B: Condens. Matter 94(16), 165109 (2016)
6. Sodani, A., Gramunt, R., Corbal, J., Kim, H.S., Vinod, K., Chinthamani, S., Hutsell, S., Agarwal, R., Liu, Y.C.: Knights landing: second-generation Intel Xeon Phi product. IEEE Micro 36(2), 34–46 (2016)
7. Kresse, G., Joubert, D.: From ultrasoft pseudopotentials to the projector augmented-wave method. Phys. Rev. B 59, 1758–1775 (1999)
8. Zhao, Z., Marsman, M., Wende, F., Kim, J.: Performance of hybrid MPI/OpenMP VASP on Cray XC40 based on Intel Knights landing many integrated core architecture. In: CUG Proceedings (2017)
9. Klemm, M., Duran, A., Tian, X., Saito, H., Caballero, D., Martorell, X.: Extending OpenMP* with vector constructs for modern multicore SIMD architectures. In: Chapman, B.M., Massaioli, F., Müller, M.S., Rorro, M. (eds.) IWOMP 2012. LNCS, vol. 7312, pp. 59–72. Springer, Heidelberg (2012). doi:10.1007/978-3-642-30961-8_5
12. Wende, F., Noack, M., Schütt, T., Sachs, S., Steinke, T.: Application performance on a Cray XC30 evaluation system with Xeon Phi coprocessors at HLRN-III. In: Cray User Group (2015)
13. Wende, F., Noack, M., Steinke, T., Klemm, M., Zitzlsberger, G., Newburn, C.J.: Portable SIMD performance with OpenMP* 4.x compiler directives. In: Euro-Par 2016, Parallel Processing, 22nd International Conference on Parallel and Distributed Computing (2016)
Metadata
Title
Porting VASP from MPI to MPI+OpenMP [SIMD]
Authors
Florian Wende
Martijn Marsman
Zhengji Zhao
Jeongnim Kim
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-65578-9_8