2018 | OriginalPaper | Book chapter

Hybrid Fortran: High Productivity GPU Porting Framework Applied to Japanese Weather Prediction Model

Authors: Michel Müller, Takayuki Aoki

Published in: Accelerator Programming Using Directives

Publisher: Springer International Publishing

Abstract

In this work we use the GPU porting task for the operational Japanese weather prediction model “ASUCA” as an opportunity to examine productivity issues with OpenACC when applied to structured grid problems. We then propose “Hybrid Fortran”, an approach that combines the advantages of directive-based methods (no rewrite of existing code necessary) with those of stencil DSLs (memory layout is abstracted). This makes it possible to define multiple parallelizations with different granularities in the same code. Without compromising on performance, this approach enables a major reduction in the code changes required to achieve a hybrid GPU/CPU parallelization, as demonstrated with our ASUCA implementation using Hybrid Fortran.
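The idea that the memory layout is abstracted away from the kernel code can be illustrated with a small sketch. The Python below is a hypothetical illustration, not Hybrid Fortran's implementation: the class `Field3D`, its `order` parameter, and the helpers `vertical_difference` and `fill_and_run` are invented for this example. A stencil kernel is written once against logical `(i, j, k)` indices, while the storage order of the flat buffer is chosen per target (the "CPU-friendly" and "GPU-friendly" orders named in the comments are assumed choices).

```python
"""Sketch of a storage-order abstraction for a 3D structured grid.

Hypothetical illustration only; not Hybrid Fortran's implementation.
"""
from itertools import product


class Field3D:
    """A 3D field stored in a flat list; kernels index it with logical (i, j, k)."""

    def __init__(self, nx, ny, nz, order="kij"):
        # `order` lists dimensions from fastest- to slowest-varying in memory.
        # Assumed choices: "kij" keeps vertical columns contiguous (CPU-friendly),
        # "ijk" keeps horizontal neighbors contiguous (GPU coalescing-friendly).
        if sorted(order) != ["i", "j", "k"]:
            raise ValueError("order must be a permutation of 'ijk'")
        self.shape = {"i": nx, "j": ny, "k": nz}
        self.order = order
        # Compute the stride of each logical dimension in the flat buffer.
        self.stride = {}
        step = 1
        for dim in order:
            self.stride[dim] = step
            step *= self.shape[dim]
        self.data = [0.0] * step

    def offset(self, i, j, k):
        """Map logical indices to a position in the flat buffer."""
        idx = {"i": i, "j": j, "k": k}
        return sum(idx[d] * self.stride[d] for d in "ijk")

    def __getitem__(self, ijk):
        return self.data[self.offset(*ijk)]

    def __setitem__(self, ijk, value):
        self.data[self.offset(*ijk)] = value


def vertical_difference(src, dst):
    """Stencil kernel written once in logical indices; it never sees the layout."""
    nx, ny, nz = src.shape["i"], src.shape["j"], src.shape["k"]
    # k = 0 is left untouched because it has no lower neighbor.
    for i, j, k in product(range(nx), range(ny), range(1, nz)):
        dst[i, j, k] = src[i, j, k] - src[i, j, k - 1]


def fill_and_run(order, nx=2, ny=3, nz=4):
    """Fill a test field with 100*i + 10*j + k*k and apply the kernel."""
    src, dst = Field3D(nx, ny, nz, order), Field3D(nx, ny, nz, order)
    for i, j, k in product(range(nx), range(ny), range(nz)):
        src[i, j, k] = float(100 * i + 10 * j + k * k)
    vertical_difference(src, dst)
    return dst
```

Calling `fill_and_run("kij")` and `fill_and_run("ijk")` yields identical values when read through logical indices, while the underlying flat buffers differ. Changing the layout therefore requires no change to the kernel itself, which is the property attributed to stencil DSLs above.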

Footnotes
1
These open-sourced efforts can be found at https://github.com/C2SM-RCM/claw-compiler.
 
3
Privatization is the main difference: Hybrid Fortran-generated OpenMP code uses “firstprivate” as the default policy, with an explicit “shared” clause for all arrays used in the kernel.
 
4
OpenACC is mainly used for reduction support: Hybrid Fortran does not automatically generate reduction kernels; however, it supports the “reduce” clause, which is forwarded to the generated OpenMP or OpenACC kernels.
 
5
This obviates the need for code duplication and/or deep inlining of call trees.
 
6
Please note: While this paper follows American English, the Hybrid Fortran language extension was originally developed following British English, which is apparent in the spelling of “domainDependant” (https://en.oxforddictionaries.com/definition/dependant). We consider support for the American English spelling of this directive to be part of our future efforts.
 
8
Reduction kernels are thus not supported with this backend; we use the OpenACC backend selectively for this purpose (see also the discussion in the footnotes to Sect. 2.1).
 
10
Hybrid Fortran allows the user to switch between different backend implementations per routine, such as OpenACC and CUDA Fortran; the user-specified information, together with the defaults given by the build system call, thus steers this implementation class.
 
11
Alternatively, a dependency generator script can be configured as well.
 
12
Since the input to this analysis is the closed-source ASUCA codebase, full reproducibility cannot be provided in this context. However, the intermediate data, the method employed to gather this data, and a sample input are provided and documented in https://github.com/muellermichel/hybrid-asuca-productivity-evidence/blob/master/asuca_productivity.xlsx.
 
13
Please refer to https://github.com/muellermichel/Hybrid-Fortran/blob/v1.00rc10/examples/Overview.md for an overview of the available samples and their results.
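Footnotes 3 and 4 describe how generated OpenMP code handles data sharing and reductions. A minimal sketch of such directive generation is shown below, assuming a generator that already knows which arrays a kernel uses and which reductions the user requested. `KernelInfo` and `openmp_directive` are hypothetical names, and the reduction is passed as an `(operator, variable)` pair rather than in Hybrid Fortran's own “reduce” syntax, which is not reproduced here.

```python
"""Sketch: emitting an OpenMP Fortran directive with a firstprivate-by-default policy.

Hypothetical illustration of footnotes 3 and 4; not Hybrid Fortran's code generator.
"""
from dataclasses import dataclass, field


@dataclass
class KernelInfo:
    """What a code generator might know about one parallel loop (hypothetical)."""
    arrays: list  # names of arrays used in the kernel body
    reductions: list = field(default_factory=list)  # (operator, variable) pairs


def openmp_directive(kernel: KernelInfo) -> str:
    # Scalars are not listed: DEFAULT(FIRSTPRIVATE) gives every thread its own
    # copy, initialized from the value the variable had before the loop.
    clauses = ["DEFAULT(FIRSTPRIVATE)"]
    if kernel.arrays:
        # Arrays are shared explicitly; otherwise the default policy would
        # copy each whole array into every thread.
        clauses.append("SHARED(" + ", ".join(sorted(kernel.arrays)) + ")")
    for op, var in kernel.reductions:
        # A user-requested reduction is forwarded unchanged as an OpenMP clause.
        clauses.append(f"REDUCTION({op}:{var})")
    return "!$OMP PARALLEL DO " + " ".join(clauses)
```

For example, `openmp_directive(KernelInfo(arrays=["u", "rho"], reductions=[("+", "total")]))` returns `!$OMP PARALLEL DO DEFAULT(FIRSTPRIVATE) SHARED(rho, u) REDUCTION(+:total)`. Making firstprivate the default keeps scalar temporaries thread-safe without having to enumerate them, while the explicit shared list avoids per-thread array copies.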
 
Metadata
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-74896-2_2