2018 | OriginalPaper | Book chapter

Hybrid Fortran: High Productivity GPU Porting Framework Applied to Japanese Weather Prediction Model

Authors: Michel Müller, Takayuki Aoki

Published in: Accelerator Programming Using Directives

Publisher: Springer International Publishing

Abstract

In this work we use the GPU porting task for the operational Japanese weather prediction model “ASUCA” as an opportunity to examine productivity issues with OpenACC when applied to structured-grid problems. We then propose “Hybrid Fortran”, an approach that combines the advantages of directive-based methods (no rewrite of existing code necessary) with those of stencil DSLs (memory layout is abstracted). This makes it possible to define multiple parallelizations with different granularities in the same code. Without compromising on performance, this approach enables a major reduction in the code changes required to achieve a hybrid GPU/CPU parallelization, as demonstrated with our ASUCA implementation using Hybrid Fortran.
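To illustrate what abstracting the memory layout buys, here is a minimal Python sketch. It is not Hybrid Fortran itself. A stencil kernel is written once against logical (i, j, k) indices, and the storage order is chosen separately. The helper names (`make_layout`, `vertical_second_difference`, `run`) and the two orders shown are assumptions for illustration only. In a CPU/GPU setting one might, for instance, keep vertical columns contiguous for the CPU and make a horizontal index contiguous for the GPU, so that neighboring threads touch neighboring memory.

```python
from itertools import product


def make_layout(shape, order):
    """Hypothetical helper: map a logical (i, j, k) index to a flat offset.

    `order` names the logical dimensions from slowest- to fastest-varying,
    so "kji" makes i contiguous (like a Fortran array declared a(i,j,k))
    and "jik" makes k contiguous (like a(k,i,j)).
    """
    axis_of = {"i": 0, "j": 1, "k": 2}
    strides = [0, 0, 0]
    stride = 1
    # Walk from the fastest-varying dimension outwards to assign strides.
    for name in reversed(order):
        axis = axis_of[name]
        strides[axis] = stride
        stride *= shape[axis]

    def offset(i, j, k):
        return i * strides[0] + j * strides[1] + k * strides[2]

    return offset


def vertical_second_difference(src, dst, shape, offset):
    """Stencil kernel written once against logical indices; layout-agnostic."""
    nx, ny, nz = shape
    for i, j, k in product(range(nx), range(ny), range(1, nz - 1)):
        dst[offset(i, j, k)] = (
            src[offset(i, j, k - 1)]
            - 2.0 * src[offset(i, j, k)]
            + src[offset(i, j, k + 1)]
        )


def run(order, shape=(4, 3, 5)):
    """Fill f(i, j, k) = k**2 in the given storage order and apply the kernel."""
    nx, ny, nz = shape
    offset = make_layout(shape, order)
    size = nx * ny * nz
    src, dst = [0.0] * size, [0.0] * size
    for i, j, k in product(range(nx), range(ny), range(nz)):
        src[offset(i, j, k)] = float(k * k)
    vertical_second_difference(src, dst, shape, offset)
    return offset, src, dst


if __name__ == "__main__":
    # Same kernel, two storage orders: column-contiguous vs. horizontally contiguous.
    for order in ("jik", "kji"):
        offset, src, dst = run(order)
        print(order, dst[offset(2, 1, 2)])  # prints 2.0 for both orders
```

Both orders produce the same logical result, even though the raw arrays are laid out differently in memory. Only the index-to-offset mapping changes, and that is the part a layout-abstracting framework can vary per target without touching the kernel body.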

Footnotes

These open-sourced efforts can be found at https://github.com/C2SM-RCM/claw-compiler.

Please refer to https://github.com/muellermichel/Hybrid-Fortran.

Privatization is the main difference: OpenMP code generated by Hybrid Fortran uses “firstprivate” as the default data-sharing policy, with an explicit “shared” clause for all arrays used in the kernel.

OpenACC is mainly used for its reduction support. Hybrid Fortran does not automatically generate reduction kernels; however, it supports the “reduce” clause, which is forwarded to the generated OpenMP or OpenACC kernels.
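The two notes above can be made concrete with a small, hypothetical Python sketch of a directive emitter. Scalars fall under a firstprivate default, every array used in the kernel is listed as shared, and a user-supplied reduction is passed through unchanged to the OpenMP or OpenACC directive. The function names and argument format are assumptions. The emitted strings use standard OpenMP/OpenACC Fortran directive syntax but are not a verbatim copy of Hybrid Fortran's output.

```python
def openmp_directive(arrays, reduction=None):
    """Hypothetical emitter for an OpenMP parallel loop around a Fortran kernel.

    Scalars default to firstprivate; every array used in the kernel is listed
    explicitly as shared. An optional (operator, variable) reduction is
    forwarded unchanged; an explicit reduction clause overrides the default.
    """
    clauses = ["DEFAULT(FIRSTPRIVATE)"]
    if arrays:
        clauses.append("SHARED(" + ", ".join(arrays) + ")")
    if reduction is not None:
        op, var = reduction
        clauses.append(f"REDUCTION({op}:{var})")
    return "!$OMP PARALLEL DO " + " ".join(clauses)


def openacc_directive(reduction=None):
    """Hypothetical OpenACC counterpart; only the reduction is forwarded.

    Data movement is assumed to be handled by an enclosing data region.
    """
    directive = "!$acc parallel loop"
    if reduction is not None:
        op, var = reduction
        directive += f" reduction({op}:{var})"
    return directive


if __name__ == "__main__":
    print(openmp_directive(["u", "v", "tend"], reduction=("+", "total")))
    # !$OMP PARALLEL DO DEFAULT(FIRSTPRIVATE) SHARED(u, v, tend) REDUCTION(+:total)
    print(openacc_directive(("max", "cfl")))
    # !$acc parallel loop reduction(max:cfl)
```

Listing arrays as shared while defaulting everything else to firstprivate means each thread receives its own initialized copy of loop-local scalars. The large fields stay in a single shared copy.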

This obviates the need for code duplication and/or deep inlining of call trees.

Please note: While this paper follows American English, the Hybrid Fortran language extension was originally developed following British English, which becomes apparent in the spelling of “domainDependant” (https://en.oxforddictionaries.com/definition/dependant). We consider supporting the American English spelling of this directive part of our future work.

Simple examples of this feature can be found in https://github.com/muellermichel/Hybrid-Fortran/blob/v1.00rc10/examples/demo/source/example.h90.

Reduction kernels are thus not supported with this backend; we use the OpenACC backend selectively for this purpose (see also the discussion in the footnotes to Sect. 2.1).

Hybrid Fortran allows the user to switch between backend implementations, such as OpenACC and CUDA Fortran, on a per-routine basis. The user-specified information, together with the defaults given by the build system call, thus determines this implementation class.
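The per-routine selection described in the note above amounts to a simple precedence rule: an explicit per-routine choice wins, and otherwise the build default applies. The following is a minimal Python sketch of that rule. The routine names, the set of backend labels and the plain dictionary of user choices are all hypothetical; Hybrid Fortran's actual configuration format is not shown here.

```python
# Illustrative set of backend labels (assumed, not Hybrid Fortran's own list).
KNOWN_BACKENDS = {"CUDA Fortran", "OpenACC", "OpenMP"}


def resolve_backends(routines, build_default, user_choice):
    """Hypothetical sketch: pick an implementation class per routine.

    A routine's explicit user choice wins; otherwise the default passed
    in from the build system call is used.
    """
    if build_default not in KNOWN_BACKENDS:
        raise ValueError(f"unknown default backend: {build_default}")
    resolved = {}
    for routine in routines:
        backend = user_choice.get(routine, build_default)
        if backend not in KNOWN_BACKENDS:
            raise ValueError(f"unknown backend for {routine}: {backend}")
        resolved[routine] = backend
    return resolved


if __name__ == "__main__":
    # A routine containing a reduction is routed to OpenACC; the rest
    # use the build default.
    print(resolve_backends(
        ["advection", "diffusion", "global_sum"],
        "CUDA Fortran",
        {"global_sum": "OpenACC"},
    ))
```

Keeping the override map separate from the build default means that switching an entire build to a different backend requires changing only one value. The per-routine exceptions remain in place.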

Alternatively, a dependency generator script can be configured as well.

Since the input to this analysis is the closed-source ASUCA codebase, full reproducibility cannot be provided in this context. However, the intermediate data, the method employed to gather these data, and a sample input are provided and documented at https://github.com/muellermichel/hybrid-asuca-productivity-evidence/blob/master/asuca_productivity.xlsx.

Please refer to https://github.com/muellermichel/Hybrid-Fortran/blob/v1.00rc10/examples/Overview.md for an overview of the available samples and their results.

Print ISBN: 978-3-319-74895-5

Electronic ISBN: 978-3-319-74896-2

Copyright year: 2018
DOI: https://doi.org/10.1007/978-3-319-74896-2_2
