2015 | OriginalPaper | Chapter

Performance Analysis of a High-Level Abstractions-Based Hydrocode on Future Computing Systems

Authors: G. R. Mudalige, I. Z. Reguly, M. B. Giles, A. C. Mallinson, W. P. Gaudin, J. A. Herdman

Published in: High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation

Publisher: Springer International Publishing

Abstract

In this paper we present research on applying a domain-specific high-level abstractions (HLA) development strategy with the aim of “future-proofing” a key class of high performance computing (HPC) applications that simulate hydrodynamics computations at AWE plc. We build on an existing high-level abstraction framework, OPS, which is being developed at the University of Oxford for the solution of multi-block structured mesh-based applications. OPS uses an “active library” approach in which a single application code written using the OPS API can be transformed into different highly optimized parallel implementations, which can then be linked against the appropriate parallel library, enabling execution on different back-end hardware platforms. The target application in this work is the CloverLeaf mini-app from Sandia National Laboratories’ Mantevo suite of codes, which consists of algorithms of interest from hydrodynamics workloads. Specifically, we present (1) the lessons learnt in re-engineering an industrially representative hydrodynamics application to utilize the OPS high-level framework and subsequent code generation to obtain a range of parallel implementations, and (2) the performance of the auto-generated OPS versions of CloverLeaf compared with that of the hand-coded original CloverLeaf implementations on a range of platforms. Benchmarked systems include Intel multi-core CPUs and NVIDIA GPUs, the Archer (Cray XC30) CPU cluster and the Titan (Cray XK7) GPU cluster, with different parallelizations (OpenMP, OpenACC, CUDA, OpenCL and MPI). Our results show that developing parallel HPC applications using a high-level framework such as OPS is no more time-consuming or difficult than writing a one-off parallel program targeting only a single parallel implementation. However, the OPS strategy pays off with a highly maintainable single application source, through which multiple parallelizations can be realized, without compromising performance portability on a range of parallel systems.
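
The idea of a single application source serving several parallelizations can be illustrated with a minimal Python sketch. It is not the OPS API and involves no code generation: the names `par_loop`, `BACKENDS`, `run_sequential`, `run_threaded`, `ideal_gas_kernel` and `compute_pressure` are hypothetical, and runtime dispatch stands in for the source-to-source translation the abstract describes. The kernel is an ideal-gas-style pressure update chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# All names here are hypothetical and illustrative; this is NOT the OPS API.
# Runtime dispatch stands in for OPS's code generation step.


def run_sequential(kernel, n_cells, *arrays):
    """Back-end 1: visit every cell in a plain loop."""
    for i in range(n_cells):
        kernel(i, *arrays)


def run_threaded(kernel, n_cells, *arrays, workers=4):
    """Back-end 2: split the index range into chunks, one task per chunk."""
    chunk = max(1, (n_cells + workers - 1) // workers)

    def task(start):
        # Each task owns a disjoint index range, so writes never overlap.
        for i in range(start, min(start + chunk, n_cells)):
            kernel(i, *arrays)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(task, range(0, n_cells, chunk)))


BACKENDS = {"seq": run_sequential, "threads": run_threaded}


def par_loop(kernel, n_cells, *arrays, backend="seq"):
    """Apply a per-cell kernel over the mesh using the chosen back-end."""
    BACKENDS[backend](kernel, n_cells, *arrays)


def ideal_gas_kernel(i, density, energy, pressure):
    """Per-cell 'science' code, written once and unaware of the back-end."""
    gamma = 1.4  # assumed ratio of specific heats
    pressure[i] = (gamma - 1.0) * density[i] * energy[i]


def compute_pressure(density, energy, backend):
    """Application-level call: only the back-end name changes."""
    pressure = [0.0] * len(density)
    par_loop(ideal_gas_kernel, len(density), density, energy, pressure,
             backend=backend)
    return pressure
```

Calling `compute_pressure(density, energy, "seq")` and `compute_pressure(density, energy, "threads")` on the same inputs gives the same pressures. The application code stays fixed and only the back-end selection differs, which is the property the abstract attributes to the OPS approach.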

A similar approach is used in the C kernel implementations of the original CloverLeaf application.

On Intel compilers, IEEE_FLAGS=-ipo -fp-model strict -fp-model source -prec-div -prec-sqrt.

Title: Performance Analysis of a High-Level Abstractions-Based Hydrocode on Future Computing Systems
Authors: G. R. Mudalige, I. Z. Reguly, M. B. Giles, A. C. Mallinson, W. P. Gaudin, J. A. Herdman
Publisher: Springer International Publishing
Book: High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation
Print ISBN: 978-3-319-17247-7
Electronic ISBN: 978-3-319-17248-4
Copyright Year: 2015
DOI: https://doi.org/10.1007/978-3-319-17248-4_5
