The Journal of Supercomputing

Published: 9 March 2020

clusterCL: comprehensive support for multi-kernel data-parallel applications in heterogeneous asymmetric clusters

Published in: The Journal of Supercomputing | Issue 12/2020

Abstract

Heterogeneous cluster systems consisting of CPUs and different kinds of accelerators have become mainstream in HPC. Programming such systems is difficult: it requires addressing manifold challenges that stem from the intricate composition of these systems and from the peculiarities of scientific applications. A broad range of obstacles to efficient execution has to be considered and dealt with properly. In this paper, we propose a systematic approach and a framework that provides comprehensive support for running data-parallel applications in heterogeneous asymmetric clusters. Our implementation partitions and distributes work so that the workload is balanced across the cluster, and it handles partitioning-induced communication and synchronization transparently. In our experimental section, we evaluate the approach on 11 representative scientific applications from different domains. Experimental results show a strong speedup and good workload balance for different cluster configurations.
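
As a rough illustration of what balancing work across asymmetric devices can mean, the following minimal sketch splits a one-dimensional index range into contiguous chunks sized in proportion to assumed relative device throughputs. It is not the clusterCL implementation, which the abstract does not describe in detail; the function name `partition_range` and the throughput values are hypothetical.

```python
def partition_range(total_items, device_speeds):
    """Split [0, total_items) into contiguous chunks, one per device.

    Chunk sizes are proportional to the given relative device speeds
    (hypothetical values, e.g. obtained from profiling runs).
    Returns a list of (start, end) pairs with end exclusive.
    """
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    if not device_speeds or any(s <= 0 for s in device_speeds):
        raise ValueError("device_speeds must be a non-empty list of positive numbers")

    total_speed = sum(device_speeds)
    bounds = [0]
    cumulative = 0
    for speed in device_speeds:
        cumulative += speed
        # Rounding the cumulative boundary (not each chunk size) avoids
        # accumulating rounding error across devices.
        bounds.append(round(total_items * cumulative / total_speed))
    bounds[-1] = total_items  # guard against floating-point drift

    return [(bounds[i], bounds[i + 1]) for i in range(len(device_speeds))]


if __name__ == "__main__":
    # Assumed example: one CPU and two GPUs with relative speeds 1 : 3 : 4.
    for device, (start, end) in enumerate(partition_range(1000, [1, 3, 4])):
        print(f"device {device}: items [{start}, {end})")
```

A static split like this only balances load if the relative speeds are accurate for the kernel at hand, and it ignores the communication and synchronization that partitioning introduces; the abstract states that the framework handles those aspects transparently.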

Title: clusterCL: comprehensive support for multi-kernel data-parallel applications in heterogeneous asymmetric clusters
Publication date: 9 March 2020
Published in: The Journal of Supercomputing | Issue 12/2020
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-020-03234-w
