Published in: The Journal of Supercomputing 12/2020

09.03.2020

clusterCL: comprehensive support for multi-kernel data-parallel applications in heterogeneous asymmetric clusters


Abstract

Heterogeneous cluster systems consisting of CPUs and different kinds of accelerators have become mainstream in HPC. Programming such systems is difficult: it requires addressing manifold challenges that stem from the intricate composition of these systems and the peculiarities of scientific applications. A broad range of obstacles to efficient execution has to be considered and dealt with properly. In this paper, we propose a systematic approach and a framework that provides comprehensive support for running data-parallel applications in heterogeneous asymmetric clusters. Our implementation partitions and distributes work so that the workload is balanced across the cluster, while handling partitioning-induced communication and synchronization transparently. In our experimental section, we evaluate our approach on 11 representative scientific applications from different domains. Experimental results show a strong speedup and workload balance for different cluster configurations.
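The abstract does not say how clusterCL computes its partitions. Purely as an illustration of what balanced work partitioning can look like, the following Python sketch splits a one-dimensional index range into contiguous chunks whose sizes are proportional to an assumed relative throughput per device. The function name `partition_range` and the throughput values are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def partition_range(total_items: int, throughputs: List[float]) -> List[Tuple[int, int]]:
    """Split [0, total_items) into contiguous half-open chunks, one per device.

    Chunk sizes are proportional to the given relative throughputs, so a
    device assumed to be twice as fast receives roughly twice as many items.
    This is an illustrative heuristic, not the method used by clusterCL.
    """
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    if not throughputs or any(t <= 0 for t in throughputs):
        raise ValueError("throughputs must be a non-empty list of positive values")

    total = sum(throughputs)
    chunks: List[Tuple[int, int]] = []
    start = 0
    cumulative = 0.0
    for t in throughputs:
        cumulative += t
        # Round the cumulative boundary so rounding errors do not accumulate.
        end = min(round(total_items * cumulative / total), total_items)
        chunks.append((start, end))
        start = end

    # Make sure the final chunk reaches the end of the range.
    chunks[-1] = (chunks[-1][0], total_items)
    return chunks


if __name__ == "__main__":
    # Hypothetical cluster: one CPU and two accelerators of different speed.
    print(partition_range(1000, [1.0, 3.0, 4.0]))
```

Rounding the cumulative boundaries, rather than each chunk size separately, keeps the chunks contiguous and guarantees that they cover the whole range exactly once.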


Metadata
Title
clusterCL: comprehensive support for multi-kernel data-parallel applications in heterogeneous asymmetric clusters
Publication date
09.03.2020
Published in
The Journal of Supercomputing / Issue 12/2020
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03234-w
