Skip to main content
Erschienen in: The Journal of Supercomputing 2/2014

01.11.2014

Strategies for maximizing utilization on multi-CPU and multi-GPU heterogeneous architectures

verfasst von: Angeles Navarro, Antonio Vilches, Francisco Corbera, Rafael Asenjo

Erschienen in: The Journal of Supercomputing | Ausgabe 2/2014

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

This paper explores the possibility of efficiently executing a single application using multicores simultaneously with multiple GPU accelerators under a parallel task programming paradigm. In particular, we address the challenge of extending a parallel_for template to allow its exploitation on heterogeneous architectures. Due to the asymmetry of the computing resources, we propose in this work a dynamic scheduling strategy coupled with an adaptive partitioning scheme that resizes chunks to prevent underutilization and load imbalance of CPUs and GPUs. In this paper we also address the problem of the underutilization of the CPU core where a host thread operates. To solve it, we propose two different approaches: (1) a collaborative host thread strategy, in which the host thread, instead of busy-waiting for the GPU to complete, it carries out useful chunk processing; and (2) a host thread blocking strategy combined with oversubscription, that delegates on the OS the duty of scheduling threads to available CPU cores in order to guarantee that all cores are doing useful work. Using two benchmarks we evaluate the overhead introduced by our scheduling and partitioning algorithms, finding that it is negligible. We also evaluate the efficiency of the strategies proposed finding that allowing oversubscription controlled by the OS can be beneficial under certain scenarios.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Augonnet C, Clet-Ortega J, Thibault S, Namyst R (2010). Data-aware task scheduling on multi-accelerator based platforms. In: Parallel and distributed systems (ICPADS) Augonnet C, Clet-Ortega J, Thibault S, Namyst R (2010). Data-aware task scheduling on multi-accelerator based platforms. In: Parallel and distributed systems (ICPADS)
2.
Zurück zum Zitat Augonnet C, Thibault S, Namyst R, Wacrenier P-A (February 2011) StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr Comput Pract Exp 23:187–198 Augonnet C, Thibault S, Namyst R, Wacrenier P-A (February 2011) StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurr Comput Pract Exp 23:187–198
3.
Zurück zum Zitat Belviranli ME, Bhuyan LN, Gupta R (2013) A dynamic self-scheduling scheme for heterogeneous multiprocessor architectures. ACM Trans Archit Code Optim 9(4):57:1–57:20CrossRef Belviranli ME, Bhuyan LN, Gupta R (2013) A dynamic self-scheduling scheme for heterogeneous multiprocessor architectures. ACM Trans Archit Code Optim 9(4):57:1–57:20CrossRef
4.
Zurück zum Zitat Bueno J, Planas J, Duran A, Badia RM, Martorell X, Ayguade E, Labarta J (2012) Productive programming of GPU clusters with OmpSs. In: Proceeding of the IEEE 26th IPDPS Bueno J, Planas J, Duran A, Badia RM, Martorell X, Ayguade E, Labarta J (2012) Productive programming of GPU clusters with OmpSs. In: Proceeding of the IEEE 26th IPDPS
5.
Zurück zum Zitat Hart A (2012) The OpenACC programming model. Technical report, Cray Exascale Research Initiative Europe Hart A (2012) The OpenACC programming model. Technical report, Cray Exascale Research Initiative Europe
6.
Zurück zum Zitat Kulkarni M, Burtscher M, Cascaval C, Pingali K (2009) Lonestar: a suite of parallel irregular programs. In: International symposium on performance analysis of systems and software (ISPASS’09) Kulkarni M, Burtscher M, Cascaval C, Pingali K (2009) Lonestar: a suite of parallel irregular programs. In: International symposium on performance analysis of systems and software (ISPASS’09)
7.
Zurück zum Zitat Lima JVF, Gautier T, Maillard N, Danjean V (2012) Exploiting concurrent GPU operations for efficient work stealing on multi-GPUs. In: SBAC-PAD’12, pp 75–82 Lima JVF, Gautier T, Maillard N, Danjean V (2012) Exploiting concurrent GPU operations for efficient work stealing on multi-GPUs. In: SBAC-PAD’12, pp 75–82
8.
Zurück zum Zitat Luk C-K, Hong S, Kim H (2009) Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping. In: MICRO-42, pp 45–55 Luk C-K, Hong S, Kim H (2009) Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping. In: MICRO-42, pp 45–55
10.
Zurück zum Zitat Ravi VT, Agrawal G (2011) A dynamic scheduling framework for emerging heterogeneous systems. In: High performance computing (HiPC), pp 1–10 Ravi VT, Agrawal G (2011) A dynamic scheduling framework for emerging heterogeneous systems. In: High performance computing (HiPC), pp 1–10
11.
Zurück zum Zitat Reinders J (2007) Intel threading building blocks: multi-core parallelism for C++ programming. O’Reilly, USA Reinders J (2007) Intel threading building blocks: multi-core parallelism for C++ programming. O’Reilly, USA
12.
Zurück zum Zitat Rudolph DC, Polychronopoulos CD (1989) An efficient message-passing scheduler based on guided self scheduling. In: Proceeding of the third international conference on supercomputing, ICS ’89 Rudolph DC, Polychronopoulos CD (1989) An efficient message-passing scheduler based on guided self scheduling. In: Proceeding of the third international conference on supercomputing, ICS ’89
13.
Zurück zum Zitat Russel SA (2012) Levering GPGPU and OpenCL technologies for natural user interaces. You i Labs inc., Canada Technical report Russel SA (2012) Levering GPGPU and OpenCL technologies for natural user interaces. You i Labs inc., Canada Technical report
14.
Zurück zum Zitat Venkatasubramanian S, Vuduc RW (2009) Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems. In: Procedding of the international conference on supercomputing (ICS’09) Venkatasubramanian S, Vuduc RW (2009) Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems. In: Procedding of the international conference on supercomputing (ICS’09)
Metadaten
Titel
Strategies for maximizing utilization on multi-CPU and multi-GPU heterogeneous architectures
verfasst von
Angeles Navarro
Antonio Vilches
Francisco Corbera
Rafael Asenjo
Publikationsdatum
01.11.2014
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 2/2014
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-014-1200-3

Weitere Artikel der Ausgabe 2/2014

The Journal of Supercomputing 2/2014 Zur Ausgabe

Premium Partner