The Journal of Supercomputing

Published in:

01-11-2014

Strategies for maximizing utilization on multi-CPU and multi-GPU heterogeneous architectures

Authors: Angeles Navarro, Antonio Vilches, Francisco Corbera, Rafael Asenjo

Published in: The Journal of Supercomputing | Issue 2/2014

Abstract

This paper explores the possibility of efficiently executing a single application on multicore CPUs and multiple GPU accelerators simultaneously under a parallel task programming paradigm. In particular, we address the challenge of extending a parallel_for template so that it can be exploited on heterogeneous architectures. Because the computing resources are asymmetric, we propose a dynamic scheduling strategy coupled with an adaptive partitioning scheme that resizes chunks to prevent underutilization and load imbalance of CPUs and GPUs. We also address the underutilization of the CPU core on which a host thread runs. To solve it, we propose two approaches: (1) a collaborative host thread strategy, in which the host thread carries out useful chunk processing instead of busy-waiting for the GPU to complete; and (2) a host thread blocking strategy combined with oversubscription, which delegates to the OS the duty of scheduling threads onto available CPU cores to guarantee that all cores are doing useful work. Using two benchmarks, we evaluate the overhead introduced by our scheduling and partitioning algorithms and find it negligible. We also evaluate the efficiency of the proposed strategies and find that allowing oversubscription controlled by the OS can be beneficial in certain scenarios.
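The idea of dynamic scheduling with adaptive chunk sizes can be illustrated with a small simulation. The sketch below is not the authors' algorithm: the function name `simulate_dynamic_schedule`, the device speeds, and the two sizing rules (CPU chunks scaled by relative speed, and a proportional cap near the end of the iteration space) are assumptions chosen only to show why resizing chunks helps balance devices of very different throughput.

```python
import heapq
import math


def simulate_dynamic_schedule(total_iters, speeds, gpu_names, gpu_chunk):
    """Simulate devices pulling chunks from one shared iteration space.

    speeds    : dict mapping device name -> iterations per time unit
                (hypothetical values, not measurements from the paper).
    gpu_names : set of device names that are treated as GPUs.
    gpu_chunk : chunk size requested by a GPU (assumed fixed).

    Returns (iterations done per device, makespan).
    """
    total_speed = sum(speeds.values())
    fastest_gpu = max(speeds[g] for g in gpu_names)
    done = {name: 0 for name in speeds}
    # Min-heap of (time at which the device becomes idle, device name).
    idle = [(0.0, name) for name in sorted(speeds)]
    heapq.heapify(idle)
    next_iter = 0
    makespan = 0.0

    while next_iter < total_iters:
        now, name = heapq.heappop(idle)
        remaining = total_iters - next_iter

        # Assumed rule 1: a CPU core asks for a chunk scaled by its speed
        # relative to the fastest GPU, so every chunk takes similar time.
        if name in gpu_names:
            want = gpu_chunk
        else:
            want = max(1, round(gpu_chunk * speeds[name] / fastest_gpu))

        # Assumed rule 2: near the end, no device takes more than its
        # proportional share of what is left, to avoid a long tail.
        share = max(1, math.ceil(remaining * speeds[name] / total_speed))
        chunk = min(want, share, remaining)

        next_iter += chunk
        done[name] += chunk
        finish = now + chunk / speeds[name]
        makespan = max(makespan, finish)
        heapq.heappush(idle, (finish, name))

    return done, makespan


# Hypothetical platform: one GPU that is 8x faster than each of two CPU cores.
speeds = {"gpu0": 8.0, "cpu0": 1.0, "cpu1": 1.0}
done, makespan = simulate_dynamic_schedule(10_000, speeds, {"gpu0"}, gpu_chunk=512)
ideal = 10_000 / sum(speeds.values())
print(done, makespan, ideal)
```

In this toy setting, the makespan can be compared with the ideal value `total_iters / sum(speeds)`; a fixed chunk size shared by all devices would leave the slow cores holding large chunks while the GPU sits idle at the end.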

Title: Strategies for maximizing utilization on multi-CPU and multi-GPU heterogeneous architectures
Authors: Angeles Navarro, Antonio Vilches, Francisco Corbera, Rafael Asenjo
Publication date: 01-11-2014
Publisher: Springer US
Published in: The Journal of Supercomputing / Issue 2/2014
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-014-1200-3
