Published in: The Journal of Supercomputing 12/2015

1 December 2015

Performance-aware composition framework for GPU-based systems

Authors: Usman Dastgeer, Christoph Kessler


Abstract

User-level components of applications can be made performance-aware by annotating them with performance models and other metadata. We present a component model and a composition framework for the automatically optimized composition of applications for modern GPU-based systems from such components, which may expose multiple implementation variants. The framework targets the composition problem in an integrated manner, with the ability to do global performance-aware composition across multiple invocations. We demonstrate several key features of our framework relating to performance-aware composition, including implementation selection, both when performance characteristics are known (or learned) beforehand and when they are learned at runtime. We also demonstrate the hybrid execution capabilities of our framework on real applications. Furthermore, we present a bulk composition technique that can make better composition decisions by considering information about upcoming calls along with data flow information extracted from the source program by static analysis. Bulk composition improves over the traditional greedy performance-aware policy, which considers only the current call for optimization.
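
The difference between the greedy policy and bulk composition can be illustrated with a small sketch. It is not the framework's algorithm: the cost numbers, the single shared operand, the names `TRANSFER_COST`, `CALLS`, `sequence_cost`, `greedy_selection` and `bulk_selection`, and the exhaustive search (standing in for the static data flow analysis mentioned in the abstract) are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical cost numbers (arbitrary time units), chosen only for illustration.
TRANSFER_COST = 5.0  # assumed cost of moving the operand between CPU and GPU memory

# Each call lists the assumed execution time of its implementation variants.
CALLS = [
    {"cpu": 8.0, "gpu": 4.0},
    {"cpu": 20.0, "gpu": 2.0},
]


def sequence_cost(devices, calls, start="cpu"):
    """Total cost of running calls[i] on devices[i], including data transfers."""
    total, location = 0.0, start
    for device, call in zip(devices, calls):
        if device != location:
            total += TRANSFER_COST  # operand must be moved before the call
            location = device
        total += call[device]
    return total


def greedy_selection(calls, start="cpu"):
    """Pick, call by call, the variant that is cheapest for the current call only."""
    chosen, location = [], start
    for call in calls:
        def local_cost(device):
            return call[device] + (TRANSFER_COST if device != location else 0.0)

        best = min(call, key=local_cost)
        chosen.append(best)
        location = best
    return tuple(chosen)


def bulk_selection(calls, start="cpu"):
    """Pick the assignment that minimizes the cost of the whole call sequence."""
    candidates = product(*(call.keys() for call in calls))
    return min(candidates, key=lambda devices: sequence_cost(devices, calls, start))


if __name__ == "__main__":
    for name, select in (("greedy", greedy_selection), ("bulk", bulk_selection)):
        devices = select(CALLS)
        print(name, devices, sequence_cost(devices, CALLS))
```

With these made-up numbers, the greedy policy runs the first call on the CPU because the transfer makes the GPU variant look slightly worse in isolation, and then pays for the transfer anyway before the second call (15 units in total). Looking at both calls together selects the GPU for both (11 units), since the transfer cost is paid once and amortized.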


Footnotes
1. In this article, we use the terms implementation, implementation variant, and variant interchangeably.
2. We consider a recursive function as a special scenario to avoid combinatorial explosion of the solution space.
3. We have not encountered any such scenario yet in any application that we have ported to our framework.
4. By default, the system generates composition code for our own GCF runtime library. The user can set the -starpu switch to generate code for the StarPU runtime system.
5. Registering a variable with the runtime system creates a unique data handle (with information about size, memory address, etc.) for that data in the runtime system, which can be used for controlling its state and data transfers.
6. A point represents a single execution with certain performance-relevant properties.
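
The data registration described in footnote 5 can be pictured with a toy model. This is a minimal sketch under stated assumptions, not the GCF or StarPU API: the `DataHandle` and `ToyRuntime` classes, their fields, and the `register` and `acquire` methods are hypothetical names introduced only to show how a handle can carry size and address information and track where valid copies of the data reside.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class DataHandle:
    """Hypothetical handle: bookkeeping kept for one registered operand."""
    handle_id: int
    address: int     # stand-in for the memory address of the host buffer
    size_bytes: int
    # Memory units that currently hold a valid copy (assumed to start on the CPU).
    valid_on: set = field(default_factory=lambda: {"cpu"})


class ToyRuntime:
    """Hypothetical runtime that tracks data state and counts transfers."""

    def __init__(self):
        self._ids = count(1)
        self.handles = {}
        self.transfers = 0

    def register(self, buffer):
        """Create a unique handle describing the buffer's address and size."""
        handle = DataHandle(next(self._ids), id(buffer), len(buffer))
        self.handles[handle.handle_id] = handle
        return handle

    def acquire(self, handle, device, write=False):
        """Make the data valid on `device`, copying only if no valid copy exists there."""
        if device not in handle.valid_on:
            self.transfers += 1  # a real runtime would issue the copy here
            handle.valid_on.add(device)
        if write:
            handle.valid_on = {device}  # all other copies become stale


if __name__ == "__main__":
    runtime = ToyRuntime()
    data = bytearray(1024)
    handle = runtime.register(data)
    runtime.acquire(handle, "gpu")              # first use on the GPU: one transfer
    runtime.acquire(handle, "gpu", write=True)  # already valid there: no transfer
    runtime.acquire(handle, "cpu")              # CPU copy is stale: second transfer
    print(handle.handle_id, handle.size_bytes, runtime.transfers)
```

Tracking the set of valid copies per handle is what lets such a runtime skip redundant transfers when consecutive calls run on the same device.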
 
Metadata
Title: Performance-aware composition framework for GPU-based systems
Authors: Usman Dastgeer, Christoph Kessler
Publication date: 1 December 2015
Publisher: Springer US
Published in: The Journal of Supercomputing, issue 12/2015
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-014-1105-1
