
2010 | OriginalPaper | Book chapter

13. Automatic Tuning of CUDA Execution Parameters for Stencil Processing

Authors: Katsuto Sato, Hiroyuki Takizawa, Kazuhiko Komatsu, Hiroaki Kobayashi

Published in: Software Automatic Tuning

Publisher: Springer New York


Abstract

Recently, the Compute Unified Device Architecture (CUDA) has enabled Graphics Processing Units (GPUs) to accelerate a wide range of applications. However, to fully exploit a GPU's computing power, a programmer must carefully tune several CUDA execution parameters, even for simple stencil processing kernels. This paper therefore develops a profiling-based automatic parameter tuning mechanism that predicts the optimal execution parameters. The paper first discusses the scope of the parameter exploration space, which is determined by the GPU's architectural restrictions. To find the optimal execution parameters, performance models are built by profiling the kernel's execution time under each promising parameter configuration, and the execution parameters are then determined from those models. The paper evaluates the performance improvement achieved by the proposed mechanism using two benchmark programs. The evaluation results show that the proposed mechanism can appropriately select a suboptimal Cooperative Thread Array (CTA) configuration whose performance is comparable to that of the optimal one.
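The abstract describes two steps: restricting the space of CTA configurations by architectural limits, and then choosing a configuration from performance models built by profiling. The Python sketch below illustrates that idea under stated assumptions; it is not the chapter's implementation. The power-of-two block shapes, the limits of 512 threads per block and 512 for the x/y block dimension (as on compute capability 1.x GPUs), the linear model t(n) = a * n + b, and the `profile` callback are all hypothetical choices made for illustration.

```python
"""Sketch: profile-based selection of a CTA (thread block) shape.

Hypothetical illustration, not the chapter's implementation: candidate
shapes are powers of two, and each shape gets a linear model
t(n) = a * n + b fitted from a few small profiling runs.
"""
from itertools import product

WARP_SIZE = 32               # threads per warp on NVIDIA GPUs
MAX_THREADS_PER_BLOCK = 512  # compute capability 1.x limit (assumed target)
MAX_BLOCK_DIM_XY = 512       # max x/y block dimension on compute capability 1.x


def candidate_shapes():
    """Enumerate (bx, by) block shapes that respect basic hardware limits."""
    dims = [2 ** k for k in range(10) if 2 ** k <= MAX_BLOCK_DIM_XY]
    shapes = []
    for bx, by in product(dims, dims):
        threads = bx * by
        # Keep whole warps only and stay within the per-block thread limit.
        if threads % WARP_SIZE == 0 and threads <= MAX_THREADS_PER_BLOCK:
            shapes.append((bx, by))
    return shapes


def fit_linear(samples):
    """Ordinary least-squares fit of t = a * n + b from (n, t) pairs."""
    m = len(samples)
    sx = sum(n for n, _ in samples)
    sy = sum(t for _, t in samples)
    sxx = sum(n * n for n, _ in samples)
    sxy = sum(n * t for n, t in samples)
    denom = m * sxx - sx * sx  # nonzero if at least two distinct sizes
    a = (m * sxy - sx * sy) / denom
    b = (sy - a * sx) / m
    return a, b


def select_shape(shapes, profile, train_sizes, target_size):
    """Fit one model per shape and return (shape, predicted_seconds) of the best.

    `profile(shape, n)` is a hypothetical callback that runs the stencil
    kernel with the given block shape on n grid points and returns seconds.
    """
    best = None
    for shape in shapes:
        a, b = fit_linear([(n, profile(shape, n)) for n in train_sizes])
        predicted = a * target_size + b
        if best is None or predicted < best[1]:
            best = (shape, predicted)
    return best


if __name__ == "__main__":
    # Synthetic cost model standing in for real kernel timings.
    def synthetic_profile(shape, n):
        bx, by = shape
        penalty = 1 + abs(bx - 64) / 64 + abs(by - 4) / 4
        return n * penalty * 1e-9 + 1e-5

    shape, t = select_shape(candidate_shapes(), synthetic_profile,
                            [2 ** 16, 2 ** 18, 2 ** 20], 2 ** 24)
    print(f"selected CTA shape {shape}, predicted {t:.4f} s")
```

In practice, `profile` would launch the CUDA kernel and time it (for example with CUDA events); any callable returning seconds works here, including a synthetic cost function for testing. A linear model in the number of grid points is used only because stencil work grows roughly with grid size; the model used in the chapter may differ.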


Footnotes
1
The current SPRAT compiler does not generate code that dynamically allocates shared memory; therefore, dynamically allocated shared memory is not considered here.
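Because only statically allocated shared memory is counted, the number of CTAs that can be resident on one streaming multiprocessor (SM) can be bounded from the per-block thread count, the registers per thread, and the static shared memory size. The sketch below assumes compute capability 1.3 limits (8 blocks, 1024 threads, 32 warps, 16,384 registers and 16 KB of shared memory per SM) and ignores allocation granularity and the shared memory reserved for kernel arguments, so it gives an upper bound rather than the exact figure the CUDA occupancy calculator would report. The function names are hypothetical.

```python
"""Sketch: upper bound on resident CTAs per SM from static resource usage.

Assumed limits follow compute capability 1.3 GPUs; allocation granularity
and the shared memory reserved for kernel arguments are ignored.
"""

WARP_SIZE = 32
MAX_BLOCKS_PER_SM = 8
MAX_THREADS_PER_SM = 1024
MAX_WARPS_PER_SM = MAX_THREADS_PER_SM // WARP_SIZE  # 32
REGISTERS_PER_SM = 16384
SHARED_MEM_PER_SM = 16 * 1024  # bytes
MAX_THREADS_PER_BLOCK = 512


def resident_blocks(threads_per_block, regs_per_thread, static_smem_bytes):
    """Upper bound on CTAs that can be resident on one SM simultaneously.

    Only statically allocated shared memory is counted; dynamically
    allocated shared memory is assumed to be zero.
    """
    if not 0 < threads_per_block <= MAX_THREADS_PER_BLOCK:
        raise ValueError("threads_per_block out of range")
    limits = [MAX_BLOCKS_PER_SM, MAX_THREADS_PER_SM // threads_per_block]
    if regs_per_thread > 0:
        limits.append(REGISTERS_PER_SM // (regs_per_thread * threads_per_block))
    if static_smem_bytes > 0:
        limits.append(SHARED_MEM_PER_SM // static_smem_bytes)
    return min(limits)


def occupancy(threads_per_block, regs_per_thread, static_smem_bytes):
    """Fraction of the SM's warp slots filled by resident CTAs."""
    blocks = resident_blocks(threads_per_block, regs_per_thread,
                             static_smem_bytes)
    warps_per_block = -(-threads_per_block // WARP_SIZE)  # ceiling division
    return blocks * warps_per_block / MAX_WARPS_PER_SM


if __name__ == "__main__":
    # 16 x 16 CTA staging an 18 x 18 float tile (stencil halo of width 1).
    print(resident_blocks(256, 10, 18 * 18 * 4), occupancy(256, 10, 18 * 18 * 4))
```

For example, a 16 × 16 CTA that stages an 18 × 18 tile of floats (1,296 bytes) and uses 10 registers per thread is limited by the thread count to 4 resident CTAs, which fills all 32 warp slots. A tuner could discard any configuration for which `resident_blocks` returns 0 before profiling it.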
 
Metadata
DOI
https://doi.org/10.1007/978-1-4419-6935-4_13
