Published in: International Journal of Parallel Programming, Issue 2/2014

1 April 2014

Accelerating Single Iteration Performance of CUDA-Based 3D Reaction–Diffusion Simulations

Authors: John K. Holmen, David L. Foster

Abstract

Reaction–diffusion systems are most commonly solved with stencil computations. Stencil computations have low compute intensity but place high demands on memory bandwidth. Fortunately, GPU computing can exploit the heavy reliance of stencil computations on neighboring data points to reduce these memory bandwidth demands and significantly increase simulation speed. Previously published works describe a wide variety of efforts to optimize NVIDIA CUDA-based stencil computations. However, one critical contributor to algorithm performance is commonly glossed over: the halo region loading technique used in conjunction with a given spatial blocking technique. This paper examines this aspect in depth, together with its impact on single iteration performance, for symmetric, nearest neighbor 19-point stencils. It does so by closely examining how the simulated space is partitioned into thread blocks and the balance between memory accesses, divergence, and computing threads. For single-precision floating point numbers, the resulting optimization strategy for accelerating 3-dimensional reaction–diffusion simulations offers up to 2.45 times speedup relative to the GPU-based results reported in the previously published work that this paper directly extends, and up to 8.69 times speedup relative to our multithreaded CPU-based implementation.
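As an illustration of the terms used in the abstract, the following is a minimal CPU-side sketch in Python, not the paper's CUDA implementation. It enumerates the offsets of a nearest neighbor 19-point stencil (the 3x3x3 cube without its 8 corners), applies it once with symmetric weights, and estimates how many elements a thread block loads per element it computes when a 2D tile carries a one-cell halo. The function names, the weight values, and the 2D-tile halo model are assumptions made for this example only; the abstract does not specify them.

```python
from itertools import product


def stencil_offsets_19():
    """Offsets of a nearest neighbor 19-point stencil.

    Taken from the 3x3x3 cube around a point, keeping the center, the 6 face
    neighbors, and the 12 edge neighbors, and dropping the 8 corners.
    """
    return [
        (dx, dy, dz)
        for dx, dy, dz in product((-1, 0, 1), repeat=3)
        if abs(dx) + abs(dy) + abs(dz) <= 2
    ]


def weight(offset, w_center, w_face, w_edge):
    """Symmetric weighting: the weight depends only on the neighbor class,
    i.e. on how many coordinates of the offset are nonzero (0, 1, or 2)."""
    nonzero = sum(1 for c in offset if c != 0)
    return (w_center, w_face, w_edge)[nonzero]


def step(u, w_center, w_face, w_edge):
    """Apply the stencil once to the interior of a cubic grid u[z][y][x].

    Boundary cells are copied unchanged (an assumption for this sketch).
    """
    n = len(u)
    out = [[row[:] for row in plane] for plane in u]
    offsets = stencil_offsets_19()
    for z in range(1, n - 1):
        for y in range(1, n - 1):
            for x in range(1, n - 1):
                out[z][y][x] = sum(
                    weight((dx, dy, dz), w_center, w_face, w_edge)
                    * u[z + dz][y + dy][x + dx]
                    for dx, dy, dz in offsets
                )
    return out


def halo_load_ratio(bx, by, radius=1):
    """Elements loaded per element computed for a bx-by-by 2D tile that
    also loads a halo of the given radius on every side."""
    return ((bx + 2 * radius) * (by + 2 * radius)) / (bx * by)


print(len(stencil_offsets_19()))  # 19
print(halo_load_ratio(16, 16))    # 1.265625
print(halo_load_ratio(32, 8))     # 1.328125
```

Under this simple model, two tiles with the same number of computing threads (256) load different amounts of halo data, which hints at why the choice of tile shape and halo loading technique can affect the balance between memory accesses and computing threads.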

Metadata
Title
Accelerating Single Iteration Performance of CUDA-Based 3D Reaction–Diffusion Simulations
Authors
John K. Holmen
David L. Foster
Publication date
1 April 2014
Publisher
Springer US
Published in
International Journal of Parallel Programming, Issue 2/2014
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-013-0251-z
