research-article

3D finite difference computation on GPUs using CUDA

Author:
Paulius Micikevicius

NVIDIA, Santa Clara, CA

GPGPU-2: Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, March 2009, pages 79–84. https://doi.org/10.1145/1513895.1513905

Published: 8 March 2009

ABSTRACT

In this paper we describe a GPU parallelization of the 3D finite difference computation using CUDA. Data access redundancy is used as the metric for determining the optimal implementation, both for the stencil-only computation and for the discretization of the wave equation, which is currently of great interest in seismic computing. For the larger stencils, the described approach achieves a throughput of between 2,400 and over 3,000 million output points per second on a single Tesla 10-series GPU. This is roughly an order of magnitude higher than the throughput of a 4-core Harpertown CPU running similar code from the seismic industry. Multi-GPU parallelization is also described; it scales linearly with the number of GPUs by overlapping inter-GPU communication with computation.
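To make the redundancy metric concrete, here is a minimal plain-C sketch (host code, not a CUDA kernel) built on our own assumptions: a star-shaped stencil of radius R (R = 4, i.e. 8th order in space, 6R + 1 = 25 input points), single-precision data stored with x varying fastest, and "redundancy" counted as global-memory reads per output point. The names `stencil_3d`, `wave_step`, `redundancy_naive` and `redundancy_tiled` are hypothetical, the wave step uses a common second-order-in-time discretization, and the tiled count models a generic tile-and-stream-through-z scheme rather than reproducing the paper's kernel.

```c
#include <stddef.h>

/* Illustrative stencil radius (assumption): R = 4 gives an 8th-order-in-space
   star stencil reading 6R + 1 = 25 input points per output point. */
#define R 4

/* Linear index into an nx*ny*nz array stored with x varying fastest. */
static size_t idx(int x, int y, int z, int nx, int ny) {
    return ((size_t)z * (size_t)ny + (size_t)y) * (size_t)nx + (size_t)x;
}

/* Naive reference for the stencil-only computation:
   out(p) = c[0]*in(p) + sum_{d=1..R} c[d] * (the six neighbours of p at
   distance d along x, y and z). Only interior points (at least R away from
   every face) are written; boundary entries of out are left untouched. */
static void stencil_3d(const float *in, float *out, const float c[R + 1],
                       int nx, int ny, int nz) {
    for (int z = R; z < nz - R; ++z)
        for (int y = R; y < ny - R; ++y)
            for (int x = R; x < nx - R; ++x) {
                float v = c[0] * in[idx(x, y, z, nx, ny)];
                for (int d = 1; d <= R; ++d)
                    v += c[d] * (in[idx(x - d, y, z, nx, ny)] + in[idx(x + d, y, z, nx, ny)] +
                                 in[idx(x, y - d, z, nx, ny)] + in[idx(x, y + d, z, nx, ny)] +
                                 in[idx(x, y, z - d, nx, ny)] + in[idx(x, y, z + d, nx, ny)]);
                out[idx(x, y, z, nx, ny)] = v;
            }
}

/* One explicit time step of u_tt = v^2 * laplacian(u) using second-order
   central differences in time: u_next = 2u - u_prev + (v*dt)^2 * L(u).
   c must hold Laplacian weights already divided by h^2; vdt2 holds (v*dt)^2
   per grid point; lap is caller-provided scratch of the same size as u. */
static void wave_step(const float *u, const float *u_prev, float *u_next,
                      float *lap, const float *vdt2, const float c[R + 1],
                      int nx, int ny, int nz) {
    stencil_3d(u, lap, c, nx, ny, nz);
    for (int z = R; z < nz - R; ++z)
        for (int y = R; y < ny - R; ++y)
            for (int x = R; x < nx - R; ++x) {
                size_t i = idx(x, y, z, nx, ny);
                u_next[i] = 2.0f * u[i] - u_prev[i] + vdt2[i] * lap[i];
            }
}

/* Reads per output point when nothing is reused: the centre plus 6r
   neighbours are all fetched from memory for every output point. */
static double redundancy_naive(int r) { return 6.0 * r + 1.0; }

/* Reads per output point for an assumed tiled scheme: an n (x) by m (y) tile
   of each xy-slice is read once together with its r-wide halo on the four
   sides (a star stencil needs no corner values), and the tile streams through
   z keeping z-neighbours in registers, so each slice is read only once. */
static double redundancy_tiled(int r, int n, int m) {
    double reads = (double)n * m + 2.0 * r * n + 2.0 * r * m;
    return reads / ((double)n * m);
}
```

Under these assumptions, R = 4 costs 25 reads per output point without reuse but only 1 + 8/16 + 8/16 = 2 reads per point with 16×16 tiles, and the count keeps falling as tiles grow (1.75 for a 32×16 tile). In a CUDA version the tile would typically be staged in shared memory; how the paper actually arranges threads and halos is not reproduced here.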

Index Terms

- Computing methodologies → Concurrent computing methodologies → Concurrent programming languages
- Software and its engineering → Software notations and tools → General programming languages → Language types → Concurrent programming languages

Published in
GPGPU-2: Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units
March 2009
107 pages
ISBN: 9781605585178
DOI: 10.1145/1513895
Conference Chairs: David Kaeli (Northeastern University), Miriam Leeser (Northeastern University)
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
CUDA
GPU
finite difference
parallel algorithms

Acceptance Rates
Overall acceptance rate: 57 of 129 submissions, 44%

Article Metrics
- Total citations: 339
- Total downloads: 2,785
- Downloads (last 12 months): 55
- Downloads (last 6 weeks): 3