DOI: 10.1145/1513895.1513905
research-article

3D finite difference computation on GPUs using CUDA

Published: 8 March 2009

ABSTRACT

In this paper we describe a GPU parallelization of the 3D finite difference computation using CUDA. Data access redundancy is used as the metric to determine the optimal implementation for both the stencil-only computation and the discretization of the wave equation, which is currently of great interest in seismic computing. For the larger stencils, the described approach achieves a throughput of between 2,400 and over 3,000 million output points per second on a single Tesla 10-series GPU. This is roughly an order of magnitude higher than a 4-core Harpertown CPU running similar code from the seismic industry. Multi-GPU parallelization is also described, achieving linear scaling with the number of GPUs by overlapping inter-GPU communication with computation.
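To make the "stencil-only computation" concrete, here is a minimal CPU reference sketch in plain Python. It is not the paper's CUDA kernel. The function name `stencil_3d`, the flat x-fastest memory layout, the axis-aligned symmetric stencil shape, and the coefficient values are all assumptions made for illustration. For a stencil of radius r in this shape, each output point reads 6r + 1 input values, which suggests why reuse of neighbouring data (and hence access redundancy) matters for larger stencils.

```python
def stencil_3d(src, nx, ny, nz, coeffs):
    """Apply a symmetric axis-aligned stencil to a flat 3D grid (x fastest).

    coeffs[0] weights the centre point; coeffs[i] weights the six points at
    distance i along the x, y and z axes. Points closer than the radius to
    any face are left at zero (this sketch has no boundary treatment).
    """
    radius = len(coeffs) - 1
    stride_y = nx          # distance between neighbours in y
    stride_z = nx * ny     # distance between neighbours in z
    dst = [0.0] * (nx * ny * nz)
    for z in range(radius, nz - radius):
        for y in range(radius, ny - radius):
            for x in range(radius, nx - radius):
                idx = z * stride_z + y * stride_y + x
                acc = coeffs[0] * src[idx]
                for i in range(1, radius + 1):
                    acc += coeffs[i] * (
                        src[idx - i] + src[idx + i]
                        + src[idx - i * stride_y] + src[idx + i * stride_y]
                        + src[idx - i * stride_z] + src[idx + i * stride_z]
                    )
                dst[idx] = acc
    return dst


# Illustrative input: f(x, y, z) = x^2 + y^2 + z^2 on a 6x6x6 grid with unit
# spacing. The radius-1 coefficients [-6, 1] form the standard second-order
# discrete Laplacian, chosen here only as a simple check.
n = 6
field = [float(x * x + y * y + z * z)
         for z in range(n) for y in range(n) for x in range(n)]
laplacian = stencil_3d(field, n, n, n, [-6.0, 1.0])
```

With this quadratic input, every interior point of `laplacian` holds the same value, since the second difference of a quadratic is constant. A GPU version would distribute the output points across threads instead of the three nested loops.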

      Published in

        ACM Other conferences
        GPGPU-2: Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units
        March 2009
        107 pages
        ISBN:9781605585178
        DOI:10.1145/1513895

        Copyright © 2009 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 57 of 129 submissions, 44%
