ABSTRACT
Until recently, the use of graphics processing units (GPUs) for query processing was limited by the amount of memory on the graphics card, a few gigabytes at best. Moreover, input tables had to be copied to GPU memory before they could be processed, and after computation was completed, query results had to be copied back to CPU memory. The newest generation of Nvidia GPUs and development tools introduces a common memory address space, which now allows the GPU to access CPU memory directly, lifting size limitations and obviating data copy operations. We confirm that this new technology can sustain 98% of its nominal rate of 6.3 GB/sec in practice, and exploit it to process database hash joins at the same rate, i.e., the join is processed "on the fly" as the GPU reads the input tables from CPU memory at PCI-E speeds. Compared to the fastest published results for in-memory joins on the CPU, this represents more than half an order of magnitude speed-up. All of our results include the cost of result materialization (often omitted in earlier work), and we investigate the implications of changing join predicate selectivity and table size.
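The rates quoted above can be turned into a rough estimate of join time. The following is a minimal sketch, not the authors' GPU implementation: it runs on the CPU, assumes 1 GB = 10^9 bytes, and uses hypothetical names (`sustained_rate_gb_s`, `transfer_bound_seconds`, `hash_join`) plus a hypothetical tuple width chosen by the caller. It shows the sustained rate implied by "98% of 6.3 GB/sec", the transfer-bound time for streaming the inputs once, and a plain build/probe hash join that materializes every match.

```python
NOMINAL_GB_S = 6.3          # nominal rate quoted in the abstract
SUSTAINED_FRACTION = 0.98   # fraction the abstract reports as achievable
HALF_ORDER_OF_MAGNITUDE = 10 ** 0.5  # about 3.16x


def sustained_rate_gb_s():
    """Sustained transfer rate implied by the abstract, in GB/s."""
    return NOMINAL_GB_S * SUSTAINED_FRACTION


def transfer_bound_seconds(num_tuples, bytes_per_tuple, bytes_per_gb=1e9):
    """Time to stream the input once at the sustained rate.

    Assumption: 1 GB = 1e9 bytes; result write-back traffic is ignored.
    """
    total_bytes = num_tuples * bytes_per_tuple
    return total_bytes / (sustained_rate_gb_s() * bytes_per_gb)


def hash_join(build_rows, probe_rows):
    """CPU hash join over (key, payload) pairs, materializing all matches."""
    # Build phase: index the build table by join key.
    table = {}
    for key, payload in build_rows:
        table.setdefault(key, []).append(payload)

    # Probe phase: look up each probe tuple and emit one row per match.
    result = []
    for key, payload in probe_rows:
        for build_payload in table.get(key, ()):
            result.append((key, build_payload, payload))
    return result


if __name__ == "__main__":
    print(f"sustained rate: {sustained_rate_gb_s():.3f} GB/s")
    # Hypothetical workload: 1e9 input tuples of 8 bytes each.
    print(f"transfer bound: {transfer_bound_seconds(1_000_000_000, 8):.3f} s")
    print(hash_join([(1, "a"), (2, "b")], [(2, "x"), (3, "y"), (2, "z")]))
```

If the join really keeps pace with the transfer, as the abstract claims, the transfer-bound figure is the quantity to compare against measured join times; the build/probe function only illustrates the join logic whose output is being materialized.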