skip to main content
10.1145/2723372.2747645acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Rethinking SIMD Vectorization for In-Memory Databases

Published:27 May 2015Publication History

ABSTRACT

Analytical databases are continuously adapting to the underlying hardware in order to saturate all sources of parallelism. At the same time, hardware evolves in multiple directions to explore different trade-offs. The MIC architecture, one such example, strays from the mainstream CPU design by packing a larger number of simpler cores per chip, relying on SIMD instructions to fill the performance gap. Databases have been attempting to utilize the SIMD capabilities of CPUs. However, mainstream CPUs have only recently adopted wider SIMD registers and more advanced instructions, since they do not rely primarily on SIMD for efficiency. In this paper, we present novel vectorized designs and implementations of database operators, based on advanced SIMD operations, such as gathers and scatters. We study selections, hash tables, and partitioning; and combine them to build sorting and joins. Our evaluation on the MIC-based Xeon Phi co-processor as well as the latest mainstream CPUs shows that our vectorization designs are up to an order of magnitude faster than the state-of-the-art scalar and vector approaches. Also, we highlight the impact of efficient vectorization on the algorithmic design of in-memory database operators, as well as the architectural design and power efficiency of hardware, by making simple cores comparably fast to complex cores. This work is applicable to CPUs and co-processors with advanced SIMD capabilities, using either many simple cores or fewer complex cores.

References

  1. M.-C. Albutiu et al. Massively parallel sort-merge joins in main memory multi-core database systems. PVLDB, 5(10):1064--1075, June 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. P. Bakkum et al. Accelerating SQL database operations on a GPU with CUDA. In GPGPU, pages 94--103, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. C. Balkesen et al. Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware. In ICDE, pages 362--373, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. C. Balkesen et al. Multicore, main-memory joins: Sort vs. hash revisited. PVLDB, 7(1):85--96, Sept. 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. S. Blanas et al. Design and evaluation of main memory hash join algorithms for multi-core CPUs. In SIGMOD, pages 37--48, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. P. Boncz et al. MonetDB/X100: Hyper-pipelining query execution. In CIDR, 2005.Google ScholarGoogle Scholar
  7. J. Chhugani et al. Efficient implementation of sorting on multi-core SIMD CPU architecture. In VLDB, pages 1313--1324, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. Cieslewicz et al. Automatic contention detection and amelioration for data-intensive operations. In SIGMOD, pages 483--494, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. D. J. DeWitt et al. Materialization strategies in a column-oriented DBMS. In ICDE, pages 466--475, 2007.Google ScholarGoogle Scholar
  10. J. Hofmann et al. Comparing the performance of different x86 SIMD instruction sets for a medical imaging application on modern multi- and manycore chips. CoRR, arXiv:1401.7494, 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. H. Inoue et al. AA-sort: A new parallel sorting algorithm for multi-core SIMD processors. In PACT, pages 189--198, 2007. Google ScholarGoogle ScholarCross RefCross Ref
  12. S. Jha et al. Improving main memory hash joins on Intel Xeon Phi processors: An experimental approach. PVLDB, 8(6):642--653, Feb. 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. T. Kaldewey et al. GPU join processing revisited. In DaMoN, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. C. Kim et al. Sort vs. hash revisited: fast join implementation on modern multicore CPUs. In VLDB, pages 1378--1389, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. C. Kim et al. Fast: fast architecture sensitive tree search on modern CPUs and GPUs. In SIGMOD, pages 339--350, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. K. Krikellas et al. Generating code for holistic query evaluation. In ICDE, pages 613--624, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  17. S. Larsen et al. Exploiting superword level parallelism with multimedia instruction sets. In PLDI, pages 145--156, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. V. Leis et al. Morsel-driven parallelism: A NUMA-aware query evaluation framework for the many-core age. In SIGMOD, pages 743--754, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. S. Manegold et al. Optimizing database architecture for the new bottleneck: Memory access. J. VLDB, 9(3):231--246, Dec. 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. S. Manegold et al. What happens during a join? dissecting CPU and memory optimization effects. In VLDB, pages 339--350, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. S. Manegold et al. Cache-conscious radix-decluster projections. In VLDB, pages 684--695, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. T. Neumann. Efficiently compiling efficient query plans for modern hardware. PVLDB, 4(9):539--550, June 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. R. Pagh et al. Cuckoo hashing. J. Algorithms, 51(2):122--144, May 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. H. Pirk et al. Accelerating foreign-key joins using asymmetric memory channels. In ADMS, 2011.Google ScholarGoogle Scholar
  25. O. Polychroniou et al. High throughput heavy hitter aggregation for modern SIMD processors. In DaMoN, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. O. Polychroniou et al. A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort. In SIGMOD, pages 755--766, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. O. Polychroniou et al. Vectorized Bloom filters for advanced SIMD processors. In DaMoN, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. V. Raman et al. DB2 with BLU acceleration: So much more than just a column store. PVLDB, 6(11):1080--1091, Aug. 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. K. A. Ross. Selection conditions in main memory. ACM Trans. Database Systems, 29(1):132--161, Mar. 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. K. A. Ross. Efficient hash probes on modern processors. In ICDE, pages 1297--1301, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  31. N. Satish et al. Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort. In SIGMOD, pages 351--362, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. B. Schlegel et al. Scalable frequent itemset mining on many-core processors. In DaMoN, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. L. Seiler et al. Larrabee: A many-core x86 architecture for visual computing. ACM Trans. Graphics, 27(3), Aug. 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. J. Shin. Introducing control flow into vectorized code. In PACT, pages 280--291, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. L. Sidirourgos et al. Column imprints: A secondary index structure. In SIGMOD, pages 893--904, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. E. A. Sitaridi et al. Optimizing select conditions on GPUs. In DaMoN, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. M. Stonebraker et al. C-store: A column-oriented DBMS. In VLDB, pages 553--564, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. J. Wassenberg et al. Engineering a multi core radix sort. In EuroPar, pages 160--169, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. T. Willhalm et al. SIMD-scan: ultra fast in-memory table scan using on-chip vector processing units. PVLDB, 2(1):385--394, Aug. 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Y. Ye et al. Scalable aggregation on multicore processors. In DaMoN, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. J. Zhou et al. Implementing database operations using SIMD instructions. In SIGMOD, pages 145--156, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. M. Zukowski et al. Architecture-conscious hashing. In DaMoN, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Rethinking SIMD Vectorization for In-Memory Databases

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
      May 2015
      2110 pages
      ISBN:9781450327589
      DOI:10.1145/2723372

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 27 May 2015

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      SIGMOD '15 Paper Acceptance Rate106of415submissions,26%Overall Acceptance Rate785of4,003submissions,20%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader