research-article

Rethinking SIMD Vectorization for In-Memory Databases

Authors:
Orestis Polychroniou

Columbia University, New York, NY, USA

Columbia University, New York, NY, USA
View Profile

,
Arun Raghavan

Oracle Corporation, Belmont, CA, USA

Oracle Corporation, Belmont, CA, USA
View Profile

,
Kenneth A. Ross

Columbia University, New York, NY, USA

Columbia University, New York, NY, USA
View Profile

SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of DataMay 2015Pages 1493–1508https://doi.org/10.1145/2723372.2747645

Published:27 May 2015Publication History

SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data

Pages 1493–1508

ABSTRACT

Analytical databases are continuously adapting to the underlying hardware in order to saturate all sources of parallelism. At the same time, hardware evolves in multiple directions to explore different trade-offs. The MIC architecture, one such example, strays from the mainstream CPU design by packing a larger number of simpler cores per chip, relying on SIMD instructions to fill the performance gap. Databases have been attempting to utilize the SIMD capabilities of CPUs. However, mainstream CPUs have only recently adopted wider SIMD registers and more advanced instructions, since they do not rely primarily on SIMD for efficiency. In this paper, we present novel vectorized designs and implementations of database operators, based on advanced SIMD operations, such as gathers and scatters. We study selections, hash tables, and partitioning; and combine them to build sorting and joins. Our evaluation on the MIC-based Xeon Phi co-processor as well as the latest mainstream CPUs shows that our vectorization designs are up to an order of magnitude faster than the state-of-the-art scalar and vector approaches. Also, we highlight the impact of efficient vectorization on the algorithmic design of in-memory database operators, as well as the architectural design and power efficiency of hardware, by making simple cores comparably fast to complex cores. This work is applicable to CPUs and co-processors with advanced SIMD capabilities, using either many simple cores or fewer complex cores.

References

M.-C. Albutiu et al. Massively parallel sort-merge joins in main memory multi-core database systems. PVLDB, 5(10):1064--1075, June 2012. Google ScholarDigital Library
P. Bakkum et al. Accelerating SQL database operations on a GPU with CUDA. In GPGPU, pages 94--103, 2010. Google ScholarDigital Library
C. Balkesen et al. Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware. In ICDE, pages 362--373, 2013. Google ScholarDigital Library
C. Balkesen et al. Multicore, main-memory joins: Sort vs. hash revisited. PVLDB, 7(1):85--96, Sept. 2013. Google ScholarDigital Library
S. Blanas et al. Design and evaluation of main memory hash join algorithms for multi-core CPUs. In SIGMOD, pages 37--48, 2011. Google ScholarDigital Library
P. Boncz et al. MonetDB/X100: Hyper-pipelining query execution. In CIDR, 2005.Google Scholar
J. Chhugani et al. Efficient implementation of sorting on multi-core SIMD CPU architecture. In VLDB, pages 1313--1324, 2008. Google ScholarDigital Library
J. Cieslewicz et al. Automatic contention detection and amelioration for data-intensive operations. In SIGMOD, pages 483--494, 2010. Google ScholarDigital Library
D. J. DeWitt et al. Materialization strategies in a column-oriented DBMS. In ICDE, pages 466--475, 2007.Google Scholar
J. Hofmann et al. Comparing the performance of different x86 SIMD instruction sets for a medical imaging application on modern multi- and manycore chips. CoRR, arXiv:1401.7494, 2014.Google ScholarDigital Library
H. Inoue et al. AA-sort: A new parallel sorting algorithm for multi-core SIMD processors. In PACT, pages 189--198, 2007. Google ScholarCross Ref
S. Jha et al. Improving main memory hash joins on Intel Xeon Phi processors: An experimental approach. PVLDB, 8(6):642--653, Feb. 2015. Google ScholarDigital Library
T. Kaldewey et al. GPU join processing revisited. In DaMoN, 2012. Google ScholarDigital Library
C. Kim et al. Sort vs. hash revisited: fast join implementation on modern multicore CPUs. In VLDB, pages 1378--1389, 2009. Google ScholarDigital Library
C. Kim et al. Fast: fast architecture sensitive tree search on modern CPUs and GPUs. In SIGMOD, pages 339--350, 2010. Google ScholarDigital Library
K. Krikellas et al. Generating code for holistic query evaluation. In ICDE, pages 613--624, 2010.Google ScholarCross Ref
S. Larsen et al. Exploiting superword level parallelism with multimedia instruction sets. In PLDI, pages 145--156, 2000. Google ScholarDigital Library
V. Leis et al. Morsel-driven parallelism: A NUMA-aware query evaluation framework for the many-core age. In SIGMOD, pages 743--754, 2014. Google ScholarDigital Library
S. Manegold et al. Optimizing database architecture for the new bottleneck: Memory access. J. VLDB, 9(3):231--246, Dec. 2000. Google ScholarDigital Library
S. Manegold et al. What happens during a join? dissecting CPU and memory optimization effects. In VLDB, pages 339--350, 2000. Google ScholarDigital Library
S. Manegold et al. Cache-conscious radix-decluster projections. In VLDB, pages 684--695, 2004. Google ScholarDigital Library
T. Neumann. Efficiently compiling efficient query plans for modern hardware. PVLDB, 4(9):539--550, June 2011. Google ScholarDigital Library
R. Pagh et al. Cuckoo hashing. J. Algorithms, 51(2):122--144, May 2004. Google ScholarDigital Library
H. Pirk et al. Accelerating foreign-key joins using asymmetric memory channels. In ADMS, 2011.Google Scholar
O. Polychroniou et al. High throughput heavy hitter aggregation for modern SIMD processors. In DaMoN, 2013. Google ScholarDigital Library
O. Polychroniou et al. A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort. In SIGMOD, pages 755--766, 2014. Google ScholarDigital Library
O. Polychroniou et al. Vectorized Bloom filters for advanced SIMD processors. In DaMoN, 2014. Google ScholarDigital Library
V. Raman et al. DB2 with BLU acceleration: So much more than just a column store. PVLDB, 6(11):1080--1091, Aug. 2013. Google ScholarDigital Library
K. A. Ross. Selection conditions in main memory. ACM Trans. Database Systems, 29(1):132--161, Mar. 2004. Google ScholarDigital Library
K. A. Ross. Efficient hash probes on modern processors. In ICDE, pages 1297--1301, 2007.Google ScholarCross Ref
N. Satish et al. Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort. In SIGMOD, pages 351--362, 2010. Google ScholarDigital Library
B. Schlegel et al. Scalable frequent itemset mining on many-core processors. In DaMoN, 2013. Google ScholarDigital Library
L. Seiler et al. Larrabee: A many-core x86 architecture for visual computing. ACM Trans. Graphics, 27(3), Aug. 2008. Google ScholarDigital Library
J. Shin. Introducing control flow into vectorized code. In PACT, pages 280--291, 2007. Google ScholarDigital Library
L. Sidirourgos et al. Column imprints: A secondary index structure. In SIGMOD, pages 893--904, 2013. Google ScholarDigital Library
E. A. Sitaridi et al. Optimizing select conditions on GPUs. In DaMoN, 2013. Google ScholarDigital Library
M. Stonebraker et al. C-store: A column-oriented DBMS. In VLDB, pages 553--564, 2005. Google ScholarDigital Library
J. Wassenberg et al. Engineering a multi core radix sort. In EuroPar, pages 160--169, 2011. Google ScholarDigital Library
T. Willhalm et al. SIMD-scan: ultra fast in-memory table scan using on-chip vector processing units. PVLDB, 2(1):385--394, Aug. 2009. Google ScholarDigital Library
Y. Ye et al. Scalable aggregation on multicore processors. In DaMoN, 2011. Google ScholarDigital Library
J. Zhou et al. Implementing database operations using SIMD instructions. In SIGMOD, pages 145--156, 2002. Google ScholarDigital Library
M. Zukowski et al. Architecture-conscious hashing. In DaMoN, 2006. Google ScholarDigital Library

Index Terms

Rethinking SIMD Vectorization for In-Memory Databases
1. Information systems
  1. Data management systems
    1. Database management system engines

Recommendations

Vectorizing Unstructured Mesh Computations for Many-core Architectures
PMAM'14: Proceedings of Programming Models and Applications on Multicores and Manycores

Achieving optimal performance on the latest multi-core and many-core architectures depends more and more on making efficient use of the hardware's vector processing capabilities. While auto-vectorizing compilers do not require the use of vector ...
Read More
Vectorizing Unstructured Mesh Computations for Many-core Architectures
PMAM'14: Proceedings of Programming Models and Applications on Multicores and Manycores

Achieving optimal performance on the latest multi-core and many-core architectures depends more and more on making efficient use of the hardware's vector processing capabilities. While auto-vectorizing compilers do not require the use of vector ...
Read More
Vectorizing unstructured mesh computations for many-core architectures

Achieving optimal performance on the latest multi-core and many-core architectures increasingly depends on making efficient use of the hardware's vector units. This paper presents results on achieving high performance through vectorization on CPUs and ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
May 2015
2110 pages
ISBN:9781450327589
DOI:10.1145/2723372
General Chair:
Timos Sellis
RMIT University, Australia
,
Program Chairs:
Susan B. Davidson
University of Pennsylvania, USA
,
Zack Ives
University of Pennsylvania, USA
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 27 May 2015
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
simd
vectorization
xeon phi
Qualifiers
- research-article
Conference

Acceptance Rates
SIGMOD '15 Paper Acceptance Rate106of415submissions,26%Overall Acceptance Rate785of4,003submissions,20%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 128
  Total Citations
  View Citations
- 1,969
  Total Downloads
- Downloads (Last 12 months)298
- Downloads (Last 6 weeks)43
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.