research-article

A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort

Authors:
Orestis Polychroniou

Columbia University, New York, NY, USA

Columbia University, New York, NY, USA
View Profile

,
Kenneth A. Ross

Columbia University, New York, NY, USA

Columbia University, New York, NY, USA
View Profile

SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of DataJune 2014Pages 755–766https://doi.org/10.1145/2588555.2610522

Published:18 June 2014Publication History

SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data

Pages 755–766

ABSTRACT

Analytical database systems can achieve high throughput main-memory query execution by being aware of the dynamics of highly-parallel modern hardware. Such systems rely on partitioning to cluster or divide data into smaller pieces and thus achieve better parallelism and memory locality. This paper considers a comprehensive collection of variants of main-memory partitioning tuned for various layers of the memory hierarchy. We revisit the pitfalls of in-cache partitioning, and utilizing the crucial performance factors, we introduce new variants for partitioning out-of-cache. Besides non-in-place variants where linear extra space is used, we introduce large-scale in-place variants, and propose NUMA-aware partitioning that guarantees locality on multiple processors. Also, we make range partitioning comparably fast with hash or radix, by designing a novel cache-resident index to compute ranges. All variants are combined to build three NUMA-aware sorting algorithms: a stable LSB radix-sort; an in-place MSB radix-sort using different variants across memory layers; and a comparison-sort utilizing wide-fanout range partitioning and SIMD-optimal in-cache sorting. To the best of our knowledge, all three are the fastest to date on billion-scale inputs for both dense and sparse key domains. As shown for sorting, our work can serve as a tool for building other operations (e.g., join, aggregation) by combining the most suitable variants that best meet the design goals.

References

M.-C. Albutiu, A. Kemper, and T. Neumann. Massively parallel sort-merge joins in main memory multi-core database systems. PVLDB, 5(10):1064--1075, June 2012. Google ScholarDigital Library
C. Balkesen et al. Multicore, main-memory joins: Sort vs. hash revisited. PVLDB, 7(1):85--96, Sept. 2013.Google ScholarDigital Library
C. Balkesen, J. Teubner, G. Alonso, and M. T. Ozsu. Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware. In ICDE, pages 362--373, 2013. Google ScholarDigital Library
S. Blanas, Y. Li, and J. M. Patel. Design and evaluation of main memory hash join algorithms for multi-core CPUs. In SIGMOD, pages 37--48, 2011. Google ScholarDigital Library
J. Chhugani et al. Efficient implementation of sorting on multi-core SIMD CPU architecture. In VLDB, pages 1313--1324, 2008. Google ScholarDigital Library
H. Inoue, T. Moriyama, H. Komatsu, and T. Nakatani. AA-sort: A new parallel sorting algorithm for multi-core SIMD processors. In PACT, pages 189--198, 2007. Google ScholarDigital Library
C. Kim et al. Sort vs. hash revisited: fast join implementation on modern multicore CPUs. In VLDB, pages 1378--1389, 2009. Google ScholarDigital Library
C. Kim et al. Fast: fast architecture sensitive tree search on modern CPUs and GPUs. In SIGMOD, pages 339--350, 2010. Google ScholarDigital Library
C. Kim et al. CloudRAMsort: fast and efficient large-scale distributed RAM sort on shared-nothing cluster. In SIGMOD, pages 841--850, 2012. Google ScholarDigital Library
Y. Li et al. NUMA-aware algorithms: the case of data shuffling. In CIDR, 2013.Google Scholar
S. Manegold, P. A. Boncz, and M. L. Kersten. What happens during a join? dissecting CPU and memory optimization effects. In VLDB, pages 339--350, 2000. Google ScholarDigital Library
V. Raman et al. DB2 with BLU acceleration: So much more than just a column store. PVLDB, 6(11):1080--1091, Aug. 2013. Google ScholarDigital Library
K. A. Ross et al. Optimal splitters for database partitioning with size bounds. In ICDT, pages 98--110, 2009. Google ScholarDigital Library
N. Satish et al. Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort. In SIGMOD, pages 351--362, 2010. Google ScholarDigital Library
J. Wassenberg and P. Sanders. Engineering a multi core radix sort. In EuroPar, pages 160--169, 2011. Google ScholarDigital Library
T. Willhalm et al. SIMD-scan: ultra fast in-memory table scan using on-chip vector processing units. PVLDB, 2(1):385--394, Aug. 2009. Google ScholarDigital Library
L. Wu, R. J. Barker, M. A. Kim, and K. A. Ross. Navigating big data with high-throughput, energy-efficient data partitioning. In ISCA, pages 249--260, 2013. Google ScholarDigital Library

Index Terms

A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort
1. Information systems
  1. Data management systems
    1. Database management system engines

Recommendations

A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs
SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data

Sorting is at the core of many database operations, such as index creation, sort-merge joins, and user-requested output sorting. As GPUs are emerging as a promising platform to accelerate various operations, sorting on GPUs becomes a viable endeavour. ...
Read More
Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort
SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data

Sort is a fundamental kernel used in many database operations. In-memory sorts are now feasible; sort performance is limited by compute flops and main memory bandwidth rather than I/O. In this paper, we present a competitive analysis of comparison and ...
Read More
Versatile and scalable parallel histogram construction
PACT '14: Proceedings of the 23rd international conference on Parallel architectures and compilation

Histograms are used in various fields to quickly profile the distribution of a large amount of data. However, it is challenging to efficiently utilize abundant parallel resources in modern processors for histogram construction. To make matters worse, ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
June 2014
1645 pages
ISBN:9781450323765
DOI:10.1145/2588555
General Chairs:
Curtis Dyreson
Utah State University, USA
,
Feifei Li
University of Utah, USA
,
Program Chair:
M. Tamer Özsu
University of Waterloo, Canada
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 18 June 2014
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
NUMA
SIMD
multi-core
parallelism
partitioning
sorting
Qualifiers
- research-article
Conference

Acceptance Rates
SIGMOD '14 Paper Acceptance Rate107of421submissions,25%Overall Acceptance Rate785of4,003submissions,20%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 71
  Total Citations
  View Citations
- 994
  Total Downloads
- Downloads (Last 12 months)57
- Downloads (Last 6 weeks)6
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

A comprehensive study of main-memory partitioning and its application to large-scale comparison- and radix-sort

SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data

ABSTRACT

References

Cited By

Index Terms

Recommendations

A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs

Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort

Versatile and scalable parallel histogram construction