research-article

Relaxed operator fusion for in-memory databases: making compilation, vectorization, and prefetching work together at last

Authors:
Prashanth Menon

Carnegie Mellon University

Carnegie Mellon University
View Profile

,
Todd C. Mowry

Carnegie Mellon University

Carnegie Mellon University
View Profile

,
Andrew Pavlo

Carnegie Mellon University

Carnegie Mellon University
View Profile

Proceedings of the VLDB Endowment Volume 11 Issue 1pp 1–13https://doi.org/10.14778/3151113.3151114

Published:01 September 2017Publication History

Proceedings of the VLDB Endowment

Abstract

In-memory database management systems (DBMSs) are a key component of modern on-line analytic processing (OLAP) applications, since they provide low-latency access to large volumes of data. Because disk accesses are no longer the principle bottleneck in such systems, the focus in designing query execution engines has shifted to optimizing CPU performance. Recent systems have revived an older technique of using just-in-time (JIT) compilation to execute queries as native code instead of interpreting a plan. The state-of-the-art in query compilation is to fuse operators together in a query plan to minimize materialization overhead by passing tuples efficiently between operators. Our empirical analysis shows, however, that more tactful materialization yields better performance.

We present a query processing model called "relaxed operator fusion" that allows the DBMS to introduce staging points in the query plan where intermediate results are temporarily materialized. This allows the DBMS to take advantage of inter-tuple parallelism inherent in the plan using a combination of prefetching and SIMD vectorization to support faster query execution on data sets that exceed the size of CPU-level caches. Our evaluation shows that our approach reduces the execution time of OLAP queries by up to 2.2× and achieves up to 1.8× better performance compared to other in-memory DBMSs.

References

Actian Vector. http://esd.actian.com/product/Vector.Google Scholar
Apache Cassandra. http://cassandra.apache.org/.Google Scholar
Apache Spark. http://spark.apache.org/.Google Scholar
MemSQL. http://www.memsql.com.Google Scholar
Peloton Database Management System. http://pelotondb.io.Google Scholar
D. J. Abadi, S. R. Madden, and N. Hachem. Column-stores vs. row-stores: How different are they really? In SIGMOD, pages 967--980, 2008. Google ScholarDigital Library
Y. Ahmad, O. Kennedy, C. Koch, and M. Nikolic. Dbtoaster: Higher-order delta processing for dynamic, frequently fresh views. PVLDB, 5(10):968--979, 2012. Google ScholarDigital Library
Y. Ahmad and C. Koch. Dbtoaster: A SQL compiler for high-performance delta processing in main-memory databases. PVLDB, 2(2):1566--1569, 2009. Google ScholarDigital Library
A. Appleby. MurMur3 Hash. https://github.com/aappleby/smhasher.Google Scholar
P. Boncz, T. Neumann, and O. Erling. TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark. 2014.Google Scholar
P. Boncz, M. Zukowski, and N. Nes. MonetDB/X100: Hyper-pipelining query execution. In CIDR, 2005.Google Scholar
D. Broneske, A. Meister, and G. Saake. Hardware-sensitive scan operator variants for compiled selection pipelines. In Datenbanksysteme für Business, Technologie und Web (BTW), pages 403--412, 2017.Google Scholar
D. D. Chamberlin, M. M. Astrahan, M. W. Blasgen, J. N. Gray, W. F. King, B. G. Lindsay, R. Lorie, J. W. Mehl, T. G. Price, F. Putzolu, P. G. Selinger, M. Schkolnick, D. R. Slutz, I. L. Traiger, B. W. Wade, and R. A. Yost. A history and evaluation of system r. Commun. ACM, 24:632--646, October 1981. Google ScholarDigital Library
S. Chen, A. Ailamaki, P. B. Gibbons, and T. C. Mowry. Improving hash join performance through prefetching. In ICDE, pages 116--127, 2004. Google ScholarDigital Library
A. Crotty, A. Galakatos, K. Dursun, T. Kraska, C. Binnig, U. Cetintemel, and S. Zdonik. An architecture for compiling udf-centric workflows. PVLDB, 8(12):1466--1477, 2015. Google ScholarDigital Library
A. Crotty, A. Galakatos, K. Dursun, T. Kraska, U. Çetintemel, and S. B. Zdonik. Tupleware: "big" data, big analytics, small clusters. In CIDR, 2015.Google Scholar
B. Dageville, T. Cruanes, M. Zukowski, V. Antonov, A. Avanes, J. Bock, J. Claybaugh, D. Engovatov, M. Hentschel, J. Huang, A. W. Lee, A. Motivala, A. Q. Munir, S. Pelley, P. Povinec, G. Rahn, S. Triantafyllis, and P. Unterbrunner. The snowflake elastic data warehouse. SIGMOD '16, pages 215--226, 2016. Google ScholarDigital Library
C. Freedman, E. Ismert, and P.-A. Larson. Compilation in the microsoft SQL server hekaton engine. IEEE Data Eng. Bull., 2014.Google Scholar
G. Graefe. Volcano- an extensible and parallel query evaluation system. IEEE Trans. on Knowl. and Data Eng., 6:120--135, 1994. Google ScholarDigital Library
A. Kemper and T. Neumann. HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. In ICDE, pages 195--206, 2011. Google ScholarDigital Library
Y. Klonatos, C. Koch, T. Rompf, and H. Chafi. Building efficient query engines in a high-level language. PVLDB, 7(10):853--864, 2014. Google ScholarDigital Library
O. Kocberber, B. Falsafi, and B. Grot. Asynchronous memory access chaining. PVLDB, 9(4):252--263, 2015. Google ScholarDigital Library
M. Kornacker, A. Behm, V. Bittorf, T. Bobrovytsky, C. Ching, A. Choi, J. Erickson, M. Grund, D. Hecht, M. Jacobs, I. Joshi, L. Kuff, D. Kumar, A. Leblang, N. Li, I. Pandis, H. Robinson, D. Rorke, S. Rus, J. Russell, D. Tsirogiannis, S. Wanderman-Milne, and M. Yoder. Impala: A modern, open-source SQL engine for hadoop. In CIDR 2015, Seventh Biennial Conference on Innovative Data Systems Research, 2015.Google Scholar
K. Krikellas, S. D. Viglas, and M. Cintra. Generating code for holistic query evaluation. In Data Engineering (ICDE), 2010 IEEE 26th International Conference on, pages 613--624. IEEE, 2010.Google ScholarCross Ref
H. Lang, T. Mühlbauer, F. Funke, P. A. Boncz, T. Neumann, and A. Kemper. Data blocks: Hybrid OLTP and OLAP on compressed storage using both vectorization and compilation. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, pages 311--326, 2016. Google ScholarDigital Library
V. Leis, P. Boncz, A. Kemper, and T. Neumann. Morsel-driven parallelism: A numa-aware query evaluation framework for the many-core age. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD '14, pages 743--754, 2014. Google ScholarDigital Library
C.-K. Luk and T. C. Mowry. Compiler-based prefetching for recursive data structures. In Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS VII, pages 222--233, 1996. Google ScholarDigital Library
G. Moerkotte and T. Neumann. Accelerating queries with group-by and join by groupjoin. PVLDB, 4(11):843--851, 2011.Google ScholarDigital Library
T. C. Mowry, M. S. Lam, and A. Gupta. Design and evaluation of a compiler algorithm for prefetching. In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS V, pages 62--73, 1992. Google ScholarDigital Library
T. Neumann. Efficiently compiling efficient query plans for modern hardware. PVLDB, 4(9):539--550, 2011. Google ScholarDigital Library
S. Pantela and S. Idreos. One loop does not fit all. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15, pages 2073--2074, 2015. Google ScholarDigital Library
D. Paroski. Code Generation: The Inner Sanctum of Database Performance. http://highscalability.com/blog/2016/9/7/code-generation-the-inner-sanctum-of-database-performance. html, September 2016.Google Scholar
O. Polychroniou, A. Raghavan, and K. A. Ross. Rethinking simd vectorization for in-memory databases. SIGMOD, pages 1493--1508, 2015. Google ScholarDigital Library
O. Polychroniou and K. A. Ross. Vectorized bloom filters for advanced simd processors. DaMoN '14, pages 6:1--6:6, 2014. Google ScholarDigital Library
S. Richter, V. Alvarez, and J. Dittrich. A seven-dimensional analysis of hashing methods and its implications on query processing. PVLDB, 9(3):96--107, 2015. Google ScholarDigital Library
S. Schuh, X. Chen, and J. Dittrich. An experimental comparison of thirteen relational equi-joins in main memory. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, pages 1961--1976, 2016. Google ScholarDigital Library
J. Sompolski. Just-in-time Compilation in Vectorized Query Execution. Master's thesis, University of Warsaw, Aug 2011.Google Scholar
J. Sompolski, M. Zukowski, and P. Boncz. Vectorization vs. compilation in query execution. In Proceedings of the Seventh International Workshop on Data Management on New Hardware, DaMoN '11, pages 33--40, 2011. Google ScholarDigital Library
A. Suhan and T. Mostak. MapD: Massive Throughput Database Queries with LLVM on GPUs. http://devblogs.nvidia.com/parallelforall/mapd, June 2015.Google Scholar
The Transaction Processing Council. TPC-H Benchmark (Revision 2.16.0). http://www.tpc.org/tpch/, June 2013.Google Scholar
S. D. Viglas. Just-in-time compilation for sql query processing. PVLDB, 6(11):1190--1191, 2013. Google ScholarDigital Library
J. Zhou and K. A. Ross. Implementing database operations using simd instructions. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, SIGMOD '02, pages 145--156, New York, NY, USA, 2002. ACM. Google ScholarDigital Library
M. Zukowski, N. Nes, and P. Boncz. Dsm vs. nsm: Cpu performance tradeoffs in block-oriented query processing. DaMoN '08, pages 47--54, 2008. Google ScholarDigital Library

Index Terms

Relaxed operator fusion for in-memory databases: making compilation, vectorization, and prefetching work together at last
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Index terms have been assigned to the content through auto-classification.

Recommendations

Computing Relaxed Answers on RDF Databases
WISE '08: Proceedings of the 9th international conference on Web Information Systems Engineering

Database users may be frustrated by no answers returned when they pose a query on the database. In this paper, we study the problem of relaxing queries on RDF databases in order to acquire approximate answers. We address two problems for efficient ...
Read More
Forecasting the cost of processing multi-join queries via hashing for main-memory databases
SoCC '15: Proceedings of the Sixth ACM Symposium on Cloud Computing

Database management systems (DBMSs) carefully optimize complex multi-join queries to avoid expensive disk I/O. As servers today feature tens or hundreds of gigabytes of RAM, a significant fraction of many analytic databases becomes memory-resident. Even ...
Read More
Top-k dominating queries in uncertain databases
EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology

Due to the existence of uncertain data in a wide spectrum of real applications, uncertain query processing has become increasingly important, which dramatically differs from handling certain data in a traditional database. In this paper, we formulate ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Proceedings of the VLDB Endowment Volume 11, Issue 1
Proceedings of the 44th International Conference on Very Large Data Bases, Rio de Janeiro, Brazil
September 2017
120 pages
ISSN:2150-8097
Editors:
Jian Pei
Simon Fraser University
,
Sihem Amer-Yahia
University of Grenoble Alpes, CNRS
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 September 2017
Published in pvldb Volume 11, Issue 1
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 57
  Total Citations
  View Citations
- 619
  Total Downloads
- Downloads (Last 12 months)122
- Downloads (Last 6 weeks)14
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Relaxed operator fusion for in-memory databases: making compilation, vectorization, and prefetching work together at last

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Computing Relaxed Answers on RDF Databases

Forecasting the cost of processing multi-join queries via hashing for main-memory databases

Top-k dominating queries in uncertain databases

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Relaxed operator fusion for in-memory databases: making compilation, vectorization, and prefetching work together at last

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Index Terms

Recommendations

Computing Relaxed Answers on RDF Databases

Forecasting the cost of processing multi-join queries via hashing for main-memory databases

Top-k dominating queries in uncertain databases

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media