Quantifying TPC-H choke points and their optimizations

Authors: Markus Dreseler, Martin Boissier, Tilmann Rabl, Matthias Uflacker

Hasso Plattner Institute, University of Potsdam, Germany

Proceedings of the VLDB Endowment, Volume 13, Issue 8, pp. 1206–1220, https://doi.org/10.14778/3389133.3389138

Published: 01 April 2020

Abstract

TPC-H continues to be the most widely used benchmark for relational OLAP systems. It poses a number of challenges, also known as "choke points", which database systems have to solve in order to achieve good benchmark results. Examples include joins across multiple tables, correlated subqueries, and correlations within the TPC-H data set. Knowing the impact of such optimizations helps in developing optimizers as well as in interpreting TPC-H results across database systems.

This paper provides a systematic analysis of choke points and their optimizations. It complements previous work on TPC-H choke points by providing a quantitative discussion of their relevance. It focuses on eleven choke points where the optimizations are beneficial independently of the database system. Of these, the flattening of subqueries and the placement of predicates have the biggest impact. Three queries (Q2, Q17, and Q21) are strongly influenced by the choice of an efficient query plan; three others (Q1, Q13, and Q18) are less influenced by plan optimizations and more dependent on an efficient execution engine.
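
To make the subquery-flattening choke point concrete, the following is a minimal sketch that is not taken from the paper. It runs a simplified query in the spirit of TPC-H Q17 against a toy in-memory SQLite table, once with a correlated subquery and once rewritten as a join against a pre-aggregated derived table. The table contents, the reduced column set, and the variable names are illustrative assumptions; only the query shape (a per-part average quantity scaled by 0.2) follows Q17.

```python
import sqlite3

# Toy stand-in for the TPC-H lineitem table (assumed, heavily reduced schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineitem ("
    "l_partkey INTEGER, l_quantity REAL, l_extendedprice REAL)"
)
rows = [
    (1, 1.0, 10.0),
    (1, 10.0, 100.0),
    (1, 19.0, 190.0),
    (2, 2.0, 20.0),
    (2, 50.0, 500.0),
    (3, 5.0, 50.0),
]
conn.executemany("INSERT INTO lineitem VALUES (?, ?, ?)", rows)

# Correlated form: the inner average is logically re-evaluated per outer row.
correlated = """
SELECT SUM(l.l_extendedprice)
FROM lineitem AS l
WHERE l.l_quantity < (
    SELECT 0.2 * AVG(i.l_quantity)
    FROM lineitem AS i
    WHERE i.l_partkey = l.l_partkey
)
"""

# Flattened form: compute one threshold per part, then join on the part key.
flattened = """
SELECT SUM(l.l_extendedprice)
FROM lineitem AS l
JOIN (
    SELECT l_partkey, 0.2 * AVG(l_quantity) AS threshold
    FROM lineitem
    GROUP BY l_partkey
) AS agg ON agg.l_partkey = l.l_partkey
WHERE l.l_quantity < agg.threshold
"""

result_correlated = conn.execute(correlated).fetchone()[0]
result_flattened = conn.execute(flattened).fetchone()[0]
print(result_correlated, result_flattened)
```

Both statements return the same sum on this data, which is the property a flattening rewrite has to preserve. The flattened form aggregates each part once instead of once per outer row; how much that helps in practice depends on the database system and its optimizer.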

Index Terms

1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Index terms have been assigned to the content through auto-classification.

Published in

Proceedings of the VLDB Endowment, Volume 13, Issue 8
April 2020
172 pages
ISSN: 2150-8097
Editors: Magdalena Balazinska (University of Washington), Xiaofang Zhou (University of Queensland, Australia)