Abstract
TPC-H continues to be the most widely used benchmark for relational OLAP systems. It poses a number of challenges, also known as "choke points", which database systems have to solve in order to achieve good benchmark results. Examples include joins across multiple tables, correlated subqueries, and correlations within the TPC-H data set. Knowing the impact of the optimizations that address these choke points helps both in developing optimizers and in interpreting TPC-H results across database systems.
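Correlated subqueries are a good illustration of why plan choice matters. TPC-H Q17 compares each lineitem's l_quantity against 0.2 * avg(l_quantity) of the same part and reports sum(l_extendedprice) / 7.0. The following Python sketch uses hypothetical toy rows and plain functions (LINEITEM, SELECTED_PARTS, correlated, flattened are all illustrative names). It contrasts naive per-row evaluation of the subquery with one common flattening strategy, in which the per-part average is precomputed once and joined back. It is meant only to show the principle, not the exact rewrite any particular system performs.

```python
from collections import defaultdict

# Hypothetical toy rows standing in for lineitem: (l_partkey, l_quantity, l_extendedprice)
LINEITEM = [
    (1, 2, 70.0), (1, 48, 480.0), (1, 50, 500.0),
    (2, 1, 70.0), (2, 19, 190.0),
    (3, 1, 1000.0),
]
# Part keys that survive the outer part filter (stand-in for the p_brand/p_container predicates)
SELECTED_PARTS = {1, 2}


def correlated(lineitem, parts):
    """Naive plan: re-evaluate the subquery for every qualifying outer row."""
    total = 0.0
    for partkey, qty, price in lineitem:
        if partkey not in parts:
            continue
        # Inner query rescans lineitem for each outer tuple: O(n^2) overall
        inner = [q for pk, q, _ in lineitem if pk == partkey]
        if qty < 0.2 * sum(inner) / len(inner):
            total += price
    return total / 7.0


def flattened(lineitem, parts):
    """Unnested plan: compute avg(l_quantity) per part once, then join it back."""
    sums, counts = defaultdict(float), defaultdict(int)
    for partkey, qty, _ in lineitem:  # one grouped aggregation pass
        sums[partkey] += qty
        counts[partkey] += 1
    threshold = {pk: 0.2 * sums[pk] / counts[pk] for pk in sums}
    total = 0.0
    for partkey, qty, price in lineitem:  # one join/filter pass
        if partkey in parts and qty < threshold[partkey]:
            total += price
    return total / 7.0


print(correlated(LINEITEM, SELECTED_PARTS), flattened(LINEITEM, SELECTED_PARTS))  # prints 20.0 20.0
```

Both functions return the same answer. The flattened version, however, touches lineitem a constant number of times instead of once per outer row, which is the kind of gap that makes subquery flattening so consequential on the large lineitem table.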
This paper provides a systematic analysis of choke points and their optimizations. It complements previous work on TPC-H choke points by providing a quantitative discussion of their relevance. It focuses on eleven choke points where the optimizations are beneficial independently of the database system. Of these, the flattening of subqueries and the placement of predicates have the biggest impact. Three queries (Q2, Q17, and Q21) are strongly influenced by the choice of an efficient query plan; three others (Q1, Q13, and Q18) are less influenced by plan optimizations and more dependent on an efficient execution engine.
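Predicate placement can be illustrated in the same way. The sketch below uses hypothetical toy tables loosely modeled on TPC-H orders and lineitem; the column values, the year == 1994 and MAIL filters, and the function names are invented for illustration. It counts how many lineitem rows probe the join's hash table under two plans. In the first, the filters are applied only after the join. In the second, they are pushed below it so that both inputs shrink first. This is a sketch of the general principle, not of the paper's measurement setup.

```python
# Hypothetical toy tables loosely modeled on TPC-H:
# orders maps o_orderkey -> order year (the dict doubles as the join hash table);
# lineitem holds (l_orderkey, l_shipmode) with four rows per order
ORDERS = {k: 1992 + k % 7 for k in range(1, 1001)}
LINEITEM = [(k, "MAIL" if k % 10 == 0 else "AIR") for k in range(1, 1001) for _ in range(4)]


def join_then_filter(orders, lineitem):
    """Predicates evaluated above the join: every lineitem row probes the hash table."""
    probes, out = 0, []
    for key, mode in lineitem:
        probes += 1
        year = orders.get(key)
        if year is not None and mode == "MAIL" and year == 1994:
            out.append((key, year, mode))
    return out, probes


def filter_then_join(orders, lineitem):
    """Predicates pushed below the join: filter both inputs first, then join."""
    build = {k: y for k, y in orders.items() if y == 1994}  # smaller build side
    probes, out = 0, []
    for key, mode in lineitem:
        if mode != "MAIL":  # rows discarded before reaching the join
            continue
        probes += 1
        year = build.get(key)
        if year is not None:
            out.append((key, year, mode))
    return out, probes


late, late_probes = join_then_filter(ORDERS, LINEITEM)
early, early_probes = filter_then_join(ORDERS, LINEITEM)
print(late == early, len(early), late_probes, early_probes)  # prints True 56 4000 400
```

The result is identical under both plans. Only the amount of work reaching the join changes, which is why predicate placement can be decided by the optimizer independently of how the execution engine implements the join itself.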
Index Terms
- Quantifying TPC-H choke points and their optimizations