research-article

Quantifying TPC-H choke points and their optimizations

Published: 01 April 2020

Abstract

TPC-H continues to be the most widely used benchmark for relational OLAP systems. It poses a number of challenges, also known as "choke points", which database systems have to solve in order to achieve good benchmark results. Examples include joins across multiple tables, correlated subqueries, and correlations within the TPC-H data set. Knowing the impact of the optimizations that address these choke points helps in developing optimizers as well as in interpreting TPC-H results across database systems.

This paper provides a systematic analysis of choke points and their optimizations. It complements previous work on TPC-H choke points by providing a quantitative discussion of their relevance. It focuses on eleven choke points where the optimizations are beneficial independently of the database system. Of these, the flattening of subqueries and the placement of predicates have the biggest impact. Three queries (Q2, Q17, and Q21) are strongly influenced by the choice of an efficient query plan; three others (Q1, Q13, and Q18) are less influenced by plan optimizations and more dependent on an efficient execution engine.
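To make "flattening of subqueries" concrete, the following is a minimal sketch and not code from the paper. It assumes a heavily simplified two-table schema and a query loosely modeled on the shape of TPC-H Q17; the sample rows, the helper `run`, and the alias `agg` are illustrative assumptions. It runs a correlated subquery and a hand-flattened equivalent (a join against a pre-aggregated derived table) in SQLite and checks that both return the same result.

```python
import sqlite3

# In-memory database with a tiny, made-up subset of the TPC-H schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (p_partkey INTEGER PRIMARY KEY, p_brand TEXT);
    CREATE TABLE lineitem (l_partkey INTEGER, l_quantity REAL, l_extendedprice REAL);
    INSERT INTO part VALUES (1, 'Brand#23'), (2, 'Brand#23'), (3, 'Brand#11');
    INSERT INTO lineitem VALUES
        (1, 1, 100.0), (1, 10, 900.0), (1, 50, 4000.0),
        (2, 2, 50.0),  (2, 40, 800.0),
        (3, 1, 10.0),  (3, 30, 300.0);
""")

# Correlated form: the inner query is logically re-evaluated per outer row.
correlated = """
    SELECT SUM(l.l_extendedprice)
    FROM lineitem l, part p
    WHERE p.p_partkey = l.l_partkey
      AND p.p_brand = 'Brand#23'
      AND l.l_quantity < (SELECT 0.2 * AVG(l2.l_quantity)
                          FROM lineitem l2
                          WHERE l2.l_partkey = p.p_partkey)
"""

# Flattened form: the per-part average is computed once and joined in.
flattened = """
    SELECT SUM(l.l_extendedprice)
    FROM lineitem l
    JOIN part p ON p.p_partkey = l.l_partkey
    JOIN (SELECT l_partkey, 0.2 * AVG(l_quantity) AS threshold
          FROM lineitem
          GROUP BY l_partkey) agg
      ON agg.l_partkey = l.l_partkey
    WHERE p.p_brand = 'Brand#23'
      AND l.l_quantity < agg.threshold
"""

def run(query):
    """Execute a single-value query and return that value."""
    return conn.execute(query).fetchone()[0]

correlated_result = run(correlated)
flattened_result = run(flattened)
print(correlated_result, flattened_result)
```

On this toy data both forms select the same two line items. Whether the flattened form is actually faster depends on the system and its optimizer; the sketch only shows that the rewrite preserves the result.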



Published in

Proceedings of the VLDB Endowment, Volume 13, Issue 8
April 2020
172 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
