
2018 | OriginalPaper | Book chapter

JCC-H: Adding Join Crossing Correlations with Skew to TPC-H

Authors: Peter Boncz, Angelos-Christos Anatiotis, Steffen Kläbe

Published in: Performance Evaluation and Benchmarking for the Analytics Era

Publisher: Springer International Publishing

Abstract

We introduce JCC-H, a drop-in replacement for the TPC-H data and query generator that introduces Join-Crossing-Correlations (JCC) and skew into its dataset and query workload. These correlations are carefully designed so that the filter predicates on table columns in the existing TPC-H queries can now affect the value, frequency, and join fan-out distributions experienced by operators in the query plan. The JCC-H query generator can produce parameter bindings for the 22 query templates in two different equivalence classes: query templates that receive “normal” parameters do not experience skew and behave very similarly to the default TPC-H queries, whereas query templates expanded with “skewed” parameters experience strong join-crossing correlations and skew in filter, aggregation, and join operations. In this paper we discuss the goals of JCC-H and its detailed design, and we present initial experiments on both a single-server and an MPP database system that confirm that our design goals were largely met. In all, JCC-H provides a convenient way for any system that is already tested with TPC-H to examine how it handles skew and correlations, and we hope the community will use it to make progress on issues such as skew mitigation and the detection and exploitation of join-crossing correlations in query optimizers and data storage.
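To make the notion of a join fan-out distribution (footnote 1) and of a join-crossing correlation more concrete, here is a minimal Python sketch on a hand-made toy dataset. The `orders` and `lineitem` values, the helper names `fanout_distribution` and `mean_fanout`, and the choice of 1994 as the filter year are all hypothetical illustrations, not JCC-H output or JCC-H code. The sketch computes the fan-out of the o_orderkey primary key towards the l_orderkey foreign key, once over all orders and once after a filter on the order year. Because the single "populous" order happens to be correlated with the filtered year in this toy data, a predicate on `orders` changes the fan-out that a subsequent join with `lineitem` experiences.

```python
from collections import Counter

# Hypothetical toy data, NOT generated by JCC-H.
# orders: o_orderkey (primary key) -> order year
orders = {1: 1992, 2: 1992, 3: 1993, 4: 1993, 5: 1994}

# lineitem: one l_orderkey (foreign key) value per row.
# Order 5 acts as a "populous" order with far more line items than the others.
lineitem = [1, 1, 2, 3, 3, 3, 4] + [5] * 20


def fanout_distribution(pk_values, fk_column):
    """Return a Counter mapping fan-out k to the number of PK values with exactly k join partners."""
    partners = Counter(fk_column)
    return Counter(partners.get(pk, 0) for pk in pk_values)


def mean_fanout(dist):
    """Average number of join partners per PK value in a fan-out distribution."""
    keys = sum(dist.values())
    return sum(k * n for k, n in dist.items()) / keys


# Fan-out over all orders: mostly small values plus one large outlier.
all_orders = fanout_distribution(orders.keys(), lineitem)

# A filter predicate on orders (year = 1994) that is correlated with the
# populous order: every surviving key now has a very large fan-out.
orders_1994 = fanout_distribution(
    [key for key, year in orders.items() if year == 1994], lineitem
)

print(sorted(all_orders.items()), mean_fanout(all_orders))
print(sorted(orders_1994.items()), mean_fanout(orders_1994))
```

In this toy example, an estimate based on the unfiltered average (5.4 line items per order) would underestimate the filtered join by almost a factor of four. This is the kind of effect that skewed parameter bindings are meant to expose in an optimizer that assumes filter predicates and join fan-outs are independent.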

Footnotes
1
The join fan-out distribution is the distribution of the number of join partners that the values in a primary key (PK) column have in a particular foreign key (FK) column.
 
2
As in [6], the two variants stem from exactly the same query template: the only thing that makes them different is the set of parameters that get pasted into the template.
 
3
In this paper, we abbreviate the foreign key joins of TPC-H (and JCC-H) using the first letters of the table names (ps for partsupp, to distinguish it from p for part). For example, l-o denotes the join between lineitem and orders.
 
4
A huge order indeed, and realism is not our primary target. However, if one orders all parts of an entire airplane, or aircraft carrier, it might still be realistic ;-).
 
5
Because there are seven years (1992–1998) and five populous orders, two years have no populous order; these are 1995 and 1996.
 
6
We used the same DDL as in the VectorH SIGMOD paper: https://github.com/ActianCorp/VectorH-sigmod2016.
 
7
The code for JCC-H can be downloaded from: http://github.com/ldbc/dbgen.JCC-H.
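Footnote 2's point, that the "normal" and "skewed" query variants come from the same template and differ only in the parameters pasted into it, can be sketched as follows. The template text, the `PARAMETER_CLASSES` dates, and the `bind` helper are hypothetical simplifications for illustration; they are not JCC-H's actual query generator, query text, or parameter choices.

```python
import random
from string import Template

# Hypothetical, heavily simplified template (not an actual TPC-H/JCC-H query text).
QUERY_TEMPLATE = Template(
    "SELECT count(*) FROM orders, lineitem "
    "WHERE o_orderkey = l_orderkey "
    "AND o_orderdate >= DATE '$start' "
    "AND o_orderdate < DATE '$start' + INTERVAL '1' YEAR"
)

# Hypothetical parameter equivalence classes; the date values are placeholders.
PARAMETER_CLASSES = {
    "normal": ["1993-01-01", "1994-01-01"],
    "skewed": ["1997-01-01"],
}


def bind(template, parameter_class, seed):
    """Expand the template with a value drawn reproducibly from one parameter class."""
    rng = random.Random(seed)
    value = rng.choice(PARAMETER_CLASSES[parameter_class])
    return template.substitute(start=value)


normal_query = bind(QUERY_TEMPLATE, "normal", seed=42)
skewed_query = bind(QUERY_TEMPLATE, "skewed", seed=42)
print(normal_query)
print(skewed_query)
```

Seeding the random generator keeps the generated workload reproducible, and keeping the template fixed means that any performance difference between the two variants can be attributed to the parameter values alone.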
 
References
3.
Erling, O., Averbuch, A., Larriba-Pey, J., Chafi, H., Gubichev, A., Prat, A., Pham, M.-D., Boncz, P.: The LDBC social network benchmark interactive workload. In: SIGMOD (2015)
4.
Frank, M., Poess, M., Rabl, T.: Efficient update data generation for DBMS benchmarks. In: Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering, pp. 169–180 (2012)
5.
Ghazal, A., Rabl, T., Hu, M., Raab, F., Poess, M., Crolotte, A., Jacobsen, H.-A.: BigBench: towards an industry standard benchmark for big data analytics. In: SIGMOD (2013)
6.
Gubichev, A., Boncz, P.: Parameter curation for benchmark queries. In: TPCTC, pp. 113–129 (2014)
7.
Leis, V., Gubichev, A., Mirchev, A., Boncz, P., Kemper, A., Neumann, T.: How good are query optimizers, really? Proc. VLDB Endowment 9(3), 204–215 (2015)
9.
Poess, M., Nambiar, R.O., Walrath, D.: Why you should run TPC-DS: a workload analysis. In: VLDB (2007)
Metadata
Title
JCC-H: Adding Join Crossing Correlations with Skew to TPC-H
Authors
Peter Boncz
Angelos-Christos Anatiotis
Steffen Kläbe
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-72401-0_8
