Skip to main content
Erschienen in: The VLDB Journal 5/2018

30.08.2017 | Special Issue Paper

Many-query join: efficient shared execution of relational joins on modern hardware

verfasst von: Darko Makreshanski, Georgios Giannikis, Gustavo Alonso, Donald Kossmann

Erschienen in: The VLDB Journal | Ausgabe 5/2018

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Database architectures typically process queries one at a time, executing concurrent queries in independent execution contexts. Often, such a design leads to unpredictable performance and poor scalability. One approach to circumvent the problem is to take advantage of sharing opportunities across concurrently running queries. In this paper, we propose many-query join (MQJoin), a novel method for sharing the execution of a join that can efficiently deal with hundreds of concurrent queries. This is achieved by minimizing redundant work and making efficient use of main-memory bandwidth and multi-core architectures. Compared to existing proposals, MQJoin is able to efficiently handle larger workloads regardless of the schema by exploiting more sharing opportunities. We also compared MQJoin to two commercial main-memory column-store databases. For a TPC-H-based workload, we show that MQJoin provides 2–5\(\times \) higher throughput with significantly more stable response times.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
Literatur
2.
Zurück zum Zitat Albutiu, M.-C., Kemper, A., Neumann, T.: Massively parallel sort-merge joins in main memory multi-core database systems. PVLDB 5(10), 1064–1075 (2012) Albutiu, M.-C., Kemper, A., Neumann, T.: Massively parallel sort-merge joins in main memory multi-core database systems. PVLDB 5(10), 1064–1075 (2012)
3.
Zurück zum Zitat Arumugam, S., Dobra, A., Jermaine, C.M., Pansare, N., Perez, L.: The DataPath system: a data-centric analytic processing engine for large data warehouses. Proc. SIGMOD 2010, 519–530 (2010) Arumugam, S., Dobra, A., Jermaine, C.M., Pansare, N., Perez, L.: The DataPath system: a data-centric analytic processing engine for large data warehouses. Proc. SIGMOD 2010, 519–530 (2010)
4.
Zurück zum Zitat Avnur, R., Hellerstein, J.M.: Eddies: continuously adaptive query processing. Proc. SIGMOD 2000, 261–272 (2000) Avnur, R., Hellerstein, J.M.: Eddies: continuously adaptive query processing. Proc. SIGMOD 2000, 261–272 (2000)
5.
Zurück zum Zitat Balkesen, C., Alonso, G., Teubner, J., Özsu, M.T.: Multi-core, main-memory joins: sort versus hash revisited. PVLDB 7(1), 85–96 (2013) Balkesen, C., Alonso, G., Teubner, J., Özsu, M.T.: Multi-core, main-memory joins: sort versus hash revisited. PVLDB 7(1), 85–96 (2013)
6.
Zurück zum Zitat Balkesen, C., Teubner, J., Alonso, G., Özsu, M.T.: Main-memory hash joins on multi-core CPUs: tuning to the underlying hardware. Proc. ICDE 2013, 362–373 (2013) Balkesen, C., Teubner, J., Alonso, G., Özsu, M.T.: Main-memory hash joins on multi-core CPUs: tuning to the underlying hardware. Proc. ICDE 2013, 362–373 (2013)
7.
Zurück zum Zitat Balkesen, C., Teubner, J., Alonso, G., Özsu, T.: Main-memory hash joins on modern processor architectures. IEEE Trans. Knowl. Data Eng. 27(7), 1754–1766 (2015)CrossRef Balkesen, C., Teubner, J., Alonso, G., Özsu, T.: Main-memory hash joins on modern processor architectures. IEEE Trans. Knowl. Data Eng. 27(7), 1754–1766 (2015)CrossRef
8.
Zurück zum Zitat Barber, R., Lohman, G., Pandis, I., Raman, V., Sidle, R., Attaluri, G., Chainani, N., Lightstone, S., Sharpe, D.: Memory-efficient hash joins. Proc. VLDB 8(4), 353–364 (2014)CrossRef Barber, R., Lohman, G., Pandis, I., Raman, V., Sidle, R., Attaluri, G., Chainani, N., Lightstone, S., Sharpe, D.: Memory-efficient hash joins. Proc. VLDB 8(4), 353–364 (2014)CrossRef
9.
Zurück zum Zitat Blanas, S., Li, Y., Patel, J.M.: Design and evaluation of main memory hash join algorithms for multi-core CPUs. Proc. SIGMOD 2011, 37–48 (2011) Blanas, S., Li, Y., Patel, J.M.: Design and evaluation of main memory hash join algorithms for multi-core CPUs. Proc. SIGMOD 2011, 37–48 (2011)
10.
Zurück zum Zitat Boncz, P.A., Zukowski, M., Nes, N.: MonetDB/X100: hyper-pipelining query execution. Proc. CIDR 2005, 225–237 (2005) Boncz, P.A., Zukowski, M., Nes, N.: MonetDB/X100: hyper-pipelining query execution. Proc. CIDR 2005, 225–237 (2005)
11.
Zurück zum Zitat Candea, G., Polyzotis, N., Vingralek, R.: A scalable, predictable join operator for highly concurrent data warehouses. PVLDB 2(1), 277–288 (2009) Candea, G., Polyzotis, N., Vingralek, R.: A scalable, predictable join operator for highly concurrent data warehouses. PVLDB 2(1), 277–288 (2009)
12.
Zurück zum Zitat Chen, C., Roussopoulos, N.: The implementation and performance evaluation of the ADMS query optimizer: integrating query result caching and matching. In: Proc EDBT, pp. 323–336 (1994) Chen, C., Roussopoulos, N.: The implementation and performance evaluation of the ADMS query optimizer: integrating query result caching and matching. In: Proc EDBT, pp. 323–336 (1994)
13.
Zurück zum Zitat Chen, S., Ailamaki, A., Gibbons, P. B., Mowry, T. C.: Improving hash join performance through prefetching. In: Proc. ICDE 2004, pp. 116– (2004) Chen, S., Ailamaki, A., Gibbons, P. B., Mowry, T. C.: Improving hash join performance through prefetching. In: Proc. ICDE 2004, pp. 116– (2004)
14.
Zurück zum Zitat Chen, S., Ailamaki, A., Gibbons, P.B., Mowry, T.C.: Improving hash join performance through prefetching. ACM Trans. Database Syst. 32(3), 17 (2007)CrossRef Chen, S., Ailamaki, A., Gibbons, P.B., Mowry, T.C.: Improving hash join performance through prefetching. ACM Trans. Database Syst. 32(3), 17 (2007)CrossRef
15.
Zurück zum Zitat Ebenstein, R., Kamat, N., Nandi, A.: FluxQuery: an execution framework for highly interactive query workloads. In: Proc SIGMOD, pp. 1333–1345. ACM, New York, NY, USA (2016) Ebenstein, R., Kamat, N., Nandi, A.: FluxQuery: an execution framework for highly interactive query workloads. In: Proc SIGMOD, pp. 1333–1345. ACM, New York, NY, USA (2016)
16.
Zurück zum Zitat Giannikis, G., Alonso, G., Kossmann, D.: SharedDB: killing one thousand queries with one stone. PVLDB 5(6), 526–537 (2012) Giannikis, G., Alonso, G., Kossmann, D.: SharedDB: killing one thousand queries with one stone. PVLDB 5(6), 526–537 (2012)
17.
Zurück zum Zitat Giannikis, G., Makreshanski, D., Alonso, G., Kossmann, D.: Shared workload optimization. PVLDB 7(6), 429–440 (2014) Giannikis, G., Makreshanski, D., Alonso, G., Kossmann, D.: Shared workload optimization. PVLDB 7(6), 429–440 (2014)
18.
Zurück zum Zitat Graefe, G.: Volcano&#151 an extensible and parallel query evaluation system. IEEE Trans. Knowl. Data Eng. 6(1), 120–135 (1994)CrossRef Graefe, G.: Volcano&#151 an extensible and parallel query evaluation system. IEEE Trans. Knowl. Data Eng. 6(1), 120–135 (1994)CrossRef
19.
Zurück zum Zitat Harizopoulos, S., Ailamaki, A.: StagedDB: designing database servers for modern hardware. In: In IEEE Data, pp. 11–16 (2005) Harizopoulos, S., Ailamaki, A.: StagedDB: designing database servers for modern hardware. In: In IEEE Data, pp. 11–16 (2005)
20.
Zurück zum Zitat Harizopoulos, S., Shkapenyuk, V., Ailamaki, A.: QPipe: a simultaneously pipelined relational query engine. Proc. SIGMOD 2005, 383–394 (2005) Harizopoulos, S., Shkapenyuk, V., Ailamaki, A.: QPipe: a simultaneously pipelined relational query engine. Proc. SIGMOD 2005, 383–394 (2005)
21.
Zurück zum Zitat Ivanova, M. G., Kersten, M. L., Nes, N. J., Gonçalves, R. A.: An architecture for recycling intermediates in a column-store. In: Proc. SIGMOD, pp. 309–320. ACM, New York, NY, USA (2009) Ivanova, M. G., Kersten, M. L., Nes, N. J., Gonçalves, R. A.: An architecture for recycling intermediates in a column-store. In: Proc. SIGMOD, pp. 309–320. ACM, New York, NY, USA (2009)
22.
Zurück zum Zitat Jha, S., He, B., Lu, M., Cheng, X., Huynh, H.P.: Improving main memory hash joins on intel xeon phi processors: an experimental approach. PVLDB 8(6), 642–653 (2015) Jha, S., He, B., Lu, M., Cheng, X., Huynh, H.P.: Improving main memory hash joins on intel xeon phi processors: an experimental approach. PVLDB 8(6), 642–653 (2015)
23.
Zurück zum Zitat Johnson, R., Harizopoulos, S., Hardavellas, N., Sabirli, K., Pandis, I., Ailamaki, A., Mancheril, N.G., Falsafi, B.: To share or not to share? Proc. VLDB 2007, 351–362 (2007) Johnson, R., Harizopoulos, S., Hardavellas, N., Sabirli, K., Pandis, I., Ailamaki, A., Mancheril, N.G., Falsafi, B.: To share or not to share? Proc. VLDB 2007, 351–362 (2007)
24.
Zurück zum Zitat Kim, C., Kaldewey, T., Lee, V.W., Sedlar, E., Nguyen, A.D., Satish, N., Chhugani, J., Di Blas, A., Dubey, P.: Sort versus hash revisited: fast join implementation on modern multi-core CPUs. PVLDB 2(2), 1378–1389 (2009) Kim, C., Kaldewey, T., Lee, V.W., Sedlar, E., Nguyen, A.D., Satish, N., Chhugani, J., Di Blas, A., Dubey, P.: Sort versus hash revisited: fast join implementation on modern multi-core CPUs. PVLDB 2(2), 1378–1389 (2009)
25.
Zurück zum Zitat Krikellas, K., Inc, G., Viglas, S. D., Cintra, M.: Modeling multithreaded query execution on chip multiprocessors. In ADMS (2010) Krikellas, K., Inc, G., Viglas, S. D., Cintra, M.: Modeling multithreaded query execution on chip multiprocessors. In ADMS (2010)
26.
Zurück zum Zitat Lang, C.A., Bhattacharjee, B., Malkemus, T., Padmanabhan, S., Wong, K.: Increasing buffer-locality for multiple relational table scans through grouping and throttling. Proc. ICDE 2007, 1136–1145 (2007) Lang, C.A., Bhattacharjee, B., Malkemus, T., Padmanabhan, S., Wong, K.: Increasing buffer-locality for multiple relational table scans through grouping and throttling. Proc. ICDE 2007, 1136–1145 (2007)
27.
Zurück zum Zitat Lang, C. A., Bhattacharjee, B., Malkemus, T., Wong, K.: Increasing buffer-locality for multiple index based scans through intelligent placement and index scan speed control. In: Proc. VLDB, pp. 1298–1309 (2007) Lang, C. A., Bhattacharjee, B., Malkemus, T., Wong, K.: Increasing buffer-locality for multiple index based scans through intelligent placement and index scan speed control. In: Proc. VLDB, pp. 1298–1309 (2007)
28.
Zurück zum Zitat Lang, H., Mühlbauer, T., Funke, F., Boncz, P. A., Neumann, T., Kemper, A.: Data blocks: hybrid OLTP and OLAP on compressed storage using both vectorization and compilation. In: Proceedings of the 2016 International Conference on Management of Data, SIGMOD ’16, pp. 311–326. ACM, New York, NY, USA (2016) Lang, H., Mühlbauer, T., Funke, F., Boncz, P. A., Neumann, T., Kemper, A.: Data blocks: hybrid OLTP and OLAP on compressed storage using both vectorization and compilation. In: Proceedings of the 2016 International Conference on Management of Data, SIGMOD ’16, pp. 311–326. ACM, New York, NY, USA (2016)
29.
Zurück zum Zitat Larson, P.-A., Birka, A., Hanson, E.N., Huang, W., Nowakiewicz, M., Papadimos, V.: Real-time analytical processing with SQL server. Proc. VLDB 8(12), 1740–1751 (2015)CrossRef Larson, P.-A., Birka, A., Hanson, E.N., Huang, W., Nowakiewicz, M., Papadimos, V.: Real-time analytical processing with SQL server. Proc. VLDB 8(12), 1740–1751 (2015)CrossRef
30.
Zurück zum Zitat Liu, F., Blanas, S.: Forecasting the cost of processing multi-join queries via hashing for main-memory databases. In Proc. SoCC, pp. 153–166. ACM, New York, NY, USA (2015) Liu, F., Blanas, S.: Forecasting the cost of processing multi-join queries via hashing for main-memory databases. In Proc. SoCC, pp. 153–166. ACM, New York, NY, USA (2015)
31.
Zurück zum Zitat Makreshanski, D., Giceva, J., Barthels, C., Alonso, G.: BatchDB: efficient isolated execution of hybrid OLTP+OLAP workloads for interactive applications. In: Proc. SIGMOD, pp. 37–50. ACM, New York, NY, USA (2017) Makreshanski, D., Giceva, J., Barthels, C., Alonso, G.: BatchDB: efficient isolated execution of hybrid OLTP+OLAP workloads for interactive applications. In: Proc. SIGMOD, pp. 37–50. ACM, New York, NY, USA (2017)
32.
Zurück zum Zitat Manegold, S., Boncz, P., Kersten, M.: Optimizing main-memory join on modern hardware. IEEE Trans. Knowl. Data Eng. 14(4), 709–730 (2002)CrossRef Manegold, S., Boncz, P., Kersten, M.: Optimizing main-memory join on modern hardware. IEEE Trans. Knowl. Data Eng. 14(4), 709–730 (2002)CrossRef
33.
Zurück zum Zitat Manegold, S., Boncz, P., Kersten, M. L.: Generic database cost models for hierarchical memory systems. In: Proc VLDB, pp. 191–202. VLDB Endowment (2002) Manegold, S., Boncz, P., Kersten, M. L.: Generic database cost models for hierarchical memory systems. In: Proc VLDB, pp. 191–202. VLDB Endowment (2002)
34.
Zurück zum Zitat Manegold, S., Pellenkoft, A., Kersten, M. L.: A multi-query optimizer for Monet. In: Proc. BNCOD, pp. 36–50. Springer, London, UK (2000) Manegold, S., Pellenkoft, A., Kersten, M. L.: A multi-query optimizer for Monet. In: Proc. BNCOD, pp. 36–50. Springer, London, UK (2000)
35.
Zurück zum Zitat Müller, I., Sanders, P., Lacurie, A., Lehner, W., Färber, F.: Cache-efficient aggregation: hashing is sorting. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Proc. SIGMOD 2015, pp. 1123–1136. ACM, New York, NY, USA (2015) Müller, I., Sanders, P., Lacurie, A., Lehner, W., Färber, F.: Cache-efficient aggregation: hashing is sorting. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Proc. SIGMOD 2015, pp. 1123–1136. ACM, New York, NY, USA (2015)
36.
Zurück zum Zitat O’Neil, P., Graefe, G.: Multi-table joins through bitmapped join indices. SIGMOD Rec. 24(3), 8–11 (1995)CrossRef O’Neil, P., Graefe, G.: Multi-table joins through bitmapped join indices. SIGMOD Rec. 24(3), 8–11 (1995)CrossRef
38.
Zurück zum Zitat Psaroudakis, I., Athanassoulis, M., Ailamaki, A.: Sharing data and work across concurrent analytical queries. PVLDB 6(9), 637–648 (2013) Psaroudakis, I., Athanassoulis, M., Ailamaki, A.: Sharing data and work across concurrent analytical queries. PVLDB 6(9), 637–648 (2013)
39.
Zurück zum Zitat Qiao, L., Raman, V., Reiss, F., Haas, P.J., Lohman, G.M.: Main-memory scan sharing for multi-core CPUs. PVLDB 1(1), 610–621 (2008) Qiao, L., Raman, V., Reiss, F., Haas, P.J., Lohman, G.M.: Main-memory scan sharing for multi-core CPUs. PVLDB 1(1), 610–621 (2008)
40.
Zurück zum Zitat Raman, V., Attaluri, G., Barber, R., Chainani, N., Kalmuk, D., KulandaiSamy, V., Leenstra, J., Lightstone, S., Liu, S., Lohman, G.M., Malkemus, T., Mueller, R., Pandis, I., Schiefer, B., Sharpe, D., Sidle, R., Storm, A., Zhang, L.: DB2 with BLU acceleration: so much more than just a column store. Proc. VLDB 6(11), 1080–1091 (2013)CrossRef Raman, V., Attaluri, G., Barber, R., Chainani, N., Kalmuk, D., KulandaiSamy, V., Leenstra, J., Lightstone, S., Liu, S., Lohman, G.M., Malkemus, T., Mueller, R., Pandis, I., Schiefer, B., Sharpe, D., Sidle, R., Storm, A., Zhang, L.: DB2 with BLU acceleration: so much more than just a column store. Proc. VLDB 6(11), 1080–1091 (2013)CrossRef
41.
Zurück zum Zitat Raman, V., Swart, G., Qiao, L., Reiss, F., Dialani, V., Kossmann, D., Narang, I., Sidle, R.: Constant-time query processing. In: Proc. ICDE 2008, pp. 60–69 (2008) Raman, V., Swart, G., Qiao, L., Reiss, F., Dialani, V., Kossmann, D., Narang, I., Sidle, R.: Constant-time query processing. In: Proc. ICDE 2008, pp. 60–69 (2008)
42.
Zurück zum Zitat Roy, P., Seshadri, S., Sudarshan, S., Bhobe, S.: Efficient and extensible algorithms for multi query optimization. In: Proc. SIGMOD, pp. 249–260. ACM, New York, NY, USA (2000) Roy, P., Seshadri, S., Sudarshan, S., Bhobe, S.: Efficient and extensible algorithms for multi query optimization. In: Proc. SIGMOD, pp. 249–260. ACM, New York, NY, USA (2000)
43.
Zurück zum Zitat Răducanu, B., Boncz, P., Zukowski, M.: Micro adaptivity in vectorwise. In: Proc. SIGMOD, pp. 1231–1242. ACM, New York, NY, USA (2013) Răducanu, B., Boncz, P., Zukowski, M.: Micro adaptivity in vectorwise. In: Proc. SIGMOD, pp. 1231–1242. ACM, New York, NY, USA (2013)
44.
Zurück zum Zitat Sellis, T.K.: Multiple-query optimization. ACM Trans. Database Syst. 13(1), 23–52 (1988)CrossRef Sellis, T.K.: Multiple-query optimization. ACM Trans. Database Syst. 13(1), 23–52 (1988)CrossRef
45.
Zurück zum Zitat Shatdal, A., Kant, C., Naughton, J.F.: Cache conscious algorithms for relational query processing. Proc. VLDB 1994, 510–521 (1994) Shatdal, A., Kant, C., Naughton, J.F.: Cache conscious algorithms for relational query processing. Proc. VLDB 1994, 510–521 (1994)
46.
Zurück zum Zitat Sodani, A.: Knights landing (knl): 2nd generation intel(r) xeon phi processor. In: 2015 IEEE Hot Chips 27 Symposium (HCS), pp. 1–24 (Aug 2015) Sodani, A.: Knights landing (knl): 2nd generation intel(r) xeon phi processor. In: 2015 IEEE Hot Chips 27 Symposium (HCS), pp. 1–24 (Aug 2015)
47.
Zurück zum Zitat Unterbrunner, P., Giannikis, G., Alonso, G., Fauser, D., Kossmann, D.: Predictable performance for unpredictable workloads. PVLDB 2(1), 706–717 (2009)CrossRef Unterbrunner, P., Giannikis, G., Alonso, G., Fauser, D., Kossmann, D.: Predictable performance for unpredictable workloads. PVLDB 2(1), 706–717 (2009)CrossRef
48.
Zurück zum Zitat Valduriez, P.: Join indices. ACM Trans. Database Syst. 12(2), 218–246 (1987)CrossRef Valduriez, P.: Join indices. ACM Trans. Database Syst. 12(2), 218–246 (1987)CrossRef
49.
Zurück zum Zitat Zukowski, M., Héman, S., Nes, N., Boncz, P.: Cooperative scans: dynamic bandwidth sharing in a DBMS. Proc. VLDB 2007, 723–734 (2007) Zukowski, M., Héman, S., Nes, N., Boncz, P.: Cooperative scans: dynamic bandwidth sharing in a DBMS. Proc. VLDB 2007, 723–734 (2007)
50.
Zurück zum Zitat Zukowski, M., Nes, N., Boncz, P.: DSM versus NSM: CPU performance tradeoffs in block-oriented query processing. In: Proc. DaMoN 2008, pp. 47–54 (2008) Zukowski, M., Nes, N., Boncz, P.: DSM versus NSM: CPU performance tradeoffs in block-oriented query processing. In: Proc. DaMoN 2008, pp. 47–54 (2008)
51.
Zurück zum Zitat Zukowski, M., van de Wiel, M., Boncz, P.: Vectorwise: a vectorized analytical DBMS. Proc. ICDE 2012, 1349–1350 (2012) Zukowski, M., van de Wiel, M., Boncz, P.: Vectorwise: a vectorized analytical DBMS. Proc. ICDE 2012, 1349–1350 (2012)
Metadaten
Titel
Many-query join: efficient shared execution of relational joins on modern hardware
verfasst von
Darko Makreshanski
Georgios Giannikis
Gustavo Alonso
Donald Kossmann
Publikationsdatum
30.08.2017
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 5/2018
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-017-0475-4

Weitere Artikel der Ausgabe 5/2018

The VLDB Journal 5/2018 Zur Ausgabe

Premium Partner