Published in: The VLDB Journal, Issue 1/2020

15.11.2019 | Special Issue Paper

Adaptive partitioning and indexing for in situ query processing

Authors: Matthaios Olma, Manos Karpathiotakis, Ioannis Alagiannis, Manos Athanassoulis, Anastasia Ailamaki

Abstract

The constant flux of data and queries alike has been pushing the boundaries of data analysis systems. The increasing size of raw data files has made data loading an expensive operation that delays the data-to-insight time. To alleviate the loading cost, in situ query processing systems operate directly over raw data and offer instant access to data. At the same time, analytical workloads contain an increasing number of queries. Typically, each query focuses on a constantly shifting, yet small, range. As a result, minimizing workload latency requires bringing the benefits of indexing to in situ query processing. In this paper, we present an online partitioning and indexing scheme, along with a partitioning and indexing tuner tailored for in situ querying engines. The proposed system design improves query execution time by taking user query patterns into account to (i) partition raw data files logically and (ii) build lightweight partition-specific indexes for each partition. We build an in situ query engine called Slalom to showcase the impact of our design. Slalom employs adaptive partitioning and builds non-obtrusive indexes in different partitions on-the-fly, based on lightweight monitoring of query access patterns. Owing to its lightweight nature, Slalom achieves efficient query processing over raw data with minimal memory consumption. Our experiments with both microbenchmarks and real-life workloads show that Slalom outperforms state-of-the-art in situ engines and achieves query response times comparable to those of a fully indexed DBMS, while offering lower cumulative query execution times for workloads of increasing size and with unpredictable access patterns.
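To make the interplay of logical partitioning, access-pattern monitoring and on-the-fly partition-local indexing more concrete, here is a toy Python sketch. It is not Slalom's implementation or tuner: the fixed-size row-range partitions, the per-partition min/max bounds used for skipping, the hit-count threshold (`index_threshold`) that decides when to index, and the sorted-list index are all simplifying assumptions, and every name (`ToyInSituTable`, `Partition`, `range_query`) is hypothetical.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Partition:
    """A logical partition: a row range of the raw file; no data is copied."""
    start: int   # first row id (inclusive)
    end: int     # last row id (exclusive)
    lo: float    # minimum key in the partition, used to skip it
    hi: float    # maximum key in the partition, used to skip it
    hits: int = 0                              # queries that touched this partition
    index: list = field(default_factory=list)  # sorted (key, row) pairs once built


class ToyInSituTable:
    """Hypothetical sketch: logical partitions plus partition-local indexes built on demand."""

    def __init__(self, keys, rows_per_partition=4, index_threshold=2):
        self.keys = keys  # stands in for one column parsed from a raw file
        self.index_threshold = index_threshold
        self.partitions = []
        for start in range(0, len(keys), rows_per_partition):
            end = min(start + rows_per_partition, len(keys))
            chunk = keys[start:end]
            self.partitions.append(Partition(start, end, min(chunk), max(chunk)))

    def range_query(self, low, high):
        """Return the sorted row ids whose key lies in [low, high]."""
        result = []
        for p in self.partitions:
            if p.hi < low or p.lo > high:
                continue  # min/max bounds prove no row can qualify: skip the partition
            p.hits += 1  # lightweight access-pattern monitoring
            if p.index:
                # Probe the partition-local sorted index instead of scanning.
                i = bisect.bisect_left(p.index, (low, -1))
                while i < len(p.index) and p.index[i][0] <= high:
                    result.append(p.index[i][1])
                    i += 1
            else:
                # No index yet: scan the partition's rows.
                result.extend(r for r in range(p.start, p.end)
                              if low <= self.keys[r] <= high)
                if p.hits >= self.index_threshold:
                    # The partition is "hot": index it for future queries.
                    p.index = sorted((self.keys[r], r) for r in range(p.start, p.end))
        return sorted(result)
```

For example, with keys `[5, 1, 9, 3, 20, 25, 22, 21, 40, 41, 43, 42]` and the default settings, `range_query(21, 24)` skips the first and last partitions, scans the middle one on the first two calls, builds that partition's index at the end of the second call, and probes the index on a third call; all three calls return `[6, 7]`. The threshold is one simple way to pay the index-build cost only for partitions that keep being accessed, while cold partitions are never indexed.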

Footnotes
1
Details on how this formula is derived are given in the Appendix.
 
Metadata
Publication date
15.11.2019
Publisher
Springer Berlin Heidelberg
Published in
The VLDB Journal / Issue 1/2020
Print ISSN: 1066-8888
Electronic ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-019-00580-x
