Skip to main content
Erschienen in: The VLDB Journal 2-3/2020

19.11.2019 | Special Issue Paper

General dynamic Yannakakis: conjunctive queries with theta joins under updates

verfasst von: Muhammad Idris, Martín Ugarte, Stijn Vansummeren, Hannes Voigt, Wolfgang Lehner

Erschienen in: The VLDB Journal | Ausgabe 2-3/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The ability to efficiently analyze changing data is a key requirement of many real-time analytics applications. In prior work, we have proposed general dynamic Yannakakis (GDyn), a general framework for dynamically processing acyclic conjunctive queries with \(\theta \)-joins in the presence of data updates. Whereas traditional approaches face a trade-off between materialization of subresults (to avoid inefficient recomputation) and recomputation of subresults (to avoid the potentially large space overhead of materialization), GDyn is able to avoid this trade-off. It intelligently maintains a succinct data structure that supports efficient maintenance under updates and from which the full query result can quickly be enumerated. In this paper, we consolidate and extend the development of GDyn. First, we give full formal proof of GDyn ’s correctness and complexity. Second, we present a novel algorithm for computing GDyn query plans. Finally, we instantiate GDyn to the case where all \(\theta \)-joins are inequalities and present extended experimental comparison against state-of-the-art engines. Our approach performs consistently better than the competitor systems with multiple orders of magnitude improvements in both time and memory consumption.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
1
Note that, in this framework, value modifications inside a tuple are modeled by deleting the tuple with the old value, and then reinserting the tuple, but now with the new value.
 
2
Note that such queries may also contain equijoins by sharing variables between atoms.
 
3
Strictly speaking, we described in section that R needs to be sorted lexicographically, first on \(\overline{x} \cap \overline{y}\) and then on x. The grouping + sorting of the enumeration index obtains the same result.
 
4
In the conference version of this paper [26], there was an incorrect claim: we stated that updates could be processed in time \(O(M\cdot \log (M))\) in the general case of multiple inequalities. We then found a bug in our proof and we currently do not know if this bound can be achieved.
 
5
Note that because we set \({\textit{out}}(\mathcal {I}) = \emptyset \) on the residual, new variables may become isolated and therefore more reductions steps may be possible on the normal form of \(\mathcal {I}\).
 
6
In the sense that batch updates are only supported by treating each update tuple in the batch individually.
 
7
Should \(X_2 {\setminus } X_1\) be empty, we don’t actually need to do anything on \(\mathcal {I}_1\): \(X_1 \cup X_2\) is already removed from it. A similar remark holds for \(\mathcal {I}_2\) when \(X_1 {\setminus } X_2\) is empty.
 
8
Note that, since \(e_1\) does not share variables with any predicate, the CSE operation also does not remove any predicates from \(\mathcal {H}_1\), similar to the ISO operation and hence yields \(\mathcal {I}_1\).
 
9
Note that all leafs have a parent since the root of \(T_1\) is an interior node labeled by \(\emptyset \).
 
Literatur
1.
Zurück zum Zitat Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley Longman Publishing Co., Inc., Boston (1995)MATH Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley Longman Publishing Co., Inc., Boston (1995)MATH
2.
Zurück zum Zitat Abo Khamis, M., Ngo, H.Q., Rudra, A.: FAQ: questions asked frequently. In: Proceedings of PODS, pp. 13–28 (2016) Abo Khamis, M., Ngo, H.Q., Rudra, A.: FAQ: questions asked frequently. In: Proceedings of PODS, pp. 13–28 (2016)
3.
Zurück zum Zitat Agrawal, J., Diao, Y., Gyllstrom, D., Immerman, N.: Efficient pattern matching over event streams. Proc. SIGMOD 2008, 147–160 (2008) Agrawal, J., Diao, Y., Gyllstrom, D., Immerman, N.: Efficient pattern matching over event streams. Proc. SIGMOD 2008, 147–160 (2008)
4.
Zurück zum Zitat Arasu, A., Babcock, B., Babu, S., Cieslewicz, J., Datar, M., Ito, K., Motwani, R., Srivastava, U., Widom, J.: STREAM: the stanford data stream management system. In: Data Stream Management—Processing High-Speed Data Streams, pp. 317–336 (2016) Arasu, A., Babcock, B., Babu, S., Cieslewicz, J., Datar, M., Ito, K., Motwani, R., Srivastava, U., Widom, J.: STREAM: the stanford data stream management system. In: Data Stream Management—Processing High-Speed Data Streams, pp. 317–336 (2016)
5.
Zurück zum Zitat Baader, F., Nipkow, T.: Term Rewriting and All That. Cambridge University Press, Cambridge (1998)CrossRef Baader, F., Nipkow, T.: Term Rewriting and All That. Cambridge University Press, Cambridge (1998)CrossRef
6.
Zurück zum Zitat Bagan, G., Durand, A., Grandjean, E.: On acyclic conjunctive queries and constant delay enumeration. In: Proceedings of CSL, pp. 208–222 (2007) Bagan, G., Durand, A., Grandjean, E.: On acyclic conjunctive queries and constant delay enumeration. In: Proceedings of CSL, pp. 208–222 (2007)
7.
Zurück zum Zitat Bakibayev, N., Kočiský, T., Olteanu, D., Závodný, J.: Aggregation and ordering in factorised databases. Proc. VLDB 6(14), 1990–2001 (2013)CrossRef Bakibayev, N., Kočiský, T., Olteanu, D., Závodný, J.: Aggregation and ordering in factorised databases. Proc. VLDB 6(14), 1990–2001 (2013)CrossRef
8.
Zurück zum Zitat Berkholz, C., Keppeler, J., Schweikardt, N.: Answering conjunctive queries under updates. In: Proceedings of PODS, pp. 303–318 (2017) Berkholz, C., Keppeler, J., Schweikardt, N.: Answering conjunctive queries under updates. In: Proceedings of PODS, pp. 303–318 (2017)
9.
Zurück zum Zitat Bernstein, P.A., Goodman, N.: The power of inequality semijoins. Inf. Syst. 6(4), 255–265 (1981)CrossRef Bernstein, P.A., Goodman, N.: The power of inequality semijoins. Inf. Syst. 6(4), 255–265 (1981)CrossRef
10.
Zurück zum Zitat Brault-Baron, J.: De la pertinence de l’énumération: complexité en logiques. Ph.D. thesis, Université de Caen (2013) Brault-Baron, J.: De la pertinence de l’énumération: complexité en logiques. Ph.D. thesis, Université de Caen (2013)
11.
Zurück zum Zitat Brenna, L., Demers, A.J., Gehrke, J., Hong, M., Ossher, J., Panda, B., Riedewald, M., Thatte, M., White, W.M.: Cayuga: a high-performance event processing engine. Proc. SIGMOD 2007, 1100–1102 (2007) Brenna, L., Demers, A.J., Gehrke, J., Hong, M., Ossher, J., Panda, B., Riedewald, M., Thatte, M., White, W.M.: Cayuga: a high-performance event processing engine. Proc. SIGMOD 2007, 1100–1102 (2007)
12.
Zurück zum Zitat Chirkova, R., Yang, J.: Materialized views. Found. Trends Databases 4(4), 295–405 (2012)CrossRef Chirkova, R., Yang, J.: Materialized views. Found. Trends Databases 4(4), 295–405 (2012)CrossRef
13.
Zurück zum Zitat Cormen, T.: Introduction to Algorithms, 3rd edn. MIT Press, Cambridge (2009)MATH Cormen, T.: Introduction to Algorithms, 3rd edn. MIT Press, Cambridge (2009)MATH
14.
Zurück zum Zitat Cugola, G., Margara, A.: TESLA: a formally defined event specification language. Proc. DEBS 2010, 50–61 (2010)CrossRef Cugola, G., Margara, A.: TESLA: a formally defined event specification language. Proc. DEBS 2010, 50–61 (2010)CrossRef
15.
Zurück zum Zitat Cugola, G., Margara, A.: Complex event processing with T-REX. J. Syst. Softw. 85(8), 1709–1728 (2012)CrossRef Cugola, G., Margara, A.: Complex event processing with T-REX. J. Syst. Softw. 85(8), 1709–1728 (2012)CrossRef
16.
Zurück zum Zitat Cugola, G., Margara, A.: Processing flows of information: from data stream to complex event processing. ACM Comput. Surv. 44(3), 15:1–15:62 (2012)CrossRef Cugola, G., Margara, A.: Processing flows of information: from data stream to complex event processing. ACM Comput. Surv. 44(3), 15:1–15:62 (2012)CrossRef
17.
Zurück zum Zitat DeWitt, D.J., Naughton, J.F., Schneider, D.A.: An evaluation of non-equijoin algorithms. VLDB 1991, 443–452 (1991) DeWitt, D.J., Naughton, J.F., Schneider, D.A.: An evaluation of non-equijoin algorithms. VLDB 1991, 443–452 (1991)
18.
Zurück zum Zitat Enderle, J., Hampel, M., Seidl, T.: Joining interval data in relational databases. Proc SIGMOD 2004, 683–694 (2004) Enderle, J., Hampel, M., Seidl, T.: Joining interval data in relational databases. Proc SIGMOD 2004, 683–694 (2004)
20.
Zurück zum Zitat Golab, L., Özsu, M.T.: Processing sliding window multi-joins in continuous queries over data streams. In: Proceedings of VLDB, pp. 500–511 (2003)CrossRef Golab, L., Özsu, M.T.: Processing sliding window multi-joins in continuous queries over data streams. In: Proceedings of VLDB, pp. 500–511 (2003)CrossRef
21.
Zurück zum Zitat Gupata, A., Mumick, I.S. (eds.): Materialized Views: Techniques, Implementations, and Applications. MIT Press, Cambridge (1999) Gupata, A., Mumick, I.S. (eds.): Materialized Views: Techniques, Implementations, and Applications. MIT Press, Cambridge (1999)
22.
Zurück zum Zitat Gupta, A., Mumick, I.S., Subrahmanian, V.S.: Maintaining views incrementally. In: Proceedings of SIGMOD, pp. 157–166 (1993)CrossRef Gupta, A., Mumick, I.S., Subrahmanian, V.S.: Maintaining views incrementally. In: Proceedings of SIGMOD, pp. 157–166 (1993)CrossRef
23.
Zurück zum Zitat Hellerstein, J.M., Naughton, J.F., Pfeffer, A.: Generalized search trees for database systems. In: VLDB’95, pp. 562–573 (1995) Hellerstein, J.M., Naughton, J.F., Pfeffer, A.: Generalized search trees for database systems. In: VLDB’95, pp. 562–573 (1995)
24.
Zurück zum Zitat Henzinger, M., Krinninger, S., Nanongkai, D., Saranurak, T.: Unifying and strengthening hardness for dynamic problems via the online matrix-vector multiplication conjecture. In: Proceedings of STOC, pp. 21–30 (2015) Henzinger, M., Krinninger, S., Nanongkai, D., Saranurak, T.: Unifying and strengthening hardness for dynamic problems via the online matrix-vector multiplication conjecture. In: Proceedings of STOC, pp. 21–30 (2015)
25.
Zurück zum Zitat Idris, M., Ugarte, M., Vansummeren, S.: The dynamic Yannakakis algorithm: compact and efficient query processing under updates. In: Proceedings of SIGMOD 2017 (2017) Idris, M., Ugarte, M., Vansummeren, S.: The dynamic Yannakakis algorithm: compact and efficient query processing under updates. In: Proceedings of SIGMOD 2017 (2017)
26.
Zurück zum Zitat Idris, M., Ugarte, M., Vansummeren, S., Voigt, H., Lehner, W.: Conjunctive queries with inequalities under updates. PVLDB 11(7), 733–745 (2018) Idris, M., Ugarte, M., Vansummeren, S., Voigt, H., Lehner, W.: Conjunctive queries with inequalities under updates. PVLDB 11(7), 733–745 (2018)
27.
Zurück zum Zitat Kang, J., Naughton, J.F., Viglas, S.: Evaluating window joins over unbounded streams. In: Proceedings of ICDE, pp. 341–352 (2003) Kang, J., Naughton, J.F., Viglas, S.: Evaluating window joins over unbounded streams. In: Proceedings of ICDE, pp. 341–352 (2003)
28.
Zurück zum Zitat Khayyat, Z., Lucia, W., Singh, M., Ouzzani, M., Papotti, P., Quiané-Ruiz, J., Tang, N., Kalnis, P.: Fast and scalable inequality joins. VLDB J. 26(1), 125–150 (2017)CrossRef Khayyat, Z., Lucia, W., Singh, M., Ouzzani, M., Papotti, P., Quiané-Ruiz, J., Tang, N., Kalnis, P.: Fast and scalable inequality joins. VLDB J. 26(1), 125–150 (2017)CrossRef
29.
Zurück zum Zitat Koch, C.: Incremental query evaluation in a ring of databases. In: Proceedings of PODS, pp. 87–98 (2010) Koch, C.: Incremental query evaluation in a ring of databases. In: Proceedings of PODS, pp. 87–98 (2010)
30.
Zurück zum Zitat Koch, C., Ahmad, Y., Kennedy, O., Nikolic, M., Nötzli, A., Lupei, D., Shaikhha, A.: Dbtoaster: higher-order delta processing for dynamic, frequently fresh views. VLDB J. 23, 253–278 (2014)CrossRef Koch, C., Ahmad, Y., Kennedy, O., Nikolic, M., Nötzli, A., Lupei, D., Shaikhha, A.: Dbtoaster: higher-order delta processing for dynamic, frequently fresh views. VLDB J. 23, 253–278 (2014)CrossRef
31.
Zurück zum Zitat Mei, Y., Madden, S.: Zstream: a cost-based query processor for adaptively detecting composite events. Proc. SIGMOD 2009, 193–206 (2009) Mei, Y., Madden, S.: Zstream: a cost-based query processor for adaptively detecting composite events. Proc. SIGMOD 2009, 193–206 (2009)
32.
Zurück zum Zitat Nikolic, M., Olteanu, D.: Incremental view maintenance with triple lock factorization benefits. Proc. SIGMOD 2018, 365–380 (2018) Nikolic, M., Olteanu, D.: Incremental view maintenance with triple lock factorization benefits. Proc. SIGMOD 2018, 365–380 (2018)
33.
Zurück zum Zitat Olteanu, D., Závodný, J.: Size bounds for factorised representations of query results. ACM TODS 40(1), 2:1–2:44 (2015)MathSciNetCrossRef Olteanu, D., Závodný, J.: Size bounds for factorised representations of query results. ACM TODS 40(1), 2:1–2:44 (2015)MathSciNetCrossRef
34.
Zurück zum Zitat Roy, P., Teubner, J., Gemulla, R.: Low-latency handshake join. PVLDB 7(9), 709–720 (2014) Roy, P., Teubner, J., Gemulla, R.: Low-latency handshake join. PVLDB 7(9), 709–720 (2014)
35.
Zurück zum Zitat Sahay, B., Ranjan, J.: Real time business intelligence in supply chain analytics. Inf. Manage. Comput. Secur. 16(1), 28–48 (2008)CrossRef Sahay, B., Ranjan, J.: Real time business intelligence in supply chain analytics. Inf. Manage. Comput. Secur. 16(1), 28–48 (2008)CrossRef
36.
Zurück zum Zitat Schleich, M., Olteanu, D., Ciucanu, R.: Learning linear regression models over factorized joins. In: Proceedings of SIGMOD, pp. 3–18 (2016) Schleich, M., Olteanu, D., Ciucanu, R.: Learning linear regression models over factorized joins. In: Proceedings of SIGMOD, pp. 3–18 (2016)
37.
Zurück zum Zitat Schultz-Møller, N.P., Migliavacca, M., Pietzuch, P.R.: Distributed complex event processing with query rewriting. In: Proceedings of DEBS 2009 (2009) Schultz-Møller, N.P., Migliavacca, M., Pietzuch, P.R.: Distributed complex event processing with query rewriting. In: Proceedings of DEBS 2009 (2009)
38.
Zurück zum Zitat Segoufin, L.: Constant delay enumeration for conjunctive queries. SIGMOD Rec. 44(1), 10–17 (2015)CrossRef Segoufin, L.: Constant delay enumeration for conjunctive queries. SIGMOD Rec. 44(1), 10–17 (2015)CrossRef
39.
Zurück zum Zitat Stonebraker, M., Çetintemel, U., Zdonik, S.: The 8 requirements of real-time stream processing. SIGMOD Rec. 4, 42–47 (2005)CrossRef Stonebraker, M., Çetintemel, U., Zdonik, S.: The 8 requirements of real-time stream processing. SIGMOD Rec. 4, 42–47 (2005)CrossRef
40.
Zurück zum Zitat Teubner, J., Müller, R.: How soccer players would do stream joins. In: Proceedings of SIGMOD, pp. 625–636 (2011) Teubner, J., Müller, R.: How soccer players would do stream joins. In: Proceedings of SIGMOD, pp. 625–636 (2011)
41.
Zurück zum Zitat Urhan, T., Franklin, M.J.: Xjoin: a reactively-scheduled pipelined join operator. IEEE Data Eng. Bull. 23(2), 27–33 (2000) Urhan, T., Franklin, M.J.: Xjoin: a reactively-scheduled pipelined join operator. IEEE Data Eng. Bull. 23(2), 27–33 (2000)
42.
Zurück zum Zitat Vardi, M.Y.: The complexity of relational query languages (extended abstract). In: Proceedings of STOC, pp. 137–146 (1982) Vardi, M.Y.: The complexity of relational query languages (extended abstract). In: Proceedings of STOC, pp. 137–146 (1982)
43.
Zurück zum Zitat Viglas, S., Naughton, J. F., Burger, J.: Maximizing the output rate of multi-way join queries over streaming information sources. In: Proceedings of VLDB, pp. 285–296 (2003)CrossRef Viglas, S., Naughton, J. F., Burger, J.: Maximizing the output rate of multi-way join queries over streaming information sources. In: Proceedings of VLDB, pp. 285–296 (2003)CrossRef
44.
Zurück zum Zitat Wang, W., Gao, J., Zhang, M., Wang, S., Chen, G., Ng, T.K., Ooi, B.C., Shao, J., Reyad, M.: Rafiki: machine learning as an analytics service system. PVLDB 12(2), 128–140 (2018) Wang, W., Gao, J., Zhang, M., Wang, S., Chen, G., Ng, T.K., Ooi, B.C., Shao, J., Reyad, M.: Rafiki: machine learning as an analytics service system. PVLDB 12(2), 128–140 (2018)
45.
Zurück zum Zitat Wilschut, A.N., Apers, P.M.G.: Dataflow query execution in a parallel main-memory environment. In: Proceedings of the First International Conference on Parallel and Distributed Information Systems (PDIS 1991), pp. 68–77. IEEE Computer Society (1991) Wilschut, A.N., Apers, P.M.G.: Dataflow query execution in a parallel main-memory environment. In: Proceedings of the First International Conference on Parallel and Distributed Information Systems (PDIS 1991), pp. 68–77. IEEE Computer Society (1991)
46.
Zurück zum Zitat Wu, E., Diao, Y., Rizvi, S.: High-performance complex event processing over streams. Proc. SIGMOD 2006, 407–418 (2006) Wu, E., Diao, Y., Rizvi, S.: High-performance complex event processing over streams. Proc. SIGMOD 2006, 407–418 (2006)
47.
Zurück zum Zitat Yannakakis, M.: Algorithms for acyclic database schemes. In: Proceedings of VLDB, pp. 82–94 (1981) Yannakakis, M.: Algorithms for acyclic database schemes. In: Proceedings of VLDB, pp. 82–94 (1981)
48.
Zurück zum Zitat Yoshikawa, M., Kambayashi, Y.: Processing inequality queries based on generalized semi-joins. In: VLDB, pp. 416–428 (1984) Yoshikawa, M., Kambayashi, Y.: Processing inequality queries based on generalized semi-joins. In: VLDB, pp. 416–428 (1984)
49.
Zurück zum Zitat Zhang, H., Diao, Y., Immerman, N.: On complexity and optimization of expensive queries in complex event processing. In: Proceedings of SIGMOD (2014) Zhang, H., Diao, Y., Immerman, N.: On complexity and optimization of expensive queries in complex event processing. In: Proceedings of SIGMOD (2014)
Metadaten
Titel
General dynamic Yannakakis: conjunctive queries with theta joins under updates
verfasst von
Muhammad Idris
Martín Ugarte
Stijn Vansummeren
Hannes Voigt
Wolfgang Lehner
Publikationsdatum
19.11.2019
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 2-3/2020
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-019-00590-9

Weitere Artikel der Ausgabe 2-3/2020

The VLDB Journal 2-3/2020 Zur Ausgabe

Guest Editorial

VLDB SI 2018 editorial