ABSTRACT
Many applications schedule queries before all data is ready. To return fast query results, database systems can eagerly process existing data and incrementally incorporate new data into prior intermediate results, which often relies on incremental view maintenance (IVM) techniques. However, incrementally maintaining a query result can increase the total amount of work mainly as some early work is not useful for computing the final query result. In this paper, we propose a new metric incrementability to quantify the cost-effectiveness of IVM to decide how eagerly or lazily databases should incrementally execute a query. We further observe that different parts of a query have different levels of incrementability and the query execution should have a decomposed control flow based on the difference. Therefore, to address these needs, we propose a new query processing method Incrementability-aware Query Processing (InQP). We build a prototype InQP system based on Spark and show that InQP significantly reduces resource consumption with a similar latency compared with incrementability-oblivious approaches.
Supplemental Material
- Apache kafka. https://kafka.apache.org/.Google Scholar
- Spark structured streaming. https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html.Google Scholar
- Y. Ahmad, O. Kennedy, C. Koch, and M. Nikolic. DBToaster: Higher-order delta processing for dynamic, frequently fresh views. Proc. VLDB Endow., 5(10):968--979, 2012.Google ScholarDigital Library
- A. Ayad and J. F. Naughton. Static optimization of conjunctive queries with sliding windows over infinite streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Paris, France, June 13--18, 2004, pages 419--430, 2004.Google ScholarDigital Library
- B. Babcock, S. Babu, M. Datar, and R. Motwani. Chain: Operator scheduling for memory minimization in data stream systems. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, San Diego, California, USA, June 9--12, 2003, pages 253--264, 2003.Google ScholarDigital Library
- J. A. Blakeley, P. Larson, and F. W. Tompa. Efficiently updating materialized views. In Proceedings of the 1986 ACM SIGMOD International Conference on Management of Data, Washington, DC, USA, May 28--30, 1986, pages 61--71, 1986.Google ScholarDigital Library
- W. Cai, M. Balazinska, and D. Suciu. Pessimistic cardinality estimation: Tighter upper bounds for intermediate join cardinalities. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019, pages 18--35, 2019.Google ScholarDigital Library
- P. Carbone, A. Katsifodimos, S. Ewen, V. Markl, S. Haridi, and K. Tzoumas. Apache flink#8482;: Stream and batch processing in a single engine. IEEE Data Eng. Bull., 38(4):28--38, 2015.Google Scholar
- D. Carney, U. cC etintemel, A. Rasin, S. B. Zdonik, M. Cherniack, and M. Stonebraker. Operator scheduling in a data stream manager. In Proceedings of 29th International Conference on Very Large Data Bases, VLDB 2003, Berlin, Germany, September 9--12, 2003, pages 838--849, 2003.Google ScholarCross Ref
- B. Chandramouli, J. Goldstein, M. Barnett, R. DeLine, J. C. Platt, J. F. Terwilliger, and J. Wernsing. Trill: A high-performance incremental query processor for diverse analytics. Proc. VLDB Endow., 8(4):401--412, 2014.Google ScholarDigital Library
- S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, V. Raman, F. Reiss, and M. A. Shah. Telegraphcq: Continuous dataflow processing for an uncertain world. In CIDR 2003, First Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 5--8, 2003, Online Proceedings, 2003.Google Scholar
- S. Chaudhuri, V. R. Narasayya, and R. Ramamurthy. Estimating progress of long running SQL queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Paris, France, June 13--18, 2004, pages 803--814, 2004.Google Scholar
- J. Chen, D. J. DeWitt, F. Tian, and Y. Wang. Niagaracq: A scalable continuous query system for internet databases. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, May 16--18, 2000, Dallas, Texas, USA., pages 379--390, 2000.Google ScholarDigital Library
- R. Chirkova and J. Yang. Materialized views. Foundations and Trends in Databases, 4(4):295--405, 2012.Google ScholarCross Ref
- Y. Chung, M. L. Mortensen, C. Binnig, and T. Kraska. Estimating the impact of unknown unknowns on aggregate query results. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, pages 861--876, 2016.Google ScholarDigital Library
- L. S. Colby, T. Griffin, L. Libkin, I. S. Mumick, and H. Trickey. Algorithms for deferred view maintenance. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Quebec, Canada, June 4--6, 1996, pages 469--480, 1996.Google ScholarDigital Library
- L. S. Colby, A. Kawaguchi, D. F. Lieuwen, I. S. Mumick, and K. A. Ross. Supporting multiple view maintenance policies. In SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data, May 13--15, 1997, Tucson, Arizona, USA., pages 405--416, 1997.Google ScholarDigital Library
- C. Estan and J. F. Naughton. End-biased samples for join cardinality estimation. In Proceedings of the 22nd International Conference on Data Engineering, ICDE 2006, 3--8 April 2006, Atlanta, GA, USA, page 20, 2006.Google ScholarDigital Library
- L. Gao, M. Wang, X. S. Wang, and S. Padmanabhan. A learning-based approach to estimate statistics of operators in continuous queries: a case study. In Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, DMKD 2003, San Diego, California, USA, June 13, 2003, pages 66--72, 2003.Google ScholarDigital Library
- G. Graefe. Query evaluation techniques for large databases. ACM Computing Surveys (CSUR), 25(2):73--169, 1993.Google Scholar
- T. Griffin and B. Kumar. Algebraic change propagation for semijoin and outerjoin queries. SIGMOD Record, 27(3):22--27, 1998.Google ScholarDigital Library
- T. Griffin and L. Libkin. Incremental maintenance of views with duplicates. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, San Jose, California, USA, May 22--25, 1995, pages 328--339, 1995.Google ScholarDigital Library
- A. Gupta, I. S. Mumick, and V. S. Subrahmanian. Maintaining views incrementally. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, DC, USA, May 26--28, 1993, pages 157--166, 1993.Google ScholarDigital Library
- H. He, J. Xie, J. Yang, and H. Yu. Asymmetric batch incremental view maintenance. In K. Aberer, M. J. Franklin, and S. Nishio, editors, Proceedings of the 21st International Conference on Data Engineering, ICDE 2005, 5--8 April 2005, Tokyo, Japan, pages 106--117. IEEE Computer Society, 2005.Google Scholar
- J. M. Hellerstein, P. J. Haas, and H. J. Wang. Online aggregation. In J. Peckham, editor, SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data, May 13--15, 1997, Tucson, Arizona, USA, pages 171--182. ACM Press, 1997.Google Scholar
- M. Idris, M. Ugarte, and S. Vansummeren. The dynamic yannakakis algorithm: Compact and efficient query processing under updates. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14--19, 2017, pages 1259--1274, 2017.Google ScholarDigital Library
- S. Krishnan, Z. Yang, K. Goldberg, J. M. Hellerstein, and I. Stoica. Learning to optimize join queries with deep reinforcement learning. CoRR, abs/1808.03196, 2018.Google Scholar
- P. Larson and J. Zhou. Efficient maintenance of materialized outer-join views. In Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, The Marmara Hotel, Istanbul, Turkey, April 15--20, 2007, pages 56--65, 2007.Google ScholarCross Ref
- V. Leis, A. Gubichev, A. Mirchev, P. A. Boncz, A. Kemper, and T. Neumann. How good are query optimizers, really? Proc. VLDB Endow., 9(3):204--215, 2015.Google ScholarDigital Library
- V. Leis, B. Radke, A. Gubichev, A. Kemper, and T. Neumann. Cardinality estimation done right: Index-based join sampling. In CIDR 2017, 8th Biennial Conference on Innovative Data Systems Research, Chaminade, CA, USA, January 8--11, 2017, Online Proceedings, 2017.Google Scholar
- L. F. Mackert and G. M. Lohman. R* optimizer validation and performance evaluation for local queries. In Proceedings of the 1986 ACM SIGMOD International Conference on Management of Data, Washington, DC, USA, May 28--30, 1986, pages 84--95, 1986.Google ScholarDigital Library
- T. Malik, R. C. Burns, and N. V. Chawla. A black-box approach to query cardinality estimation. In CIDR 2007, Third Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 7--10, 2007, Online Proceedings, pages 56--67, 2007.Google Scholar
- R. C. Marcus, P. Negi, H. Mao, C. Zhang, M. Alizadeh, T. Kraska, O. Papaemmanouil, and N. Tatbul. Neo: A learned query optimizer. Proc. VLDB Endow., 12(11):1705--1718, 2019.Google ScholarDigital Library
- M. Nikolic, M. Dashti, and C. Koch. How to win a hot dog eating contest: Distributed incremental view maintenance with batch updates. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, pages 511--526, 2016.Google ScholarDigital Library
- J. Ortiz, M. Balazinska, J. Gehrke, and S. S. Keerthi. An empirical analysis of deep learning for cardinality estimation. CoRR, abs/1905.06425, 2019.Google Scholar
- P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access path selection in a relational database management system. In Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data, Boston, Massachusetts, USA, May 30 - June 1., pages 23--34, 1979.Google ScholarDigital Library
- M. A. Sharaf, P. K. Chrysanthis, A. Labrinidis, and K. Pruhs. Algorithms and metrics for processing multiple heterogeneous continuous queries. ACM Trans. Database Syst., 33(1):5:1--5:44, 2008.Google ScholarDigital Library
- M. Stillger, G. M. Lohman, V. Markl, and M. Kandil. LEO - db2's learning optimizer. In VLDB 2001, Proceedings of 27th International Conference on Very Large Data Bases, September 11--14, 2001, Roma, Italy, pages 19--28, 2001.Google Scholar
- R. Taft, N. El-Sayed, M. Serafini, Y. Lu, A. Aboulnaga, M. Stonebraker, R. Mayerhofer, and F. Andrade. P-store: An elastic database system with predictive provisioning. In Proceedings of the 2018 International Conference on Management of Data, pages 205--219. ACM, 2018.Google ScholarDigital Library
- D. Tang, Z. Shang, A. J. Elmore, S. Krishnan, and M. J. Franklin. Intermittent query processing. Proc. VLDB Endow., 12(11):1427--1441, July 2019.Google ScholarDigital Library
- Y. Tao, M. L. Yiu, D. Papadias, M. Hadjieleftheriou, and N. Mamoulis. RPJ: producing fast join results on streams through rate-based optimization. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Baltimore, Maryland, USA, June 14--16, 2005, pages 371--382, 2005.Google ScholarDigital Library
- I. Trummer. Exact cardinality query optimization with bounded execution cost. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019, pages 2--17, 2019.Google ScholarDigital Library
- S. Venkataraman, A. Panda, K. Ousterhout, M. Armbrust, A. Ghodsi, M. J. Franklin, B. Recht, and I. Stoica. Drizzle: Fast and adaptable stream processing at scale. In Proceedings of the 26th Symposium on Operating Systems Principles, Shanghai, China, October 28--31, 2017, pages 374--389, 2017.Google ScholarDigital Library
- S. Viglas and J. F. Naughton. Rate-based query optimization for streaming information sources. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, Madison, Wisconsin, USA, June 3--6, 2002, pages 37--48, 2002.Google ScholarDigital Library
- S. Viglas, J. F. Naughton, and J. Burger. Maximizing the output rate of multi-way join queries over streaming information sources. In Proceedings of 29th International Conference on Very Large Data Bases, VLDB 2003, Berlin, Germany, September 9--12, 2003, pages 285--296, 2003.Google ScholarCross Ref
- A. Wilschut and P. Apers. Pipelining in query execution. In Proceedings of the International Conference on Databases, Parallel Architectures and Their Applications (PARBASE 1990), pages 562--562, United States, 3 1990. IEEE Computer Society.Google ScholarCross Ref
- K. Zeng, S. Agarwal, and I. Stoica. iOLAP: Managing uncertainty for efficient incremental OLAP. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, pages 1347--1361, 2016.Google ScholarDigital Library
- J. Zhou, P. Larson, and H. G. Elmongui. Lazy maintenance of materialized views. In Proceedings of the 33rd International Conference on Very Large Data Bases, University of Vienna, Austria, September 23--27, 2007, pages 231--242, 2007.Google ScholarDigital Library
Index Terms
- Thrifty Query Execution via Incrementability
Recommendations
Resource-efficient Shared Query Execution via Exploiting Time Slackness
SIGMOD '21: Proceedings of the 2021 International Conference on Management of DataShared query execution can reduce resource consumption by sharing common sub-expressions across concurrent queries. We show that this is not always the case when regularly querying a dataset under change. Depending on latency goals, how eagerly to ...
Maintaining Acyclic Foreign-Key Joins under Updates
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of DataA large number of analytical queries (e.g., all the 22 queries in the TPC-H benchmark) are based on acyclic foreign-key joins. In this paper, we study the problem of incrementally maintaining the query results of these joins under updates, i.e., ...
Federated Query Service Based on CORBA
TOOLS '99: Proceedings of the 31st International Conference on Technology of Object-Oriented Language and SystemsThe current information technology trends toward the distribution of data among numbers of autonomous and heterogeneous repositories. Object Query Service is specified by OMG to provide uniform interfaces for users to retrieve information based on ...
Comments