ABSTRACT
Data Stream Management Systems performing on-line analytics rely on the efficient execution of large numbers of Aggregate Continuous Queries (ACQs). The state-of-the-art WeaveShare optimizer uses the Weavability concept in order to selectively combine ACQs for partial aggregation and produce high quality execution plans. However, WeaveShare does not scale well with the number of ACQs. In this paper we propose a novel closed formula, F1, that accelerates Weavability calculations, and thus allows WeaveShare to achieve exceptional scalability in systems with heavy workloads. In general, F1 can reduce the computation time of any technique that combines partial aggregations within composite slides of multiple ACQs. We theoretically analyze the Bit Set approach currently used by WeaveShare and show that F1 is superior in both time and space complexities. We show that F1 performs 1062 times less operations compared to Bit Set to produce the same execution plan for the same input. We experimentally show that F1 executes up to 60,000 times faster and can handle 1,000,000 ACQs in a setting where the limit for the current technique is 550.
- D. J. Abadi et al. Aurora: a new model and architecture for data stream management. VLDBJ, 2003. Google ScholarDigital Library
- D. J. Abadi et al. The design of the borealis stream processing engine. In CIDR, 2005.Google Scholar
- T. Akidau et al. Millwheel: Fault-tolerant stream processing at internet scale. In VLDB, 2013. Google ScholarDigital Library
- C. Chung, S. Guirguis, and A. Kurdia. Competitive cost-savings in data stream management systems. In COCOON. 2014.Google Scholar
- T. Condie, N. Conway, P. Alvaro, J. Hellerstein, J. Gerth, J. Talbot, K. Elmeleegy, R. Sears. Online aggregation and continuous query support in mapreduce. In SIGMOD, 2010. Google ScholarDigital Library
- T. M. Ghanem, M. A. Hammad, M. F. Mokbel, W. G. Aref, and A. K. Elmagarmid. Incremental evaluation of sliding-window queries over data streams. TKDE, 2007. Google ScholarDigital Library
- S. Guirguis, M. Sharaf, P. K. Chrysanthis, and A. Labrinidis. Three-level processing of multiple aggregate continuous queries. In ICDE, 2012. Google ScholarDigital Library
- S. Guirguis, M. A. Sharaf, P. K. Chrysanthis, and A. Labrinidis. Optimized processing of multiple aggregate continuous queries. In CIKM, 2011. Google ScholarDigital Library
- V. Gulisano, R. Jimenez-Peris, M. Patino-Martinez, C. Soriente, and P. Valduriez. Streamcloud: An elastic and scalable data streaming system. IEEE TPDS, 2012. Google ScholarDigital Library
- N. R. Katsipoulakis, C. Thoma, E. A. Gratta, A. Labrinidis, A. J. Lee, and P. K. Chrysanthis. Confidential elastic processing of data streams. In SIGMOD, 2015. Google ScholarDigital Library
- S. Krishnamurthy, C. Wu, and M. Franklin. On-the-fly sharing for streamed aggregation. In SIGMOD, 2006. Google ScholarDigital Library
- J. Li, D. Maier, K. Tufte, V. Papadimos, and P. A. Tucker. No pane, no gain: efficient evaluation of sliding-window aggregates over data streams. SIGMOD, 2005. Google ScholarDigital Library
- J. Li, D. Maier, K. Tufte, V. Papadimos, and P. A. Tucker. Semantics and evaluation techniques for window aggregates in data streams. In SIGMOD, 2005. Google ScholarDigital Library
- R. Motwani, J. Widom, A. Arasu, B. Babcock, S. Babu, M. Datar, G. Manku, C. Olston, J. Rosenstein, , and R. Varma. Query processing, approximation, and resource management in a data stream management system. In CIDR, 2003.Google Scholar
- K. Naidu, R. Rastogi, S. Satkin, and A. Srinivasan. Memory-constrained aggregate computation over data streams. In ICDE, 2011. Google ScholarDigital Library
- P. Roy, S. Seshadri, S. Sudarshan, and S. Bhobe. Efficient and extensible algorithms for multi query optimization. In SIGMOD, 2000. Google ScholarDigital Library
- A. U. Shein, P. K. Chrysanthis, and A. Labrinidis. Processing of aggregate continuous queries in a distributed environment. In BIRTE. 2015.Google Scholar
- K. Tangwongsan, M. Hirzel, S. Schneider, K-L. Wu. General incremental sliding-window aggregation. VLDB, 2015. Google ScholarDigital Library
- A. Toshniwal et al. Storm@twitter. In SIGMOD, 2014. Google ScholarDigital Library
- E. Weisstein. Binomial series. Wolfram MathWorld.Google Scholar
- E. Weisstein. Euclidean algorithm. Wolfram MathWorld.Google Scholar
- E. Weisstein. Harmonic number. Wolfram MathWorld.Google Scholar
- E. Weisstein. Prime number theorem. Wolfram MathWorld.Google Scholar
- R. Zhang, N. Koudas, B. C. Ooi, D. Srivastava. Multiple aggregations over data streams. In SIGMOD, 2005. Google ScholarDigital Library
- R. Zhang, N. Koudas, B. C. Ooi, D. Srivastava, and P. Zhou. Streaming multiple aggregations using phantoms. VLDBJ, 2010. Google ScholarDigital Library
Index Terms
- F1: Accelerating the Optimization of Aggregate Continuous Queries
Recommendations
Optimized processing of multiple aggregate continuous queries
CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge managementData Streams Management Systems are designed to support monitoring applications, which require the processing of hundreds of Aggregate Continuous Queries (ACQs). These ACQs typically have different time granularities, with possibly different selection ...
Prefilter: predicate pushdown at streaming speeds
SSPS '08: Proceedings of the 2nd international workshop on Scalable stream processing systemThis paper presents the prefilter: a predicate pushdown framework for a Data Stream Management System (DSMS). Though early predicate evaluation is a well-known query optimization strategy, novel problems arise in a high-performance DSMS. In particular, (...
Scalable Distributed Stream Join Processing
SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of DataEfficient and scalable stream joins play an important role in performing real-time analytics for many cloud applications. However, like in conventional database processing, online theta-joins over data streams are computationally expensive and moreover, ...
Comments