ABSTRACT
One fundamental challenge in data stream processing is to cope with the ubiquity of disorder of tuples within a stream caused by network latency, operator parallelization, merging of asynchronous streams, etc. High result accuracy and low result latency are two conflicting goals in out-of-order stream processing. Different applications may prefer different extent of trade-offs between the two goals. However, existing disorder handling solutions either try to meet one goal to the extreme by sacrificing the other, or try to meet both goals but have shortcomings including unguaranteed result accuracy or increased complexity in operator implementation and application logic.
To meet different application requirements on the latency versus result accuracy trade-off in out-of-order stream processing, in this paper, we propose to make this trade-off user-configurable. Particularly, focusing on sliding window aggregates, we introduce AQ-K-slack, a buffer-based quality-driven disorder handling approach. AQ-K-slack leverages techniques from the fields of sampling-based approximate query processing and control theory. It can adjust the input buffer size dynamically to minimize the result latency, while respecting user-specified threshold on relative errors in produced query results. AQ-K-slack requires no a priori knowledge of disorder characteristics of data streams, and imposes no changes to the query operator implementation or the application logic. Experiments over real-world out-of-order data streams show that, compared to the state-of-art, AQ-K-slack can reduce the average buffer size, thus the average result latency, by at least 51% while respecting user-specified requirement on the accuracy of query results.
- D. J. Abadi, D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. Zdonik. Aurora: A new model and architecture for data stream management. VLDB Journal, 12(2):120--139, 2003. Google ScholarDigital Library
- S. Agarwal, B. Mozafari, A. Panda, H. Milner, S. Madden, and I. Stoica. Blinkdb: Queries with bounded errors and bounded response times on very large data. EuroSys, pages 29--42, 2013. Google ScholarDigital Library
- B. Babcock, M. Datar, and R. Motwani. Load shedding for aggregation queries over data streams. ICDE, pages 350--361, 2004. Google ScholarDigital Library
- S. Babu, U. Srivastava, and J. Widom. Exploiting K-constraints to reduce memory overhead in continuous queries over data streams. TODS, 29(3):545--580, 2004. Google ScholarDigital Library
- R. S. Barga, J. Goldstein, M. H. Ali, and M. Hong. Consistent streaming through time: A vision for event stream processing. CIDR, pages 363--374, 2007.Google Scholar
- A. Brito, C. Fetzer, H. Sturzrehm, and P. Felber. Speculative out-of-order event processing with software transaction memory. DEBS, pages 265--275, 2008. Google ScholarDigital Library
- C. Busch and S. Tirthapura. A deterministic algorithm for summarizing asynchronous streams over a sliding window. STACS, pages 465--476, 2007. Google ScholarDigital Library
- S. Chaudhuri, U. Dayal, and V. Narasayya. An overview of business intelligence technology. ACM Commun., 54(8):88--98, 2011. Google ScholarDigital Library
- G. Cormode, F. Korn, and S. Tirthapura. Time-decaying aggregates in out-of-order streams. PODS, pages 89--98, 2008. Google ScholarDigital Library
- C. D. Cranor, Y. Gao, T. Johnson, V. Shkapenyuk, and O. Spatscheck. Gigascope: high performance network monitoring with an SQL interface. SIGMOD, page 623, 2002. Google ScholarDigital Library
- S. Gandhi, L. Foschini, and S. Suri. Space-efficient online approximation of time series data: Streams, amnesia, and out-of-order. ICDE, pages 924--935, 2010.Google ScholarCross Ref
- M. Hirzel, R. Soulé, S. Schneider, B. Gedik, and R. Grimm. A catalog of stream processing optimizations. ACM Comput. Surv., 46(4):1--34, 2014. Google ScholarDigital Library
- J. Krämer and B. Seeger. Semantics and implementation of continuous sliding window queries over data streams. ACM Trans. Database Syst., 34(1):4:1--4:49, 2009. Google ScholarDigital Library
- S. Krishnamurthy, M. J. Franklin, J. Davis, D. Farina, P. Golovko, A. Li, and N. Thombre. Continuous analytics over discontinuous streams. SIGMOD, pages 1081--1092, 2010. Google ScholarDigital Library
- Y. Law and C. Zaniolo. Improving the accuracy of continuous aggregates and mining queries on data streams under load shedding. Int. J. Bus. Intell. Data Min., 3(1):99--117, 2008. Google ScholarDigital Library
- W. S. Levine. The control handbook, Second Edition. CRC Press New York, 2011.Google Scholar
- J. Li, D. Maier, K. Tufte, V. Papadimos, and P. A. Tucker. Semantics and evaluation techniques for window aggregates in data streams. SIGMOD, pages 311--322, 2005. Google ScholarDigital Library
- J. Li, K. Tufte, V. Shkapenyuk, V. Papadimos, T. Johnson, and D. Maier. Out-of-order processing: A new architecture for high-performance stream systems. VLDB, 1(1):274--288, 2008. Google ScholarDigital Library
- M. Liu, M. Li, D. Golovnya, E. Rundensteiner, and K. Claypool. Sequence pattern query processing over out-of-order event streams. ICDE, pages 784--795, 2009. Google ScholarDigital Library
- S. Lohr. Sampling: Design and Analysis. Sampling: Design and Analysis. Duxbury Press, 1999.Google Scholar
- A. S. Maskey and M. Cherniack. Replay-based approaches to revision processing in stream query engines. SSPS, pages 3--12, 2008. Google ScholarDigital Library
- B. Mozafari and C. Zaniolo. Optimal load shedding with aggregates and mining queries. ICDE, pages 76--88, 2010.Google ScholarCross Ref
- C. Mutschler and M. Philippsen. Distributed low-latency out-of-order event processing for high data rate sensor streams. IPDPS, pages 1133--1144, 2013. Google ScholarDigital Library
- C. Mutschler and M. Philippsen. Reliable speculative processing of out-of-order event streams in generic publish/subscribe middlewares. DEBS, pages 147--158, 2013. Google ScholarDigital Library
- C. Mutschler, H. Ziekow, and Z. Jerzak. The DEBS 2013 grand challenge. DEBS, pages 289--294, 2013. Google ScholarDigital Library
- B. Ottenwälder, B. Koldehofe, K. Rothermel, K. Hong, D. Lillethun, and U. Ramachandran. MCEP: A mobility-aware complex event processing system. ACM Trans. Internet Technol., 14(1):6:1--6:24, 2014. Google ScholarDigital Library
- K. Patroumpas and T. Sellis. Window specification over data streams. EDBT, pages 445--464, 2006. Google ScholarDigital Library
- E. Ryvkina, A. Maskey, M. Cherniack, and S. B. Zdonik. Revision processing in a stream processing engine: A high-level design. ICDE, page 141, 2006. Google ScholarDigital Library
- SAP Event Stream Processor. http://www.sap.com/pc/tech/database/software/sybase-complex-event-processing/index.html.Google Scholar
- U. Srivastava and J. Widom. Flexible time management in data stream systems. PODS, pages 263--274, 2004. Google ScholarDigital Library
- M. Stonebraker, U. Çetintemel, and S. Zdonik. The 8 requirements of real-time stream processing. SIGMOD Rec., 34(4):42--47, 2005. Google ScholarDigital Library
- J. Teubner and R. Mueller. How soccer players would do stream joins. SIGMOD, pages 625--636, 2011. Google ScholarDigital Library
- S. Tirthapura and D. P. Woodruff. A general method for estimating correlated aggregates over a data stream. ICDE, pages 162--173, 2012. Google ScholarDigital Library
- S. Tirthapura, B. Xu, and C. Busch. Sketching asynchronous streams over a sliding window. PODC, pages 82--91, 2006. Google ScholarDigital Library
- P. A. Tucker, D. Maier, T. Sheard, and L. Fegaras. Exploiting punctuation semantics in continuous data streams. IEEE TKDE, 15(3):555--568, 2003. Google ScholarDigital Library
- L. Wang, G. Luo, K. Yi, and G. Cormode. Quantiles over data streams: An experimental study. SIGMOD, pages 737--748, 2013. Google ScholarDigital Library
- J. G. Ziegler and N. B. Nichols. Optimum settings for automatic controllers. Trans. of ASME, 64:759--768, 1942.Google Scholar
Index Terms
- Quality-driven processing of sliding window aggregates over out-of-order data streams
Recommendations
Quality-Driven Continuous Query Execution over Out-of-Order Data Streams
SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of DataExecuting continuous queries over out-of-order data streams, where tuples are not ordered according to timestamps, is challenging; because high result accuracy and low result latency are two conflicting performance metrics. Although many applications ...
Sketching distributed sliding-window data streams
While traditional data management systems focus on evaluating single, ad hoc queries over static data sets in a centralized setting, several emerging applications require (possibly, continuous) answers to queries on dynamic data that is widely ...
Time- and Space-Efficient Sliding Window Top-k Query Processing
A sliding window top-k (top-k/w) query monitors incoming data stream objects within a sliding window of size w to identify the k highest-ranked objects with respect to a given scoring function over time. Processing of such queries is challenging because,...
Comments