Abstract
Many stream-processing systems enforce an order on data streams during query evaluation to help unblock blocking operators and purge state from stateful operators. Such in-order processing (IOP) systems not only must enforce order on input streams, but also require that query operators preserve order. This order-preserving requirement constrains the implementation of stream systems and incurs significant performance penalties, particularly for memory consumption. Especially for high-performance, potentially distributed stream systems, the cost of enforcing order can be prohibitive. We introduce a new architecture for stream systems, out-of-order processing (OOP), that avoids ordering constraints. The OOP architecture frees stream systems from the burden of order maintenance by using explicit stream progress indicators, such as punctuation or heartbeats, to unblock and purge operators. We describe the implementation of OOP stream systems and discuss the benefits of this architecture in depth. For example, the OOP approach has proven useful for smoothing workload bursts caused by expensive end-of-window operations, which can overwhelm internal communication paths in IOP approaches. We have implemented OOP in two stream systems, Gigascope and NiagaraST. Our experimental study shows that the OOP approach can significantly outperform IOP in a number of aspects, including memory, throughput and latency.
- Abadi, D., et al. Aurora: A New Model and Architecture for Data Stream Management. VLDB Journal 12(2), August 2003. Google ScholarDigital Library
- Arasu, A., Babu, S., Widom, J. The CQL Continuous Query Language: Semantic Foundations and Query Execution. VLDB Journal 14(1), March 2005. Google ScholarDigital Library
- Avnur, R., Hellerstein, J. M. Eddies: Continuously Adaptive Query Processing. SIGMOD 2000. Google ScholarDigital Library
- Arasu, A., Widom, J. Resource Sharing in Continuous Sliding-window Aggregates. VLDB 2004. Google ScholarDigital Library
- Balazinska, M, Balakrishnan, H., Madden, S., Stonebraker M. Fault-Tolerance in the Borealis Distributed Stream Processing System. SIGMOD 2005. Google ScholarDigital Library
- Cormode, G., et al. Holistic UDAFs at Streaming Speeds. SIGMOD 2004. Google ScholarDigital Library
- Cranor, C., Johnson, T., Spatashek, O. Gigascope: A Stream Database for Network Applications. SIGMOD 2003. Google ScholarDigital Library
- Ding, L., et al. Joining Punctuated Streams. EDBT 2004.Google Scholar
- Ding, L., Rundensteiner, E. A. Evaluating Window Joins over Punctuated Streams. CIKM 2004. Google ScholarDigital Library
- Golab, L., Ozsu, M. T. Processing Sliding Window multi-joins in Continuous queries over Data Streams. VLDB 2003. Google ScholarDigital Library
- Hwang, J-H, Cetintemel, U., Zdonik, S. Fast and Highly-Available Stream Processing over Wide Area Networks. ICDE 2008. Google ScholarDigital Library
- Hammad, M. et al. Optimizing In-Order Execution of Continuous Queries over Streamed Sensor Data. SSDBM 2005. Google ScholarDigital Library
- Hammad, M., et al. Scheduling for Shared Window Joins over Data Streams. VLDB 2003. Google ScholarDigital Library
- Li, Hua-Gang, et al. Safety Guarantee of Continuous Join Queries over Punctuated Data Streams. VLDB 2006. Google ScholarDigital Library
- Jin Li, et al. No Pane, No Gain: Efficient Evaluation of Sliding-Window Aggregates over Data Streams. SIGMOD Record 34(1), March 2005. Google ScholarDigital Library
- Jin Li, et al. Semantics and Evaluation Techniques for Window Aggregates in Data Streams. SIGMOD 2005. Google ScholarDigital Library
- Theodore Johnson, et al. A Heartbeat Mechanism and Its Application in Gigascope. VLDB 2005. Google ScholarDigital Library
- Kang, J., Naughton, J. F., Viglas, S. Evaluating Window Joins over Unbounded Streams. ICDE 2003.Google ScholarCross Ref
- Ken Keys, et al. A Robust System for Accurate Real-time Summaries of Internet Traffic. ACM SIGMETRICS Performance Evaluation Review 33(1), June 2005. Google ScholarDigital Library
- Krishnamurthy, S., Wu, C., Franklin, M. J. On-the-Fly Sharing for Streamed Aggregation. SIGMOD 2006. Google ScholarDigital Library
- Naughton, J. et al. The Niagara Internet Query System. IEEE Data Eng. Bulletin 24(2), June 2001.Google Scholar
- Passive Measurement and Analysis Project. San Diego Supercomputer Center. http://pma.nlanr.net/PMA.Google Scholar
- Raman, V., Raman, B., Hellerstein, J. M. Online Dynamic Reordering for Interactive Data Processing. VLDB 1999. Google ScholarDigital Library
- Rundensteiner, E. A., et al. CAPE: Continuous Query Engine with Heterogeneous-Grained Adaptivity. VLDB 2004. Google ScholarDigital Library
- Srivastava, U, Widom, J. Flexible Time Management in Data Stream Systems. PODS 2004. Google ScholarDigital Library
- Srivastava, U, Widom, J. Memory-Limited Execution of Windowed Stream Joins. VLDB 2004. Google ScholarDigital Library
- StreamSQL. http://www.streamsql.org.Google Scholar
- Tucker, P. Punctuated Data Streams. Doctoral Dissertation. Oregon Health & Science University, Portland, OR, 2005. Google ScholarDigital Library
- Tucker, P., et al. Exploiting Punctuation Semantics in Continuous Data Streams. IEEE Trans. on Knowledge and Data Engineering, 15(3), May 2003. Google ScholarDigital Library
- Urhan, T., Franklin, M. J. XJoin: A Reactively-Scheduled Pipelined Join Operator. IEEE Data Eng. Bull. 23(2), 2000.Google Scholar
- Urban, T. and Franklin, M. J. Dynamic Pipeline Scheduling for Improving Interactive Query Performance. VLDB 2001. Google ScholarDigital Library
- Viglas, S., Naughton, J. F., Burger, J. Maximizing the Output Rate of Multi-Way Join Queries over Streaming Information Sources. VLDB 2003. Google ScholarDigital Library
Index Terms
- Out-of-order processing: a new architecture for high-performance stream systems
Recommendations
Dual-Paradigm Stream Processing
ICPP '18: Proceedings of the 47th International Conference on Parallel ProcessingExisting stream processing frameworks operate either under data stream paradigm processing data record by record to favor low latency, or under operation stream paradigm processing data in micro-batches to desire high throughput. For complex and mutable ...
Quality-driven processing of sliding window aggregates over out-of-order data streams
DEBS '15: Proceedings of the 9th ACM International Conference on Distributed Event-Based SystemsOne fundamental challenge in data stream processing is to cope with the ubiquity of disorder of tuples within a stream caused by network latency, operator parallelization, merging of asynchronous streams, etc. High result accuracy and low result latency ...
Comments