ABSTRACT
Stream processing is important for analyzing continuous streams of data in real time. Sliding-window aggregation is both needed for many streaming applications and surprisingly hard to do efficiently. Picking the wrong aggregation algorithm causes poor performance, and knowledge of the right algorithms and when to use them is scarce. This paper was written to accompany a tutorial, but can be read as a stand-alone survey that aims to better educate the community about fast sliding-window aggregation algorithms for a variety of common aggregation operations and window types.
- adamax. 2011. Re: Implement a queue in which push_rear(), pop_front() and get_min() are all constant time operations. http://stackoverflow.com/questions/4802038/. Retrieved May, 2017.Google Scholar
- Arvind Arasu and Jennifer Widom. 2004. Resource sharing in continuous sliding window aggregates. In Conference on Very Large Data Bases (VLDB). 336--347. Google ScholarDigital Library
- Burton H. Bloom. 1970. Space/Time Trade-offs in Hash Coding with Allowable Errors. Communications of the ACM (CACM) 13, 7 (1970), 422--426. Google ScholarDigital Library
- Irina Botan, Roozbeh Derakhshan, Nihal Dindar, Laura Haas, Renée J. Miller, and Nesime Tatbul. 2010. SECRET: A Model for Analysis of the Execution Semantics of Stream Processing Systems. In Conference on Very Large Data Bases (VLDB). 232--243.Google ScholarDigital Library
- Graham Cormode and S. Muthukrishnan. 2005. An improved data stream summary: The count-min sketch and its applications. Journal of Algorithms 55, 1 (2005), 58--75. Google ScholarDigital Library
- Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. 2007. HyperLogLog: The analysis of a near-optimal cardinality estimation algorithm. In Conference on Analysis of Algorithms (AofA). 127--146.Google Scholar
- Buğra Gedik. 2013. Generic windowing support for extensible stream processing systems. Software Practice and Experience (SP&E) (2013), 1105--1128. Google ScholarDigital Library
- Jim Gray, Adam Bosworth, Andrew Layman, and Hamid Pirahesh. 1996. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. In International Conference on Data Engineering (ICDE). 152--159. Google ScholarDigital Library
- Martin Hirzel, Rodric Rabbah, Philippe Suter, Olivier Tardieu, and Mandana Vaziri. 2016. Spreadsheets for Stream Processing with Unbounded Windows and Partitions. In Conference on Distributed Event-Based Systems (DEBS). 49--60. Google ScholarDigital Library
- Sailesh Krishnamurthy, Michael J. Franklin, Jeffrey Davis, Daniel Farina, Pasha Golovko, Alan Li, and Neil Thombre. 2010. Continuous Analytics over Discontinuous Streams. In International Conference on Management of Data (SIGMOD). 1081--1092. Google ScholarDigital Library
- Sailesh Krishnamurthy, Chung Wu, and Michael Franklin. 2006. On-the-fly sharing for streamed aggregation. In International Conference on Management of Data (SIGMOD). 623--634. Google ScholarDigital Library
- Jin Li, David Maier, Kristin Tufte, Vassilis Papadimos, and Peter A. Tucker. 2005. No pane, no gain: efficient evaluation of sliding-window aggregates over data streams. ACM SIGMOD Record 34, 1 (2005), 39--44. Google ScholarDigital Library
- Chris Okasaki. 1995. Simple and efficient purely functional queues and deques. Journal of Functional Programming (JFP) 5, 4 (1995), 583--592.Google ScholarCross Ref
- Scott Schneider, Buğra Gedik, and Martin Hirzel. 2013. Tutorial: Stream Processing Optimizations. In Conference on Distributed Event-Based Systems (DEBS). 249--258. Google ScholarDigital Library
- Scott Schneider, Martin Hirzel, Buğra Gedik, and Kun-Lung Wu. 2015. Safe Data Parallelism for General Streaming. IEEE Transactions on Computers (TC) 64, 2 (Feb. 2015), 504--517.Google ScholarCross Ref
- Jon Skeet. 2009. Re: design a stack such that getMinimum() should be O(1). http://stackoverflow.com/questions/685060/. Retrieved May, 2017.Google Scholar
- Utkarsh Srivastava and Jennifer Widom. 2004. Flexible time management in data stream systems. In Principles of Database Systems (PODS). 263--274. Google ScholarDigital Library
- Guy L. Steele, Jr. 2009. Organizing Functional Code for Parallel Execution or, Foldl and Foldr Considered Slightly Harmful. In International Conference on Functional Programming (ICFP). 1--2. Google ScholarDigital Library
- Kanat Tangwongsan, Martin Hirzel, and Scott Schneider. 2017. Low-Latency Sliding-Window Aggregation in Worst-Case Constant Time. In Conference on Distributed Event-Based Systems (DEBS). Google ScholarDigital Library
- Kanat Tangwongsan, Martin Hirzel, Scott Schneider, and Kun-Lung Wu. 2015. General Incremental Sliding-Window Aggregation. In Conference on Very Large Data Bases (VLDB). 702--713. Google ScholarDigital Library
Index Terms
- Sliding-Window Aggregation Algorithms: Tutorial
Recommendations
General incremental sliding-window aggregation
Stream processing is gaining importance as more data becomes available in the form of continuous streams and companies compete to promptly extract insights from them. In such applications, sliding-window aggregation is a central operator, and ...
Stream Aggregation with Compressed Sliding Windows
High performance stream aggregation is critical for many emerging applications that analyze massive volumes of data. Incoming data needs to be stored in a sliding window during processing, in case the aggregation functions cannot be computed ...
L-BiX: incremental sliding-window aggregation over data streams using linear bidirectional aggregating indexes
AbstractThe number of real-time information sources, or so-called streams, has rapidly increased, leading to a greater demand for complex analyses over streams. Although many stream analysis methods exist, aggregation is fundamental to ascertain higher ...
Comments