skip to main content
10.1145/3093742.3095107acmconferencesArticle/Chapter ViewAbstractPublication PagesdebsConference Proceedingsconference-collections
tutorial

Sliding-Window Aggregation Algorithms: Tutorial

Published:08 June 2017Publication History

ABSTRACT

Stream processing is important for analyzing continuous streams of data in real time. Sliding-window aggregation is both needed for many streaming applications and surprisingly hard to do efficiently. Picking the wrong aggregation algorithm causes poor performance, and knowledge of the right algorithms and when to use them is scarce. This paper was written to accompany a tutorial, but can be read as a stand-alone survey that aims to better educate the community about fast sliding-window aggregation algorithms for a variety of common aggregation operations and window types.

References

  1. adamax. 2011. Re: Implement a queue in which push_rear(), pop_front() and get_min() are all constant time operations. http://stackoverflow.com/questions/4802038/. Retrieved May, 2017.Google ScholarGoogle Scholar
  2. Arvind Arasu and Jennifer Widom. 2004. Resource sharing in continuous sliding window aggregates. In Conference on Very Large Data Bases (VLDB). 336--347. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Burton H. Bloom. 1970. Space/Time Trade-offs in Hash Coding with Allowable Errors. Communications of the ACM (CACM) 13, 7 (1970), 422--426. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Irina Botan, Roozbeh Derakhshan, Nihal Dindar, Laura Haas, Renée J. Miller, and Nesime Tatbul. 2010. SECRET: A Model for Analysis of the Execution Semantics of Stream Processing Systems. In Conference on Very Large Data Bases (VLDB). 232--243.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Graham Cormode and S. Muthukrishnan. 2005. An improved data stream summary: The count-min sketch and its applications. Journal of Algorithms 55, 1 (2005), 58--75. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. 2007. HyperLogLog: The analysis of a near-optimal cardinality estimation algorithm. In Conference on Analysis of Algorithms (AofA). 127--146.Google ScholarGoogle Scholar
  7. Buğra Gedik. 2013. Generic windowing support for extensible stream processing systems. Software Practice and Experience (SP&E) (2013), 1105--1128. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Jim Gray, Adam Bosworth, Andrew Layman, and Hamid Pirahesh. 1996. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. In International Conference on Data Engineering (ICDE). 152--159. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Martin Hirzel, Rodric Rabbah, Philippe Suter, Olivier Tardieu, and Mandana Vaziri. 2016. Spreadsheets for Stream Processing with Unbounded Windows and Partitions. In Conference on Distributed Event-Based Systems (DEBS). 49--60. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Sailesh Krishnamurthy, Michael J. Franklin, Jeffrey Davis, Daniel Farina, Pasha Golovko, Alan Li, and Neil Thombre. 2010. Continuous Analytics over Discontinuous Streams. In International Conference on Management of Data (SIGMOD). 1081--1092. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Sailesh Krishnamurthy, Chung Wu, and Michael Franklin. 2006. On-the-fly sharing for streamed aggregation. In International Conference on Management of Data (SIGMOD). 623--634. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Jin Li, David Maier, Kristin Tufte, Vassilis Papadimos, and Peter A. Tucker. 2005. No pane, no gain: efficient evaluation of sliding-window aggregates over data streams. ACM SIGMOD Record 34, 1 (2005), 39--44. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Chris Okasaki. 1995. Simple and efficient purely functional queues and deques. Journal of Functional Programming (JFP) 5, 4 (1995), 583--592.Google ScholarGoogle ScholarCross RefCross Ref
  14. Scott Schneider, Buğra Gedik, and Martin Hirzel. 2013. Tutorial: Stream Processing Optimizations. In Conference on Distributed Event-Based Systems (DEBS). 249--258. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Scott Schneider, Martin Hirzel, Buğra Gedik, and Kun-Lung Wu. 2015. Safe Data Parallelism for General Streaming. IEEE Transactions on Computers (TC) 64, 2 (Feb. 2015), 504--517.Google ScholarGoogle ScholarCross RefCross Ref
  16. Jon Skeet. 2009. Re: design a stack such that getMinimum() should be O(1). http://stackoverflow.com/questions/685060/. Retrieved May, 2017.Google ScholarGoogle Scholar
  17. Utkarsh Srivastava and Jennifer Widom. 2004. Flexible time management in data stream systems. In Principles of Database Systems (PODS). 263--274. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Guy L. Steele, Jr. 2009. Organizing Functional Code for Parallel Execution or, Foldl and Foldr Considered Slightly Harmful. In International Conference on Functional Programming (ICFP). 1--2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Kanat Tangwongsan, Martin Hirzel, and Scott Schneider. 2017. Low-Latency Sliding-Window Aggregation in Worst-Case Constant Time. In Conference on Distributed Event-Based Systems (DEBS). Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Kanat Tangwongsan, Martin Hirzel, Scott Schneider, and Kun-Lung Wu. 2015. General Incremental Sliding-Window Aggregation. In Conference on Very Large Data Bases (VLDB). 702--713. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Sliding-Window Aggregation Algorithms: Tutorial

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      DEBS '17: Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems
      June 2017
      393 pages
      ISBN:9781450350655
      DOI:10.1145/3093742

      Copyright © 2017 Owner/Author

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 8 June 2017

      Check for updates

      Qualifiers

      • tutorial
      • Research
      • Refereed limited

      Upcoming Conference

      DEBS '24

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader