skip to main content
10.1145/2675743.2771828acmconferencesArticle/Chapter ViewAbstractPublication PagesdebsConference Proceedingsconference-collections
research-article

Quality-driven processing of sliding window aggregates over out-of-order data streams

Published:24 June 2015Publication History

ABSTRACT

One fundamental challenge in data stream processing is to cope with the ubiquity of disorder of tuples within a stream caused by network latency, operator parallelization, merging of asynchronous streams, etc. High result accuracy and low result latency are two conflicting goals in out-of-order stream processing. Different applications may prefer different extent of trade-offs between the two goals. However, existing disorder handling solutions either try to meet one goal to the extreme by sacrificing the other, or try to meet both goals but have shortcomings including unguaranteed result accuracy or increased complexity in operator implementation and application logic.

To meet different application requirements on the latency versus result accuracy trade-off in out-of-order stream processing, in this paper, we propose to make this trade-off user-configurable. Particularly, focusing on sliding window aggregates, we introduce AQ-K-slack, a buffer-based quality-driven disorder handling approach. AQ-K-slack leverages techniques from the fields of sampling-based approximate query processing and control theory. It can adjust the input buffer size dynamically to minimize the result latency, while respecting user-specified threshold on relative errors in produced query results. AQ-K-slack requires no a priori knowledge of disorder characteristics of data streams, and imposes no changes to the query operator implementation or the application logic. Experiments over real-world out-of-order data streams show that, compared to the state-of-art, AQ-K-slack can reduce the average buffer size, thus the average result latency, by at least 51% while respecting user-specified requirement on the accuracy of query results.

References

  1. D. J. Abadi, D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. Zdonik. Aurora: A new model and architecture for data stream management. VLDB Journal, 12(2):120--139, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Agarwal, B. Mozafari, A. Panda, H. Milner, S. Madden, and I. Stoica. Blinkdb: Queries with bounded errors and bounded response times on very large data. EuroSys, pages 29--42, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. B. Babcock, M. Datar, and R. Motwani. Load shedding for aggregation queries over data streams. ICDE, pages 350--361, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. S. Babu, U. Srivastava, and J. Widom. Exploiting K-constraints to reduce memory overhead in continuous queries over data streams. TODS, 29(3):545--580, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. R. S. Barga, J. Goldstein, M. H. Ali, and M. Hong. Consistent streaming through time: A vision for event stream processing. CIDR, pages 363--374, 2007.Google ScholarGoogle Scholar
  6. A. Brito, C. Fetzer, H. Sturzrehm, and P. Felber. Speculative out-of-order event processing with software transaction memory. DEBS, pages 265--275, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. C. Busch and S. Tirthapura. A deterministic algorithm for summarizing asynchronous streams over a sliding window. STACS, pages 465--476, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. S. Chaudhuri, U. Dayal, and V. Narasayya. An overview of business intelligence technology. ACM Commun., 54(8):88--98, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. G. Cormode, F. Korn, and S. Tirthapura. Time-decaying aggregates in out-of-order streams. PODS, pages 89--98, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. C. D. Cranor, Y. Gao, T. Johnson, V. Shkapenyuk, and O. Spatscheck. Gigascope: high performance network monitoring with an SQL interface. SIGMOD, page 623, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. S. Gandhi, L. Foschini, and S. Suri. Space-efficient online approximation of time series data: Streams, amnesia, and out-of-order. ICDE, pages 924--935, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  12. M. Hirzel, R. Soulé, S. Schneider, B. Gedik, and R. Grimm. A catalog of stream processing optimizations. ACM Comput. Surv., 46(4):1--34, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. J. Krämer and B. Seeger. Semantics and implementation of continuous sliding window queries over data streams. ACM Trans. Database Syst., 34(1):4:1--4:49, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. S. Krishnamurthy, M. J. Franklin, J. Davis, D. Farina, P. Golovko, A. Li, and N. Thombre. Continuous analytics over discontinuous streams. SIGMOD, pages 1081--1092, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Y. Law and C. Zaniolo. Improving the accuracy of continuous aggregates and mining queries on data streams under load shedding. Int. J. Bus. Intell. Data Min., 3(1):99--117, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. W. S. Levine. The control handbook, Second Edition. CRC Press New York, 2011.Google ScholarGoogle Scholar
  17. J. Li, D. Maier, K. Tufte, V. Papadimos, and P. A. Tucker. Semantics and evaluation techniques for window aggregates in data streams. SIGMOD, pages 311--322, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. J. Li, K. Tufte, V. Shkapenyuk, V. Papadimos, T. Johnson, and D. Maier. Out-of-order processing: A new architecture for high-performance stream systems. VLDB, 1(1):274--288, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M. Liu, M. Li, D. Golovnya, E. Rundensteiner, and K. Claypool. Sequence pattern query processing over out-of-order event streams. ICDE, pages 784--795, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. S. Lohr. Sampling: Design and Analysis. Sampling: Design and Analysis. Duxbury Press, 1999.Google ScholarGoogle Scholar
  21. A. S. Maskey and M. Cherniack. Replay-based approaches to revision processing in stream query engines. SSPS, pages 3--12, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. B. Mozafari and C. Zaniolo. Optimal load shedding with aggregates and mining queries. ICDE, pages 76--88, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  23. C. Mutschler and M. Philippsen. Distributed low-latency out-of-order event processing for high data rate sensor streams. IPDPS, pages 1133--1144, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. C. Mutschler and M. Philippsen. Reliable speculative processing of out-of-order event streams in generic publish/subscribe middlewares. DEBS, pages 147--158, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. C. Mutschler, H. Ziekow, and Z. Jerzak. The DEBS 2013 grand challenge. DEBS, pages 289--294, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. B. Ottenwälder, B. Koldehofe, K. Rothermel, K. Hong, D. Lillethun, and U. Ramachandran. MCEP: A mobility-aware complex event processing system. ACM Trans. Internet Technol., 14(1):6:1--6:24, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. K. Patroumpas and T. Sellis. Window specification over data streams. EDBT, pages 445--464, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. E. Ryvkina, A. Maskey, M. Cherniack, and S. B. Zdonik. Revision processing in a stream processing engine: A high-level design. ICDE, page 141, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. SAP Event Stream Processor. http://www.sap.com/pc/tech/database/software/sybase-complex-event-processing/index.html.Google ScholarGoogle Scholar
  30. U. Srivastava and J. Widom. Flexible time management in data stream systems. PODS, pages 263--274, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. M. Stonebraker, U. Çetintemel, and S. Zdonik. The 8 requirements of real-time stream processing. SIGMOD Rec., 34(4):42--47, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. J. Teubner and R. Mueller. How soccer players would do stream joins. SIGMOD, pages 625--636, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. S. Tirthapura and D. P. Woodruff. A general method for estimating correlated aggregates over a data stream. ICDE, pages 162--173, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. S. Tirthapura, B. Xu, and C. Busch. Sketching asynchronous streams over a sliding window. PODC, pages 82--91, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. P. A. Tucker, D. Maier, T. Sheard, and L. Fegaras. Exploiting punctuation semantics in continuous data streams. IEEE TKDE, 15(3):555--568, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. L. Wang, G. Luo, K. Yi, and G. Cormode. Quantiles over data streams: An experimental study. SIGMOD, pages 737--748, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. J. G. Ziegler and N. B. Nichols. Optimum settings for automatic controllers. Trans. of ASME, 64:759--768, 1942.Google ScholarGoogle Scholar

Index Terms

  1. Quality-driven processing of sliding window aggregates over out-of-order data streams

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        DEBS '15: Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems
        June 2015
        385 pages
        ISBN:9781450332866
        DOI:10.1145/2675743

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 24 June 2015

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate130of553submissions,24%

        Upcoming Conference

        DEBS '24

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader