skip to main content
research-article

Semantics and implementation of continuous sliding window queries over data streams

Published:23 April 2009Publication History
Skip Abstract Section

Abstract

In recent years the processing of continuous queries over potentially infinite data streams has attracted a lot of research attention. We observed that the majority of work addresses individual stream operations and system-related issues rather than the development of a general-purpose basis for stream processing systems. Furthermore, example continuous queries are often formulated in some declarative query language without specifying the underlying semantics precisely enough. To overcome these deficiencies, this article presents a consistent and powerful operator algebra for data streams which ensures that continuous queries have well-defined, deterministic results. In analogy to traditional database systems, we distinguish between a logical and a physical operator algebra. While the logical algebra specifies the semantics of the individual operators in a descriptive but concrete way over temporal multisets, the physical algebra provides efficient implementations in the form of stream-to-stream operators. By adapting and enhancing research from temporal databases to meet the challenging requirements in streaming applications, we are able to carry over the conventional transformation rules from relational databases to stream processing. For this reason, our approach not only makes it possible to express continuous queries with a sound semantics, but also provides a solid foundation for query optimization, one of the major research topics in the stream community. Since this article seamlessly explains the steps from query formulation to query execution, it outlines the innovative features and operational functionality implemented in our state-of-the-art stream processing infrastructure.

Skip Supplemental Material Section

Supplemental Material

References

  1. Abadi, D. J., Carney, D., Cetintemel, U., Cherniack, M., Convey, C., Lee, S., Stonebraker, M., Tatbul, N., and Zdonik, S. 2003. Aurora: A new model and architecture for data stream management. VLDB J. 12, 2, 120--139. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Arasu, A., Babcock, B., Babu, S., McAlister, J., and Widom, J. 2002. Characterizing memory requirements for queries over continuous data streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 221--232. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Arasu, A., Babu, S., and Widom, J. 2006. The CQL continuous query language: Semantic foundations and query execution. VLDB J. 15, 2, 121--142. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Babcock, B., Babu, S., Datar, M., and Motwani, R. 2003. Chain: Operator scheduling for memory minimization in data stream systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 253--264. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Babcock, B., Babu, S., Datar, M., Motwani, R., and Widom, J. 2002. Models and issues in data stream systems. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 1--16. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Babu, S., Munagala, K., Widom, J., and Motwani, R. 2005. Adaptive caching for continuous queries. In Proceedings of the International Conference on Data Engineering (ICDE), 118--129. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Bai, Y., Thakkar, H., Wang, H., Luo, C., and Zaniolo, C. 2006. A data stream language and system designed for power and extensibility. In Proceedings of the International Conference on Information and Knowledge Management (CIKM), 337--346. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Barga, R. S., Goldstein, J., Ali, M. H., and Hong, M. 2007. Consistent streaming through time: A vision for event stream processing. In Proceedings of the Conference on Innovative Data Systems Research (CIDR), 363--374.Google ScholarGoogle Scholar
  9. Cammert, M., Krämer, J., Seeger, B., and Vaupel, S. 2008. A cost-based approach to adaptive resource management in data stream systems. IEEE Trans. Knowl. Data Eng. 20, 2, 230--245. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Carney, D., Cetintemel, U., Zdonik, S., Rasin, A., Cerniak, M., and Stonebraker, M. 2003. Operator scheduling in a data stream manager. In Proceedings of the International Conference on Very Large Databases (VLDB), 838--849. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Chandrasekaran, S., Cooper, O., Deshpande, A., and et al. 2003. TelegraphCQ: Continuous dataflow processing for an uncertain world. In Proceedings of the Conference on Innovative Data Systems Research (CIDR).Google ScholarGoogle Scholar
  12. Chandrasekaran, S. and Franklin, M. J. 2002. Streaming queries over streaming data. In Proceedings of the International Conference on Very Large Databases (VLDB), 203--214. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Chaudhuri, S., Motwani, R., and Narasayya, V. 1999. On random sampling over joins. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 263--274. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Chen, J., DeWitt, D., Tian, F., and Wang, Y. 2000. NiagaraCQ: A scalable continuous query system for Internet databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 379--390. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. 2001. Introduction to Algorithms, 2nd ed. The MIT Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Cranor, C. D., Johnson, T., Spatscheck, O., and Shkapenyuk, V. 2003. Gigascope: A stream database for network applications. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 647--651. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Datar, M., Gionis, A., Indyk, P., and Motwani, R. 2002. Maintaining stream statistics over sliding windows. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 635--644. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Dayal, U. 1987. Of nests and trees: A unified approach to processing queries that contain nested subqueries, aggregates, and quantifiers. In Proceedings of the International Conference on Very Large Databases, 197--208. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Dayal, U., Goodman, N., and Katz, R. H. 1982. An extended relational algebra with control over duplicate elimination. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 117--123. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Demers, A. J., Gehrke, J., Panda, B., Riedewald, M., Sharma, V., and White, W. M. 2007. Cayuga: A General purpose event monitoring system. In Proceedings of the Conference on Innovative Data Systems Research (CIDR), 412--422.Google ScholarGoogle Scholar
  21. Dittrich, J.-P., Seeger, B., Taylor, D. S., and Widmayer, P. 2002. Progressive merge join: A generic and non-blocking sort-based join algorithm. In Proceedings of the International Conference on Very Large Databases (VLDB), 299--310. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Gao, D., Jensen, C. S., Snodgrass, R. T., and Soo, M. D. 2005. Join operations in temporal databases. VLDB J. 14, 1, 2--29. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Garcia-Molina, H., Ullman, J. D., and Widom, J. 2000. Database System Implementation. Prentice Hall. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Ghanem, T. M., Hammad, M. A., Mokbel, M. F., Aref, W. G., and Elmagarmid, A. K. 2007. Incremental evaluation of sliding-window queries over data streams. IEEE Trans. Knowl. Data Eng. 19, 1, 57--72. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Golab, L. and Ozsu, M. 2003a. Processing sliding window multi-joins in continuous queries over data streams. In Proceedings of the International Conference on Very Large Databases (VLDB), 500--511. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Golab, L. and Ozsu, M. 2003b. Issues in data stream management. SIGMOD Rec. 32, 2, 5--14. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Golab, L., Ozsu, M. 2005. Update-Pattern-Aware modeling and processing of continuous queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 658--669. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Graefe, G. 1993. Query evaluation techniques for large databases. ACM Comput. Surv. 25, 2, 73--170. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, D., Venkatrao, M., Pellow, F., and Pirahesh, H. 1997. Data cube: A relational aggregation operator generalizing group-by, cross- tab, and sub-totals. Data Mining Knowl. Discov. 1, 1, 29--53. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Guha, S., Meyerson, A., Mishra, N., Motwani, R., and O'Callaghan, L. 2003. Clustering data streams: Theory and practice. IEEE Trans. Knowl. Data Eng. 15, 3, 515--528. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Haas, P. J. and Hellerstein, J. M. 1999. Ripple joins for online aggregation. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 287--298. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Hellerstein, J. M., Haas, P. J., and Wang, H. 1997. Online aggregation. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 171--182. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Jensen, C. S., Clifford, J., Elmasri, R., Gadia, S. K., Hayes, P. J., and Jajodia, S. 1994. A consensus glossary of temporal database concepts. SIGMOD Rec. 23, 1, 52--64. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Kang, J., Naughton, J., and Viglas, S. 2003. Evaluating window joins over unbounded streams. In Proceedings of the International Conference on Data Engineering (ICDE), 341--352.Google ScholarGoogle Scholar
  35. Karp, R. M., Shenker, S., and Papadimitriou, C. H. 2003. A simple algorithm for finding frequent elements in streams and bags. ACM Trans. Database Syst. 28, 51--55. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Krämer, J. 2007. Continuous queries over data streams -- Semantics and implementation. Ph.D. thesis, University of Marburg.Google ScholarGoogle Scholar
  37. Krämer, J. and Seeger, B. 2004. PIPES - A public infrastructure for processing and exploring streams. In Proceedings of the ACM SIGMOD International Conference on Database Management, 925--926. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Krämer, J. and Seeger, B. 2005. A temporal foundation for continuous queries over data streams. In Proceedings of the International Conference on Management of Data (COMAD), 70--82.Google ScholarGoogle Scholar
  39. Law, Y.-N., Wang, H., and Zaniolo, C. 2004. Query languages and data models for database sequences and data streams. In Proceedings of the International Conference on Very Large Databases (VLDB), 492--503. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Li, J., Maier, D., Tufte, K., Papadimos, V., and Tucker, P. A. 2005. Semantics and evaluation techniques for window aggregates in data streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 311--322. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Madden, S., Franklin, M. J., Hellerstein, J. M., and Hong, W. 2002. TAG: A tiny aggregation service for ad-hoc sensor networks. In Symposium on Operating System Design and Implementation (OSDI). Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Manku, G. S. and Motwani, R. 2002. Approximate frequency counts over data streams. In Proceedings of the International Conference on Very Large Databases (VLDB). Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Nievergelt, J., and Preparata, F. P. 1982. Plane-Sweep algorithms for intersecting geometric figures. Commun. ACM 25, 10, 739--747. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Patroumpas, K. and Sellis, T. K. 2006. Window specification over data streams. In Proceedings of the EDBT Workshops, 445--464. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Raman, V., Deshpande, A., and Hellerstein, J. M. 2003. Using state modules for adaptive query processing. In Proceedings of the International Conference on Data Engineering (ICDE), 353.Google ScholarGoogle Scholar
  46. Roy, P., Seshadri, S., Sudarshan, S., and Bhobe, S. 2000. Efficient and extensible algorithms for multi query optimization. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 249--260. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Sellis, T. K. 1988. Multiple-Query optimization. ACM Trans. Database Syst. 13, 1, 23--52. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Slivinskas, G., Jensen, C. S., and Snodgrass, R. T. 2001. A foundation for conventional and temporal query optimization addressing duplicates and ordering. IEEE Trans. Knowl. Data Eng. 13, 1, 21--49. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. SQR. 2003. SQR -- A stream query repository. http://www.db.stanford.edu/stream/sqr.Google ScholarGoogle Scholar
  50. Srivastava, U. and Widom, J. 2004. Flexible time management in data stream systems. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Tansel, A., Clifford, J., Gadia, S., Jajodia, S., Segev, A., and Snodgrass, R. T. 1993. Temporal Databases: Theory, Design, and Implementation. Benjamin/Cummings. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Tatbul, N., Cetintemel, U., Zdonik, S. B., Cherniack, M., and Stonebraker, M. 2003. Load shedding in a data stream manager. In Proceedings of the International Conference on Very Large Databases (VLDB), 309--320. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Tucker, P. A., Maier, D., Sheard, T., and Fegaras, L. 2003. Exploiting punctuation semantics in continuous data streams. IEEE Trans. Knowl. Data Eng. 15, 3, 555--568. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Tucker, P. A., Tufte, K., Papadimos, V., and Maier, D. 2002. NEXMark -- A benchmark for queries over data streams. http://www.cse.ogi.edu/dot/niagara/NEXMark.Google ScholarGoogle Scholar
  55. Viglas, S. D. and Naughton, J. F. 2002. Rate-Based query optimization for streaming information sources. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 37--48. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Viglas, S. D., Naughton, J. F., and Burger, J. 2003. Maximizing the output rate of multi-join queries over streaming information sources. In Proceedings of the International Conference on Very Large Databases (VLDB), 285--296. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Wang, H., Zaniolo, C., and Luo, C. 2003. ATLaS: A small but complete SQL extension for data mining and data streams. In Proceedings of the International Conference on Very Large Databases (VLDB), 1113--1116. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Yang, J. and Widom, J. 2001. Incremental computation and maintenance of temporal aggregates. In Proceedings of the International Conference on Data Engineering, 51--60. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Yang, Y., Krämer, J., Papadias, D., and Seeger, B. 2007. HybMig: A hybrid approach to dynamic plan migration for continuous queries. IEEE Trans. Knowl. Data Eng. 19, 3, 398--411. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. Zhu, Y., Rundensteiner, E. A., and Heineman, G. T. 2004. Dynamic plan migration for continuous queries over data streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 431--442. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Semantics and implementation of continuous sliding window queries over data streams

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image ACM Transactions on Database Systems
            ACM Transactions on Database Systems  Volume 34, Issue 1
            April 2009
            349 pages
            ISSN:0362-5915
            EISSN:1557-4644
            DOI:10.1145/1508857
            Issue’s Table of Contents

            Copyright © 2009 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 23 April 2009
            • Accepted: 1 June 2008
            • Revised: 1 October 2007
            • Received: 1 April 2007
            Published in tods Volume 34, Issue 1

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article
            • Research
            • Refereed

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader