skip to main content
article

Continuous queries over data streams

Published:01 September 2001Publication History
Skip Abstract Section

Abstract

In many recent applications, data may take the form of continuous data streams, rather than finite stored data sets. Several aspects of data management need to be reconsidered in the presence of data streams, offering a new research direction for the database community. In this paper we focus primarily on the problem of query processing, specifically on how to define and evaluate continuous queries over data streams. We address semantic issues as well as efficiency concerns. Our main contributions are threefold. First, we specify a general and flexible architecture for query processing in the presence of data streams. Second, we use our basic architecture as a tool to clarify alternative semantics and processing techniques for continuous queries. The architecture also captures most previous work on continuous queries and data streams, as well as related concepts such as triggers and materialized views. Finally, we map out research topics in the area of query processing over data streams, showing where previous work is relevant and describing problems yet to be addressed.

References

  1. {AF00} M. Altinel and M. J. Franklin. Efficient filtering of XML documents for selective dissemination of information. In Proc. of the 2000 Intl. Conf. on Very Large Data Bases, pages 53-64, September 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. {AGP00} S. Acharya, P. B. Gibbons, and V. Poosala. Congressional samples for approximate answering of group-by queries. In Proc. of the 2000 ACM SIGMOD Intl. Conf. on Management of Data, pages 487-498, May 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. {AGPR99} S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 275-286, June 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. {AH00} R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In Proc. of the 2000 ACM SIGMOD Intl. Conf. on Management of Data, pages 261-272, May 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. {AMS96} N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. In Proc. of the 1996 Annual ACM Symp. on Theory of Computing, pages 20-29, May 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. {B+97} D. Barbara et al. The New Jersey data reduction report. IEEE Data Engineering Bulletin, 20(4):3-45, 1997.Google ScholarGoogle Scholar
  7. {Bar99} D. Barbara. The characterization of continuous queries. Intl. Journal of Cooperative Information Systems, 8(4):295-323, December 1999.Google ScholarGoogle ScholarCross RefCross Ref
  8. {BCL89} J. A. Blakeley, N. Coburn, and P. A. Larson. Updating derived relations: Detecting irrelevant and autonomously computable updates. ACM Trans. on Database Systems, 14(3):369-400, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. {BGR01} S. Babu, M. N. Garofalakis, and R. Rastogi. SPARTAN: A model-based semantic compression system for massive data tables. In Proc. of the 2001 ACM SIGMOD Intl. Conf. on Management of Data, pages 283-294, May 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. {CDTW00} J. Chen, D. J. DeWitt, F. Tian, and Y. Wang. NiagaraCQ: A scalable continuous query system for internet databases. In Proc. of the 2000 ACM SIGMOD Intl. Conf. on Management of Data, pages 379-390, May 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. {CFPR00} C. Cortes, K. Fisher, D. Pregibon, and A. Rogers. Hancock: a language for extracting signatures from data streams. In Proc. of the 2000 ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, pages 9-17, August 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. {CGRS00} K. Chakrabarti, M. N. Garofalakis, R. Rastogi, and K. Shim. Approximate query processing using wavelets. In Proc. of the 2000 Intl. Conf. on Very Large Data Bases, pages 111-122, September 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. {CMN99} S. Chaudhuri, R. Motwani, and V. R. Narasayya. On random sampling over joins. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 263-274, June 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. {DG00} N. G. Duffield and M. Grossglauser. Trajectory sampling for direct traffic observation. In Proc. of the 2000 ACM SIGCOMM, pages 271-284, September 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. {DH00} P. Domingos and G. Hulten. Mining high-speed data streams. In Proc. of the 2000 ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, pages 71-80, August 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. {Fin82} S. J. Finkelstein. Common subexpression analysis in database applications. In Proc. of the 1982 ACM SIGMOD Intl. Conf. on Management of Data, pages 235-245, June 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. {FRM94} C. Faloutsos, M. Ranganathan, and Y. Manolopoulos. Fast subsequence matching in time-series databases. In Proc. of the 1994 ACM SIGMOD Intl. Conf. on Management of Data, pages 419-429, May 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. {FW98} A. Fiat and G. J. Woeginger. Online Algorithms, The State of the Art. Springer-Verlag, Berlin, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. {GJM96} A. Gupta, H. V. Jagadish, and I. S. Mumick. Data integration using self-maintainable views. In Proc. of the 1996 Intl. Conf. on Extending Database Technology, pages 140-144, March 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. {GK01} M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In Proc. of the 2001 ACM SIGMOD Intl. Conf. on Management of Data, pages 58-66, May 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. {GKMS01} A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss. Surfing wavelets on streams: one-pass summaries for approximate aggregate queries. In Proc. of the 2001 Intl. Conf. on Very Large Data Bases, September 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. {GKS01} J. Gehrke, F. Korn, and D. Srivastava. On computing correlated aggregates over continual data streams. In Proc. of the 2001 ACM SIGMOD Intl. Conf. on Management of Data, pages 13-24, May 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. {GM95} A. Gupta and I. S. Mumick. Maintenance of materialized views: Problems, techniques, and applications. IEEE Data Engineering Bulletin, 18(2):3-18, June 1995.Google ScholarGoogle Scholar
  24. {GM99} P. B. Gibbons and Y. Matias. Synopsis data structures for massive data sets. In External Memory Algorithms, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, volume 50, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. {GMLY98} H. Garcia-Molina, W. J. Labio, and J. Yang. Expiring data in a warehouse. In Proc. of the 1998 Intl. Conf. on Very Large Data Bases, pages 500-511, August 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. {GMMO00} S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams. In Proc. of the 2000 Annual Symp. on Foundations of Computer Science, pages 359-366, November 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. {GMP97} P. B. Gibbons, Y. Matias, and V. Poosala. Histogram-based approximation of set-valued query-answers. In Proc. of the 1997 Intl. Conf. on Very Large Data Bases, pages 466-475, August 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. {Gra90} Goetz Graefe. Encapsulation of parallelism in the volcano query processing system. In Proc. of the 1990 ACM SIGMOD Intl. Conf. on Management of Data, pages 102-111, May 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. {Gra93} G. Graefe. Query evaluation techniques for large databases. ACM Computing Surveys, 25(2):73-170, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. {HF+00} J. M. Hellerstein, M. J. Franklin, et al. Adaptive query processing: Technology in evolution. IEEE Data Engineering Bulletin, 23(2):7-18, June 2000.Google ScholarGoogle Scholar
  31. {HH99} P. J. Haas and J. M. Hellerstein. Ripple joins for online aggregation. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 287-298, June 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. {HHW97} J. M. Hellerstein, P. J. Haas, and H. Wang. Online aggregation. In Proc. of the 1997 ACM SIGMOD Intl. Conf. on Management of Data, pages 171-182, May 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. {Hid99} C. Hidber. Online association rule mining. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 145-156, June 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. {HRR98} M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. Technical Report TR-1998-011, Compaq Systems Research Center, Palo Alto, California, May 1998.Google ScholarGoogle Scholar
  35. {HSD01} G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. In Proc. of the 2001 ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, August 2001. (To appear). Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. {IFF+99} Z. G. Ives, D. Florescu, M. Friedman, A. Y. Levy, and D. S. Weld. An adaptive query execution system for data integration. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 299-310, June 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. {IP99} Y. E. Ioannidis and V. Poosala. Histogram-based approximation of set-valued query-answers. In Proc. of the 1999 Intl. Conf. on Very Large Data Bases, pages 174-185, September 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. {JMS95} H. V. Jagadish, I. S. Mumick, and A. Silberschatz. View maintenance issues for the Chronicle data model. In Proc. of the 1995 ACM Symp. on Principles of Database Systems, pages 113-124, May 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. {KGM95} B. Kao and H. Garcia-Molina. An overview of real-time database systems. In S. H. Son, editor, Advances in Real-Time Systems, pages 463-486. Prentice Hall, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. {LPT99} L. Liu, C. Pu, and W. Tang. Continual queries for internet scale event-driven information delivery. IEEE Trans. on Knowledge and Data Engineering, 11(4):583-590, August 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. {MRL99} G. S. Manku, S. Rajagopalan, and B. G. Lindsay. Random sampling techniques for space efficient online computation of order statistics of large datasets. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 251-262, June 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. {MVW00} Y. Matias, J. S. Vitter, and M. Wang. Dynamic maintenance of wavelet-based histograms. In Proc. of the 2000 Intl. Conf. on Very Large Data Bases, pages 101-110, September 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. {NACP01} B. Nguyen, S. Abiteboul, G. Cobena, and M. Preda. Monitoring XML data on the web. In Proc. of the 2001 ACM SIGMOD Intl. Conf. on Management of Data, pages 437-448, May 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. {PG99} V. Poosala and V. Ganti. Fast approximate answers to aggregate queries on a data cube. In Proc. of the 1999 Intl. Conf. on Scientific and Statistical Database Management, pages 24-33, July 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. {QGMW96} D. Quass, A. Gupta, I. S. Mumick, and J. Widom. Making views self-maintainable for data warehousing. In Proc. of the 1996 Intl. Conf. on Parallel and Distributed Information Systems, pages 158-169, December 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. {Se188} T. K. Sellis. Multiple-query optimization. ACM Trans. on Database Systems, 13(1):23-52, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. {SLR94} P. Seshadri, M. Livny, and R. Ramakrishnan. Sequence query processing. In Proc. of the 1994 ACM SIGMOD Intl. Conf. on Management of Data, pages 430-441, May 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. {SPAM91} U. Schreier, H. Pirahesh, R. Agrawal, and C. Mohan. Alert: An architecture for transforming a passive DBMS into an active DBMS. In Proc. of the 1991 Intl. Conf. on Very Large Data Bases, pages 469-478, September 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. {STD+00} J. Shanmugasundaram, K. Tufte, D. J. DeWitt, J. F. Naughton, and D. Maier. Architecting a network query engine for producing partial results. In Proc. of the 2000 Intl. Workshop on the Web and Databases, pages 17-22, May 2000.Google ScholarGoogle Scholar
  50. {Sul96} M. Sullivan. Tribeca: A stream database manager for network traffic analysis. In Proc. of the 1996 Intl. Conf. on Very Large Data Bases, page 594, September 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. {Tan96} A. S. Tanenbaum. Computer Networks. Prentice Hall, Upper Saddle River, New Jersey, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. {Tea99} Times-Ten Team. In-memory data management for consumer transactions: The Times-Ten approach. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 528-529, June 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. {TGNO92} D. B. Terry, D. Goldberg, D. Nichols, and B. M. Oki. Continuous queries over append-only databases. In Proc. of the 1992 ACM SIGMOD Intl. Conf. on Management of Data, pages 321-330, June 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. {Tra} Traderbot home page. http://www.traderbot.com.Google ScholarGoogle Scholar
  55. {UF01} T. Urhan and M. J. Franklin. Dynamic pipeline scheduling for improving interactive performance of online queries. In Proc. of the 2001 Intl. Conf. on Very Large Data Bases, September 2001. (To appear). Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. {UW97} J.D. Ullman and J. Widom. A First Course in Database Systems. Prentice Hall, Upper Saddle River, New Jersey, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. {Vit85} J. S. Vitter. Random sampling with a reservoir. ACM Trans. on Mathematical Software, 11(1):37-57, March 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. {VW99} J. S. Vitter and M. Wang. Approximate computation of multidimensional aggregates of sparse data using wavelets. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 193-204, June 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. {WA91} A. N. Wilschut and P. M. G. Apers. Dataflow query execution in a parallel main-memory environment. In Proc. of the 1991 Intl. Conf. on Parallel and Distributed Information Systems, pages 68-77, December 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. {WC96} J. Widom and S. Ceri. Active Database Systems: Triggers and Rules for Advanced Database Processing. Morgan Kaufmann, San Francisco, California, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. {XPA99} XML path language (XPath) version 1.0, November 1999. W3C Recommendation available at http://www.w3.org/TR/xpath.Google ScholarGoogle Scholar
  62. {YSJ+00} B. Yi, N. Sidiropoulos; T. Johnson, H. V. Jagadish, C. Faloutsos, and A. Biliris. Online data mining for coevolving time sequences. In Proc. of the 2000 Intl. Conf. on Data Engineering, pages 13-22, March 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Continuous queries over data streams

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM SIGMOD Record
        ACM SIGMOD Record  Volume 30, Issue 3
        September 2001
        97 pages
        ISSN:0163-5808
        DOI:10.1145/603867
        Issue’s Table of Contents

        Copyright © 2001 Authors

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 September 2001

        Check for updates

        Qualifiers

        • article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader