Abstract
In many recent applications, data may take the form of continuous data streams, rather than finite stored data sets. Several aspects of data management need to be reconsidered in the presence of data streams, offering a new research direction for the database community. In this paper we focus primarily on the problem of query processing, specifically on how to define and evaluate continuous queries over data streams. We address semantic issues as well as efficiency concerns. Our main contributions are threefold. First, we specify a general and flexible architecture for query processing in the presence of data streams. Second, we use our basic architecture as a tool to clarify alternative semantics and processing techniques for continuous queries. The architecture also captures most previous work on continuous queries and data streams, as well as related concepts such as triggers and materialized views. Finally, we map out research topics in the area of query processing over data streams, showing where previous work is relevant and describing problems yet to be addressed.
- {AF00} M. Altinel and M. J. Franklin. Efficient filtering of XML documents for selective dissemination of information. In Proc. of the 2000 Intl. Conf. on Very Large Data Bases, pages 53-64, September 2000. Google ScholarDigital Library
- {AGP00} S. Acharya, P. B. Gibbons, and V. Poosala. Congressional samples for approximate answering of group-by queries. In Proc. of the 2000 ACM SIGMOD Intl. Conf. on Management of Data, pages 487-498, May 2000. Google ScholarDigital Library
- {AGPR99} S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 275-286, June 1999. Google ScholarDigital Library
- {AH00} R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In Proc. of the 2000 ACM SIGMOD Intl. Conf. on Management of Data, pages 261-272, May 2000. Google ScholarDigital Library
- {AMS96} N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. In Proc. of the 1996 Annual ACM Symp. on Theory of Computing, pages 20-29, May 1996. Google ScholarDigital Library
- {B+97} D. Barbara et al. The New Jersey data reduction report. IEEE Data Engineering Bulletin, 20(4):3-45, 1997.Google Scholar
- {Bar99} D. Barbara. The characterization of continuous queries. Intl. Journal of Cooperative Information Systems, 8(4):295-323, December 1999.Google ScholarCross Ref
- {BCL89} J. A. Blakeley, N. Coburn, and P. A. Larson. Updating derived relations: Detecting irrelevant and autonomously computable updates. ACM Trans. on Database Systems, 14(3):369-400, 1989. Google ScholarDigital Library
- {BGR01} S. Babu, M. N. Garofalakis, and R. Rastogi. SPARTAN: A model-based semantic compression system for massive data tables. In Proc. of the 2001 ACM SIGMOD Intl. Conf. on Management of Data, pages 283-294, May 2001. Google ScholarDigital Library
- {CDTW00} J. Chen, D. J. DeWitt, F. Tian, and Y. Wang. NiagaraCQ: A scalable continuous query system for internet databases. In Proc. of the 2000 ACM SIGMOD Intl. Conf. on Management of Data, pages 379-390, May 2000. Google ScholarDigital Library
- {CFPR00} C. Cortes, K. Fisher, D. Pregibon, and A. Rogers. Hancock: a language for extracting signatures from data streams. In Proc. of the 2000 ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, pages 9-17, August 2000. Google ScholarDigital Library
- {CGRS00} K. Chakrabarti, M. N. Garofalakis, R. Rastogi, and K. Shim. Approximate query processing using wavelets. In Proc. of the 2000 Intl. Conf. on Very Large Data Bases, pages 111-122, September 2000. Google ScholarDigital Library
- {CMN99} S. Chaudhuri, R. Motwani, and V. R. Narasayya. On random sampling over joins. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 263-274, June 1999. Google ScholarDigital Library
- {DG00} N. G. Duffield and M. Grossglauser. Trajectory sampling for direct traffic observation. In Proc. of the 2000 ACM SIGCOMM, pages 271-284, September 2000. Google ScholarDigital Library
- {DH00} P. Domingos and G. Hulten. Mining high-speed data streams. In Proc. of the 2000 ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, pages 71-80, August 2000. Google ScholarDigital Library
- {Fin82} S. J. Finkelstein. Common subexpression analysis in database applications. In Proc. of the 1982 ACM SIGMOD Intl. Conf. on Management of Data, pages 235-245, June 1982. Google ScholarDigital Library
- {FRM94} C. Faloutsos, M. Ranganathan, and Y. Manolopoulos. Fast subsequence matching in time-series databases. In Proc. of the 1994 ACM SIGMOD Intl. Conf. on Management of Data, pages 419-429, May 1994. Google ScholarDigital Library
- {FW98} A. Fiat and G. J. Woeginger. Online Algorithms, The State of the Art. Springer-Verlag, Berlin, 1998. Google ScholarDigital Library
- {GJM96} A. Gupta, H. V. Jagadish, and I. S. Mumick. Data integration using self-maintainable views. In Proc. of the 1996 Intl. Conf. on Extending Database Technology, pages 140-144, March 1996. Google ScholarDigital Library
- {GK01} M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In Proc. of the 2001 ACM SIGMOD Intl. Conf. on Management of Data, pages 58-66, May 2001. Google ScholarDigital Library
- {GKMS01} A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss. Surfing wavelets on streams: one-pass summaries for approximate aggregate queries. In Proc. of the 2001 Intl. Conf. on Very Large Data Bases, September 2001. Google ScholarDigital Library
- {GKS01} J. Gehrke, F. Korn, and D. Srivastava. On computing correlated aggregates over continual data streams. In Proc. of the 2001 ACM SIGMOD Intl. Conf. on Management of Data, pages 13-24, May 2001. Google ScholarDigital Library
- {GM95} A. Gupta and I. S. Mumick. Maintenance of materialized views: Problems, techniques, and applications. IEEE Data Engineering Bulletin, 18(2):3-18, June 1995.Google Scholar
- {GM99} P. B. Gibbons and Y. Matias. Synopsis data structures for massive data sets. In External Memory Algorithms, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, volume 50, 1999. Google ScholarDigital Library
- {GMLY98} H. Garcia-Molina, W. J. Labio, and J. Yang. Expiring data in a warehouse. In Proc. of the 1998 Intl. Conf. on Very Large Data Bases, pages 500-511, August 1998. Google ScholarDigital Library
- {GMMO00} S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams. In Proc. of the 2000 Annual Symp. on Foundations of Computer Science, pages 359-366, November 2000. Google ScholarDigital Library
- {GMP97} P. B. Gibbons, Y. Matias, and V. Poosala. Histogram-based approximation of set-valued query-answers. In Proc. of the 1997 Intl. Conf. on Very Large Data Bases, pages 466-475, August 1997. Google ScholarDigital Library
- {Gra90} Goetz Graefe. Encapsulation of parallelism in the volcano query processing system. In Proc. of the 1990 ACM SIGMOD Intl. Conf. on Management of Data, pages 102-111, May 1990. Google ScholarDigital Library
- {Gra93} G. Graefe. Query evaluation techniques for large databases. ACM Computing Surveys, 25(2):73-170, 1993. Google ScholarDigital Library
- {HF+00} J. M. Hellerstein, M. J. Franklin, et al. Adaptive query processing: Technology in evolution. IEEE Data Engineering Bulletin, 23(2):7-18, June 2000.Google Scholar
- {HH99} P. J. Haas and J. M. Hellerstein. Ripple joins for online aggregation. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 287-298, June 1999. Google ScholarDigital Library
- {HHW97} J. M. Hellerstein, P. J. Haas, and H. Wang. Online aggregation. In Proc. of the 1997 ACM SIGMOD Intl. Conf. on Management of Data, pages 171-182, May 1997. Google ScholarDigital Library
- {Hid99} C. Hidber. Online association rule mining. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 145-156, June 1999. Google ScholarDigital Library
- {HRR98} M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. Technical Report TR-1998-011, Compaq Systems Research Center, Palo Alto, California, May 1998.Google Scholar
- {HSD01} G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. In Proc. of the 2001 ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, August 2001. (To appear). Google ScholarDigital Library
- {IFF+99} Z. G. Ives, D. Florescu, M. Friedman, A. Y. Levy, and D. S. Weld. An adaptive query execution system for data integration. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 299-310, June 1999. Google ScholarDigital Library
- {IP99} Y. E. Ioannidis and V. Poosala. Histogram-based approximation of set-valued query-answers. In Proc. of the 1999 Intl. Conf. on Very Large Data Bases, pages 174-185, September 1999. Google ScholarDigital Library
- {JMS95} H. V. Jagadish, I. S. Mumick, and A. Silberschatz. View maintenance issues for the Chronicle data model. In Proc. of the 1995 ACM Symp. on Principles of Database Systems, pages 113-124, May 1995. Google ScholarDigital Library
- {KGM95} B. Kao and H. Garcia-Molina. An overview of real-time database systems. In S. H. Son, editor, Advances in Real-Time Systems, pages 463-486. Prentice Hall, 1995. Google ScholarDigital Library
- {LPT99} L. Liu, C. Pu, and W. Tang. Continual queries for internet scale event-driven information delivery. IEEE Trans. on Knowledge and Data Engineering, 11(4):583-590, August 1999. Google ScholarDigital Library
- {MRL99} G. S. Manku, S. Rajagopalan, and B. G. Lindsay. Random sampling techniques for space efficient online computation of order statistics of large datasets. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 251-262, June 1999. Google ScholarDigital Library
- {MVW00} Y. Matias, J. S. Vitter, and M. Wang. Dynamic maintenance of wavelet-based histograms. In Proc. of the 2000 Intl. Conf. on Very Large Data Bases, pages 101-110, September 2000. Google ScholarDigital Library
- {NACP01} B. Nguyen, S. Abiteboul, G. Cobena, and M. Preda. Monitoring XML data on the web. In Proc. of the 2001 ACM SIGMOD Intl. Conf. on Management of Data, pages 437-448, May 2001. Google ScholarDigital Library
- {PG99} V. Poosala and V. Ganti. Fast approximate answers to aggregate queries on a data cube. In Proc. of the 1999 Intl. Conf. on Scientific and Statistical Database Management, pages 24-33, July 1999. Google ScholarDigital Library
- {QGMW96} D. Quass, A. Gupta, I. S. Mumick, and J. Widom. Making views self-maintainable for data warehousing. In Proc. of the 1996 Intl. Conf. on Parallel and Distributed Information Systems, pages 158-169, December 1996. Google ScholarDigital Library
- {Se188} T. K. Sellis. Multiple-query optimization. ACM Trans. on Database Systems, 13(1):23-52, 1988. Google ScholarDigital Library
- {SLR94} P. Seshadri, M. Livny, and R. Ramakrishnan. Sequence query processing. In Proc. of the 1994 ACM SIGMOD Intl. Conf. on Management of Data, pages 430-441, May 1994. Google ScholarDigital Library
- {SPAM91} U. Schreier, H. Pirahesh, R. Agrawal, and C. Mohan. Alert: An architecture for transforming a passive DBMS into an active DBMS. In Proc. of the 1991 Intl. Conf. on Very Large Data Bases, pages 469-478, September 1991. Google ScholarDigital Library
- {STD+00} J. Shanmugasundaram, K. Tufte, D. J. DeWitt, J. F. Naughton, and D. Maier. Architecting a network query engine for producing partial results. In Proc. of the 2000 Intl. Workshop on the Web and Databases, pages 17-22, May 2000.Google Scholar
- {Sul96} M. Sullivan. Tribeca: A stream database manager for network traffic analysis. In Proc. of the 1996 Intl. Conf. on Very Large Data Bases, page 594, September 1996. Google ScholarDigital Library
- {Tan96} A. S. Tanenbaum. Computer Networks. Prentice Hall, Upper Saddle River, New Jersey, 1996. Google ScholarDigital Library
- {Tea99} Times-Ten Team. In-memory data management for consumer transactions: The Times-Ten approach. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 528-529, June 1999. Google ScholarDigital Library
- {TGNO92} D. B. Terry, D. Goldberg, D. Nichols, and B. M. Oki. Continuous queries over append-only databases. In Proc. of the 1992 ACM SIGMOD Intl. Conf. on Management of Data, pages 321-330, June 1992. Google ScholarDigital Library
- {Tra} Traderbot home page. http://www.traderbot.com.Google Scholar
- {UF01} T. Urhan and M. J. Franklin. Dynamic pipeline scheduling for improving interactive performance of online queries. In Proc. of the 2001 Intl. Conf. on Very Large Data Bases, September 2001. (To appear). Google ScholarDigital Library
- {UW97} J.D. Ullman and J. Widom. A First Course in Database Systems. Prentice Hall, Upper Saddle River, New Jersey, 1997. Google ScholarDigital Library
- {Vit85} J. S. Vitter. Random sampling with a reservoir. ACM Trans. on Mathematical Software, 11(1):37-57, March 1985. Google ScholarDigital Library
- {VW99} J. S. Vitter and M. Wang. Approximate computation of multidimensional aggregates of sparse data using wavelets. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 193-204, June 1999. Google ScholarDigital Library
- {WA91} A. N. Wilschut and P. M. G. Apers. Dataflow query execution in a parallel main-memory environment. In Proc. of the 1991 Intl. Conf. on Parallel and Distributed Information Systems, pages 68-77, December 1991. Google ScholarDigital Library
- {WC96} J. Widom and S. Ceri. Active Database Systems: Triggers and Rules for Advanced Database Processing. Morgan Kaufmann, San Francisco, California, 1996. Google ScholarDigital Library
- {XPA99} XML path language (XPath) version 1.0, November 1999. W3C Recommendation available at http://www.w3.org/TR/xpath.Google Scholar
- {YSJ+00} B. Yi, N. Sidiropoulos; T. Johnson, H. V. Jagadish, C. Faloutsos, and A. Biliris. Online data mining for coevolving time sequences. In Proc. of the 2000 Intl. Conf. on Data Engineering, pages 13-22, March 2000. Google ScholarDigital Library
Index Terms
- Continuous queries over data streams
Recommendations
Transformation of continuous aggregation join queries over data streams
SSTD'07: Proceedings of the 10th international conference on Advances in spatial and temporal databasesWe address continuously processing an aggregation join query over data streams. Queries of this type involve both join and aggregation operations, with windows specified on join input streams. To our knowledge, the existing researches address join query ...
Continuous query processing in data streams using duality of data and queries
SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of dataRecent data stream systems such as TelegraphCQ have employed the well-known property of duality between data and queries. In these systems, query processing methods are classified into two dual categories -- data-initiative and query-initiative -- ...
Comments