ABSTRACT
The querying and analysis of data streams has been a topic of much recent interest, motivated by applications from the fields of networking, web usage analysis, sensor instrumentation, telecommunications, and others. Many of these applications involve monitoring answers to continuous queries over data streams produced at physically distributed locations, and most previous approaches require streams to be transmitted to a single location for centralized processing. Unfortunately, the continual transmission of a large number of rapid data streams to a central location can be impractical or expensive. We study a useful class of queries that continuously report the k largest values obtained from distributed data streams ("top-k monitoring queries"), which are of particular interest because they can be used to reduce the overhead incurred while running other types of monitoring queries. We show that transmitting entire data streams is unnecessary to support these queries and present an alternative approach that reduces communication significantly. In our approach, arithmetic constraints are maintained at remote stream sources to ensure that the most recently provided top-k answer remains valid to within a user-specified error tolerance. Distributed communication is only necessary on occasion, when constraints are violated, and we show empirically through extensive simulation on real-world data that our approach reduces overall communication cost by an order of magnitude compared with alternatives that o er the same error guarantees.
- M. Arlitt and T. Jin. 1998 world cup web site access logs, Aug. 1988. Available at http://www.acm.org/sigcomm/ITA/.Google Scholar
- M. Arlitt and T. Jin. Workload characterization of the 1998 world cup web site. Technical Report HPL-1999-35R1, Hewlett Packard, Sept. 1999.Google Scholar
- B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In Proc. PODS, 2002. Google ScholarDigital Library
- B. Babcock and C. Olston. Distributed top-k monitoring. Technical report, Stanford University Computer Science Department, 2002. http://dbpubs.stanford.edu/pub/2002-61.Google Scholar
- D. Barbara and H. Garcia-Molina. The Demarcation Protocol: A technique for maintaining linear arithmetic constraints in distributed database systems. In Proc. EDBT, 1992. Google ScholarDigital Library
- P. A. Bernstein, B. T. Blaustein, and E. M. Clarke. Fast maintenance of semantic integrity assertions using redundant aggregate data. In Proc. VLDB, 1980.Google Scholar
- N. Bruno, L. Gravano, and A. Marian. Evaluating top-k queries over web-accessible databases. In Proc. ICDE, 2002. Google ScholarDigital Library
- D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, and S. Zdonik. Monitoring streams - a new class of data management applications. In Proc. VLDB, 2002. Google ScholarDigital Library
- M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. In Proc. Twenty-Ninth International Colloquium on Automata Languages and Programming, 2002. Google ScholarDigital Library
- J. Chen, D. J. DeWitt, F. Tian, and Y. Wang. NiagaraCQ: A scalable continuous query system for internet databases. In Proc. ACM SIGMOD, 2000. Google ScholarDigital Library
- Denial of service attacks using nameservers. Incident Note IN-2000-04, CMU Software Engineering Institute CERT Coordination Center, Apr. 2000. http://www.cert.org/incident_notes/IN-2000-04.html.Google Scholar
- M. Dilman and D. Raz. Efficient reactive monitoring. In Proc. INFOCOM, 2001.Google ScholarCross Ref
- C. Dwork, S. R. Kumar, M. Naor, and D. Sivakumar. Rank aggregation methods for the web. In Proc. WWW10, 2001. Google ScholarDigital Library
- D. Estrin, L. Girod, G. Pottie, and M. Srivastava. Instrumenting the world with wireless sensor networks. In Proc. International Conference on Acoustics, Speech, and Signal Processing, 2001.Google ScholarCross Ref
- R. Fagin, S. R. Kumar, and D. Sivakumar. Comparing top k lists. In Proc. SODA, 2003. Google ScholarDigital Library
- R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. In Proc. PODS, 2001. Google ScholarDigital Library
- S. Floyd and V. Paxson. Wide-area Traffic: The Failure of Poisson Modeling. IEEE/ACM Transactions on Networking, 3(3), 1995. Google ScholarDigital Library
- P. B. Gibbons and Y. Matias. New sampling-based summary statistics for improving approximate query answers. In Proc. ACM SIGMOD, 1998. Google ScholarDigital Library
- P. B. Gibbons and S. Tirthapura. Estimating simple functions on the union of data streams. In Proc. Thirteenth ACM Symposium on Parallel Algorithms and Architectures, 2001. Google ScholarDigital Library
- P. B. Gibbons and S. Tirthapura. Distributed streams algorithms for sliding windows. In Proc. Fourteenth ACM Symposium on Parallel Algorithms and Architectures, 2002. Google ScholarDigital Library
- A. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss. Surfing wavelets on streams: One-pass summaries for approximate aggregate queries. In Proc. VLDB, 2001. Google ScholarDigital Library
- A. Gupta and J. Widom. Local verification of global integrity constraints in distributed databases. In Proc. ACM SIGMOD, 1993. Google ScholarDigital Library
- A. Householder, A. Manion, L. Pesante, and G. Weaver. Managing the threat of denial-of-service attacks. Technical report, CMU Software Engineering Institute CERT Coordination Center, Oct. 2001. http://www.cert.org/archive/pdf/Managing_DoS.pdf.Google Scholar
- N. Huyn. Maintaining global integrity constraints in distributed databases. Constraints, 2(3/4), 1997. Google ScholarDigital Library
- D. Li, K. Wong, Y. H. Hu, and A. Sayeed. Detection, classification and tracking of targets. IEEE Signal Processing Magazine, 2002.Google Scholar
- S. Madden, M. Shah, J. M. Hellerstein, and V. Raman. Continuously adaptive continuous queries over streams. In Proc. ACM SIGMOD, 2002. Google ScholarDigital Library
- G. S. Manku and R. Motwani. Approximate frequency counts over data streams. In Proc. VLDB, 2002. Google ScholarDigital Library
- R. Min, M. Bhardwaj, S. Cho, A. Sinha, E. Shih, A. Wang, and A. Chandrakasan. Low-power wireless sensor networks. In Proc. Fourteenth International Conference on VLSI Design, 2001. Google ScholarDigital Library
- R. Motwani, J. Widom, A. Arasu, B. Babcock, S. Babu, M. Datar, G. Manku, C. Olston, J. Rosenstein, and R. Varma. Query processing, resource management, and approximation in a data stream management system. In Proc. CIDR, 2003.Google Scholar
- C. Olston, J. Jiang, and J. Widom. Adaptive filters for continuous queries over distributed data streams. In Proc. ACM SIGMOD, 2003. Google ScholarDigital Library
- T. Palpanas, R. Sidle, R. Cochrane, and H. Pirahesh. Incremental maintenance for non-distributive aggregate functions. In Proc. VLDB, 2002. Google ScholarDigital Library
- G. Pottie and W. Kaiser. Wireless integrated network sensors. Communications of the ACM, 43(5), 2000. Google ScholarDigital Library
- N. Soparkar and A. Silberschatz. Data-value partitioning and virtual messages. In Proc. PODS, 1990. Google ScholarDigital Library
- T. Yamashita. Dynamic replica control based on fairly assigned variation of data with weak consistency for loosely coupled distributed systems. In Proc. ICDCS, 2002. Google ScholarDigital Library
- K. Yi, H. Yu, J. Yang, G. Xia, and Y. Chen. Efficient maintenance of materialized top-k views. In Proc. ICDE, 2003.Google Scholar
Index Terms
- Distributed top-k monitoring
Recommendations
Continuous monitoring of top-k queries over sliding windows
SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of dataGiven a dataset P and a preference function f, a top-k query retrieves the k tuples in P with the highest scores according to f. Even though the problem is well-studied in conventional databases, the existing methods are inapplicable to highly dynamic ...
Continuously monitoring top-k uncertain data streams: a probabilistic threshold method
Recently, uncertain data processing has become more and more important. Although a significant amount of previous research explores various continuous queries on data streams, continuous queries on uncertain data streams have seldom been investigated. ...
Uncertain top-k query processing in distributed environments
The top-k query on uncertain data set has been a very hot topic these years, and there have been many studies on uncertain top-k queries. Unfortunately, most of the existing algorithms only consider centralized processing environments, and they are not ...
Comments