ABSTRACT
We consider the problem of evaluating multiple overlapping queries defined on data streams, where each query is a conjunction of multiple filters and each filter may be shared across multiple queries. Efficient support for overlapping queries is a critical issue in emerging data stream systems, particularly when filters are expensive in terms of computational complexity and processing time. This problem generalizes other well-known problems such as pipelined filter ordering and set cover, and is not only NP-hard but also hard to approximate within a factor of o(log n) of the optimum, where n is the number of queries. In this paper, we present two near-optimal approximation algorithms with provably good performance guarantees for the evaluation of overlapping queries. We present an edge-coverage based Greedy algorithm that achieves an approximation ratio of (1 + log(n) + log(α)), where n is the number of queries and α is the average number of filters in a query. We also present a randomized, fast and easily parallelizable Harmonic algorithm that achieves an approximation ratio of 2β, where β is the maximum number of filters in a query. We have implemented these algorithms in a prototype system and evaluated their performance through extensive experiments in the context of multimedia stream analysis. The results show that our Greedy algorithm consistently outperforms other known algorithms under various settings and scales well as the numbers of queries and filters increase.
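To make the problem setting concrete, the sketch below resolves a set of shared-filter conjunctive queries for a single stream item with a greedy, edge-coverage style rule. It is an illustrative sketch under stated assumptions, not the authors' Greedy algorithm: we assume each filter has a known per-item cost and an independent pass probability, treat each (query, filter) pair as an "edge", and greedily pick the filter with the lowest cost per expected number of edges resolved. A passing filter resolves its own edges; a failing filter falsifies every query containing it, resolving all of those queries' remaining edges. The names `Filter`, `greedy_shared_filter_eval`, and the example filters are hypothetical, and we make no claim that this heuristic attains the (1 + log(n) + log(α)) bound. The randomized Harmonic algorithm is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Filter:
    name: str
    cost: float       # assumed known per-item evaluation cost
    pass_prob: float  # assumed independent probability that the filter passes


def greedy_shared_filter_eval(
    filters: Dict[str, Filter],
    queries: Dict[str, FrozenSet[str]],
    evaluate: Callable[[str], bool],
) -> Tuple[Dict[str, bool], List[str], float]:
    """Hypothetical greedy sketch: resolve every conjunctive query for one item.

    Returns (query results, filter evaluation order, total cost spent).
    """
    results: Dict[str, bool] = {}
    # Unevaluated filters of each still-unresolved query (its open "edges").
    remaining: Dict[str, Set[str]] = {}
    for q, fs in queries.items():
        if fs:
            remaining[q] = set(fs)
        else:
            results[q] = True  # an empty conjunction is trivially true

    order: List[str] = []
    total_cost = 0.0
    while remaining:
        candidates = sorted(set().union(*remaining.values()))

        def score(name: str) -> float:
            f = filters[name]
            hit = [q for q in remaining if name in remaining[q]]
            covered_if_pass = len(hit)  # only this filter's own edges
            covered_if_fail = sum(len(remaining[q]) for q in hit)  # all edges of hit queries
            expected = f.pass_prob * covered_if_pass + (1 - f.pass_prob) * covered_if_fail
            return f.cost / expected  # expected >= 1 because hit is non-empty

        # Cheapest cost per expected edge resolved; ties broken by name.
        best = min(candidates, key=lambda n: (score(n), n))
        outcome = evaluate(best)
        total_cost += filters[best].cost
        order.append(best)

        for q in [q for q in remaining if best in remaining[q]]:
            if not outcome:
                results[q] = False  # one failing conjunct falsifies the query
                del remaining[q]
            else:
                remaining[q].discard(best)
                if not remaining[q]:
                    results[q] = True  # every conjunct has passed
                    del remaining[q]
    return results, order, total_cost


if __name__ == "__main__":
    fs = {
        "face": Filter("face", 5.0, 0.2),
        "outdoor": Filter("outdoor", 1.0, 0.5),
        "car": Filter("car", 4.0, 0.3),
    }
    qs = {
        "q1": frozenset({"face", "outdoor"}),
        "q2": frozenset({"outdoor", "car"}),
        "q3": frozenset({"face"}),
    }
    truth = {"face": False, "outdoor": True, "car": True}  # made-up outcomes for one item
    print(greedy_shared_filter_eval(fs, qs, truth.__getitem__))
```

In this made-up example the cheap, widely shared `outdoor` filter is evaluated first, and the failing `face` filter then resolves both `q1` and `q3` at once, so each filter is evaluated at most once even though it is shared.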
Index Terms
- Near-optimal algorithms for shared filter evaluation in data stream systems