DOI: 10.1145/1376616.1376633
research-article

Near-optimal algorithms for shared filter evaluation in data stream systems

Published: 9 June 2008

ABSTRACT

We consider the problem of evaluating multiple overlapping queries defined on data streams, where each query is a conjunction of multiple filters and each filter may be shared across multiple queries. Efficient support for overlapping queries is a critical issue in emerging data stream systems, particularly when filters are expensive in terms of computational complexity and processing time. This problem generalizes other well-known problems such as pipelined filter ordering and set cover, and it is not only NP-hard but also hard to approximate within a factor of o(log n) from the optimum, where n is the number of queries. In this paper, we present two near-optimal approximation algorithms with provably good performance guarantees for the evaluation of overlapping queries. We present an edge-coverage-based Greedy algorithm that achieves an approximation ratio of (1 + log(n) + log(α)), where n is the number of queries and α is the average number of filters in a query. We also present a randomized, fast, and easily parallelizable Harmonic algorithm that achieves an approximation ratio of 2β, where β is the maximum number of filters in a query. We have implemented these algorithms in a prototype system and evaluated their performance through extensive experiments in the context of multimedia stream analysis. The results show that our Greedy algorithm consistently outperforms other known algorithms under various settings and scales well as the numbers of queries and filters increase.
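The abstract states the problem model but not the algorithms' internals. The Python sketch below illustrates that model under stated assumptions: each query is a conjunction over shared filters, each filter runs at most once per stream item, and a query is resolved as soon as one of its filters fails or all of them pass. The remaining (query, filter) pairs are treated as "edges", and a simplified greedy rule picks the filter with the lowest cost per expected number of edges it resolves. The `Filter` class, its `cost` and `pass_prob` fields, the independence of filter outcomes, the `evaluate_shared` and `expected_edges_resolved` functions, and the exact selection rule are all hypothetical. This is not the paper's Greedy algorithm and carries none of its approximation guarantees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filter:
    cost: float       # hypothetical per-item evaluation cost
    pass_prob: float  # hypothetical pass probability, assumed independent across filters


def expected_edges_resolved(f, filters, remaining):
    """Expected number of (query, filter) edges removed by evaluating f."""
    touching = [q for q, fs in remaining.items() if f in fs]
    p = filters[f].pass_prob
    on_pass = len(touching)                             # only f's own edges disappear
    on_fail = sum(len(remaining[q]) for q in touching)  # every touching query resolves to False
    return p * on_pass + (1 - p) * on_fail


def evaluate_shared(filters, queries, outcome):
    """Evaluate conjunctive queries over shared filters for one stream item.

    filters: dict filter name -> Filter
    queries: dict query id -> set of filter names (the query is their conjunction)
    outcome: dict filter name -> bool, used to simulate running each filter
    Returns (total_cost, results), where results maps query id -> bool.
    """
    # A query with no filters is trivially true.
    results = {q: True for q, fs in queries.items() if not fs}
    # Unevaluated filters of each unresolved query: the remaining "edges".
    remaining = {q: set(fs) for q, fs in queries.items() if fs}
    total_cost = 0.0

    while remaining:
        candidates = {f for fs in remaining.values() for f in fs}
        # Assumed greedy rule: lowest cost per expected edge resolved; ties broken by name.
        best = min(
            candidates,
            key=lambda f: (filters[f].cost / expected_edges_resolved(f, filters, remaining), f),
        )
        total_cost += filters[best].cost  # each shared filter runs at most once
        passed = outcome[best]

        for q in [q for q, fs in remaining.items() if best in fs]:
            if not passed:
                results[q] = False  # one failed filter falsifies the conjunction
                del remaining[q]
            else:
                remaining[q].discard(best)
                if not remaining[q]:
                    results[q] = True  # every filter of q has passed
                    del remaining[q]

    return total_cost, results


if __name__ == "__main__":
    filters = {
        "face": Filter(cost=5.0, pass_prob=0.2),
        "outdoor": Filter(cost=1.0, pass_prob=0.5),
        "speech": Filter(cost=3.0, pass_prob=0.9),
    }
    queries = {
        "q1": {"face", "outdoor"},
        "q2": {"outdoor", "speech"},
        "q3": {"face", "speech"},
    }
    outcome = {"face": False, "outdoor": True, "speech": True}
    print(evaluate_shared(filters, queries, outcome))
```

With these illustrative inputs the rule evaluates `outdoor`, then `speech`, then `face`, for a total cost of 9.0, with q2 true and q1 and q3 false. Because the shared `face` filter is run once rather than once per query, q1 and q3 share its cost.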


Published in

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data
June 2008
1396 pages
ISBN: 9781605581026
DOI: 10.1145/1376616

              Copyright © 2008 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

              Publisher

              Association for Computing Machinery

              New York, NY, United States


              Acceptance Rates

              Overall acceptance rate: 785 of 4,003 submissions (20%)
