skip to main content
article

Sketch-based geometric monitoring of distributed stream queries

Published:01 August 2013Publication History
Skip Abstract Section

Abstract

Emerging large-scale monitoring applications rely on continuous tracking of complex data-analysis queries over collections of massive, physically-distributed data streams. Thus, in addition to the space- and time-efficiency requirements of conventional stream processing (at each remote monitor site), effective solutions also need to guarantee communication efficiency (over the underlying communication network). The complexity of the monitored query adds to the difficulty of the problem -- this is especially true for nonlinear queries (e.g., joins), where no obvious solutions exist for distributing the monitor condition across sites. The recently proposed geometric method offers a generic methodology for splitting an arbitrary (non-linear) global threshold-monitoring task into a collection of local site constraints; still, the approach relies on maintaining the complete stream(s) at each site, thus raising serious efficiency concerns for massive data streams. In this paper, we propose novel algorithms for efficiently tracking a broad class of complex aggregate queries in such distributed-streams settings. Our tracking schemes rely on a novel combination of the geometric method with compact sketch summaries of local data streams, and maintain approximate answers with provable error guarantees, while optimizing space and processing costs at each remote site and communication cost across the network. One of our key technical insights for the effective use of the geometric method lies in exploiting a much lower-dimensional space for monitoring the sketch-based estimation query. Due to the complex, highly nonlinear nature of these estimates, efficiently monitoring the local geometric constraints poses challenging algorithmic issues for which we propose novel solutions. Experimental results on real-life data streams verify the effectiveness of our approach.

References

  1. N. Alon, P. B. Gibbons, Y. Matias, and M. Szegedy. "Tracking Join and Self-Join Sizes in Limited Storage". In ACM PODS, 1999. Google ScholarGoogle Scholar
  2. N. Alon, Y. Matias, and M. Szegedy. "The Space Complexity of Approximating the Frequency Moments". In ACM STOC, 1996. Google ScholarGoogle Scholar
  3. B. Babcock and C. Olston. "Distributed Top-K Monitoring". In ACM SIGMOD, 2003. Google ScholarGoogle Scholar
  4. M. Charikar, K. Chen, and M. Farach-Colton. "Finding Frequent Items in Data Streams". In ICALP, 2002. Google ScholarGoogle Scholar
  5. D. Chu, A. Deshpande, J. M. Hellerstein, and W. Hong. "Approximate Data Collection in Sensor Networks using Probabilistic Models". In IEEE ICDE, 2006. Google ScholarGoogle Scholar
  6. G. Cormode and M. Garofalakis. Streaming in a connected world: querying and tracking distributed data streams. In ACM SIGMOD, 2007. Google ScholarGoogle Scholar
  7. G. Cormode and M. Garofalakis. "Approximate Continuous Querying of Distributed Streams". ACM TODS, 33(2), 2008. Google ScholarGoogle Scholar
  8. G. Cormode, M. Garofalakis, S. Muthukrishnan, and R. Rastogi. "Holistic Aggregates in a Networked World: Distributed Tracking of Approximate Quantiles". In ACM SIGMOD, 2005. Google ScholarGoogle Scholar
  9. G. Cormode, M. Garofalakis, and D. Sacharidis. "Fast Approximate Wavelet Tracking on Streams". In EDBT, 2006. Google ScholarGoogle Scholar
  10. G. Cormode and S. Muthukrishnan. "What's Hot and What's Not: Tracking Most Frequent Items Dynamically". In ACM PODS, 2003. Google ScholarGoogle Scholar
  11. G. Cormode and S. Muthukrishnan. "An improved data stream summary: The count-min sketch and its applications". In Jrnl. of Algorithms, 55(1), 2005. Google ScholarGoogle Scholar
  12. C. Cranor, T. Johnson, O. Spatscheck, and V. Shkapenyuk. "Gigascope: A Stream Database for Network Applications". In ACM SIGMOD, 2003. Google ScholarGoogle Scholar
  13. A. Das, S. Ganguly, M. Garofalakis, and R. Rastogi. "Distributed Set-Expression Cardinality Estimation". In VLDB, 2004. Google ScholarGoogle Scholar
  14. A. Deshpande, C. Guestrin, S. R. Madden, J. M. Hellerstein, and W. Hong. "Model-Driven Data Acquisition in Sensor Networks". In VLDB, 2004. Google ScholarGoogle Scholar
  15. A. Dobra, M. Garofalakis, J. Gehrke, and R. Rastogi. "Processing Complex Aggregate Queries over Data Streams". In ACM SIGMOD, 2002. Google ScholarGoogle Scholar
  16. S. Ganguly, M. Garofalakis, and R. Rastogi. "Processing Set Expressions over Continuous Update Streams". In ACM SIGMOD, 2003. Google ScholarGoogle Scholar
  17. N. Giatrakos, A. Deligiannakis, M. Garofalakis, I. Sharfman, and A. Schuster. "Prediction-based Geometric Monitoring over Distributed Data Streams". In ACM SIGMOD, 2012. Google ScholarGoogle Scholar
  18. P. B. Gibbons. "Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports". In VLDB, 2001. Google ScholarGoogle Scholar
  19. A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. J. Strauss. "How to Summarize the Universe: Dynamic Maintenance of Quantiles". In VLDB, 2002. Google ScholarGoogle Scholar
  20. A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. J. Strauss. "One-pass wavelet decomposition of data streams". IEEE TKDE, 15(3), 2003. Google ScholarGoogle Scholar
  21. M. B. Greenwald and S. Khanna. "Space-Efficient Online Computation of Quantile Summaries". In ACM SIGMOD, 2001. Google ScholarGoogle Scholar
  22. M. B. Greenwald and S. Khanna. "Power-Conserving Computation of Order-Statistics over Sensor Networks". In ACM PODS, 2004. Google ScholarGoogle Scholar
  23. A. Jain, E. Y. Chang, and Y.-F. Wang. "Adaptive stream resource management using Kalman Filters". In ACM SIGMOD, 2004. Google ScholarGoogle Scholar
  24. R. Keralapura, G. Cormode, and J. Ramamirtham. "Communication-efficient distributed monitoring of thresholded counts". In ACM SIGMOD, 2006. Google ScholarGoogle Scholar
  25. D. Keren, I. Sharfman, A. Schuster, and A. Livne. "Shape-Sensitive Geometric Monitoring". IEEE TKDE, 24(8), 2012. Google ScholarGoogle Scholar
  26. S. R. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong. "The Design of an Acquisitional Query Processor for Sensor Networks". In ACM SIGMOD, 2003. Google ScholarGoogle Scholar
  27. A. Manjhi, V. Shkapenyuk, K. Dhamdhere, and C. Olston. "Finding (Recently) Frequent Items in Distributed Data Streams". In IEEE ICDE, 2005. Google ScholarGoogle Scholar
  28. G. S. Manku and R. Motwani. "Approximate Frequency Counts over Data Streams". In VLDB, 2002. Google ScholarGoogle Scholar
  29. NII Shonan Workshop on Large-Scale Distributed Computation, Shonan Village, Japan, January 2012. http://www.nii.ac.jp/shonan/seminar011/.Google ScholarGoogle Scholar
  30. C. Olston, J. Jiang, and J. Widom. "Adaptive Filters for Continuous Queries over Distributed Data Streams". In ACM SIGMOD, 2003. Google ScholarGoogle Scholar
  31. I. Sharfman, A. Schuster, and D. Keren. "A geometric approach to monitoring threshold functions over distributed data streams". In ACM SIGMOD, 2006. Google ScholarGoogle Scholar
  32. N. Thaper, S. Guha, P. Indyk, and N. Koudas. "Dynamic Multidimensional Histograms". In ACM SIGMOD, 2002. Google ScholarGoogle Scholar

Index Terms

  1. Sketch-based geometric monitoring of distributed stream queries
      Index terms have been assigned to the content through auto-classification.

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image Proceedings of the VLDB Endowment
        Proceedings of the VLDB Endowment  Volume 6, Issue 10
        August 2013
        180 pages

        Publisher

        VLDB Endowment

        Publication History

        • Published: 1 August 2013
        Published in pvldb Volume 6, Issue 10

        Qualifiers

        • article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader