Skip to main content
Log in

Continuous Monitoring of Distributed Data Streams over a Time-Based Sliding Window

  • Published:
Algorithmica Aims and scope Submit manuscript

Abstract

In this paper we extend the study of algorithms for monitoring distributed data streams from whole data streams to a time-based sliding window. The concern is how to minimize the communication between individual streams and the root, while allowing the root, at any time, to report the global statistics of all streams within a given error bound. This paper presents communication-efficient algorithms for three classical statistics, namely, basic counting, frequent items and quantiles. The worst-case communication cost over a window is \(O(\frac{k}{\varepsilon} \log\frac{\varepsilon N}{k})\) bits for basic counting, \(O(\frac{k}{\varepsilon} \log\frac{N}{k})\) words for frequent items and \(O(\frac{k}{\varepsilon^{2}} \log\frac{N}{k})\) words for quantiles, where k is the number of distributed data streams, N is the total number of items in the streams that arrive or expire in the window, and ε<1 is the given error bound. The performance of our algorithms matches and nearly matches the corresponding lower bounds. We also show how to generalize these results to streams with out-of-order data.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Aggarwal, C.: Data Streams: Models and Algorithms. Springer, Berlin (2006)

    Google Scholar 

  2. Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the frequency moments. J. Comput. Syst. Sci. 58(1), 137–147 (1999)

    Article  MATH  MathSciNet  Google Scholar 

  3. Arackaparambil, C., Brody, J., Chakrabarti, A.: Functional monitoring without monotonicity. In: Proceedings of the 36th International Colloquium on Automata, Languages and Programming, pp. 95–106 (2009)

    Chapter  Google Scholar 

  4. Arasu, A., Manku, G.: Approximate counts and quantiles over sliding windows. In: Proceedings of the 23rd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 286–296 (2004)

    Google Scholar 

  5. Babai, L., Gal, A., Kimmel, P., Lokam, S.: Communication complexity of simultaneous messages. SIAM J. Comput. 33(1), 137–166 (2004)

    Article  MathSciNet  Google Scholar 

  6. Babcock, B., Datar, M., Motwani, R.: Sampling from a moving window over streaming data. In: Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 633–634 (2002)

    Google Scholar 

  7. Babcock, B., Olston, C.: Distributed top-k monitoring. In: Proceedings of the 22nd ACM SIGMOD International Conference on Management of Data, pp. 28–39 (2003)

    Google Scholar 

  8. Busch, C., Tirthapua, S.: A deterministic algorithm for summarizing asynchronous streams over a sliding window. In: Proceedings of the 22nd Symposium on Theoretical Aspects of Computer Science, pp. 265–476 (2007)

    Google Scholar 

  9. Cormode, G., Garofalakis, M.: Sketching streams through the net: distributed approximate query tracking. In: Proceedings of the 31st International Conference on Very Large Data Bases, pp. 13–24 (2005)

    Google Scholar 

  10. Cormode, G., Garofalakis, M., Muthukrishnan, S., Rastogi, R.: Holistic aggregates in a networked world: distributed tracking of approximate quantiles. In: Proceedings of the 24th ACM SIGMOD International Conference on Management of Data, pp. 25–36 (2005)

    Google Scholar 

  11. Cormode, G., Korn, F., Tirthapura, S.: Time-decaying aggregates in out-of-order streams. In: Proceedings of the 27th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 89–98 (2008)

    Google Scholar 

  12. Cormode, G., Muthukrishnan, S., Yi, K.: Algorithms for distributed functional monitoring. In: Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1076–1085 (2008)

    Google Scholar 

  13. Cormode, G., Muthukrishnan, S., Yi, K., Zhang, Q.: Optimal sampling from distributed streams. In: Proceedings of the 29th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 77–86 (2010)

    Google Scholar 

  14. Das, A., Ganguly, S., Garofalakis, M., Rastogi, R.: Distributed set-expression cardinality estimation. In: Proceedings of the 30th International Conference on Very Large Data Bases, pp. 312–323 (2004)

    Google Scholar 

  15. Datar, M., Gionis, A., Indyk, P., Motwani, R.: Maintaining stream statistics over sliding windows. SIAM J. Comput. 31(6), 1794–1813 (2002)

    Article  MATH  MathSciNet  Google Scholar 

  16. Datar, M., Muthukrishnan, S.: Estimating rarity and similarity over data stream windows. In: Proceedings of the 10th Annual European Symposium on Algorithms, pp. 323–334 (2002)

    Google Scholar 

  17. Demaine, E., Lopez-Ortiz, A., Munro, J.: Frequency estimation of internet packet streams with limited space. In: Proceedings of the 10th Annual European Symposium on Algorithms, pp. 348–360 (2002)

    Google Scholar 

  18. Gibbons, P., Tirthapura, S.: Distributed streams algorithms for sliding windows. In: Proceedings of the 14th Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 63–72 (2002)

    Google Scholar 

  19. Greenwald, M., Khanna, S.: Power-conserving computation of order-statistics over sensor networks. In: Proceedings of the 23rd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 275–285 (2004)

    Google Scholar 

  20. Guha, S., Koudas, N., Shim, K.: Data-streams and histograms. In: Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, pp. 471–475 (2001)

    Google Scholar 

  21. Indyk, P.: Stable distributions, pseudorandom generators, embeddings and data stream computation. In: Proceedings of the 41st Annual Symposium on Foundations of Computer Science, pp. 148–155 (2000)

    Google Scholar 

  22. Jain, N., Yalagandula, P., Dahlin, M., Zhang, Y.: Insight: A distributed monitoring system for tracking continuous queries. In: Proceedings of the 20th ACM Symposium on Operating Systems Principles, pp. 1–7 (2005)

    Google Scholar 

  23. Keralapura, R., Cormode, G., Ramamirtham, J.: Communication-efficient distributed monitoring of thresholded counts. In: Proceedings of the 25th ACM SIGMOD International Conference on Management of Data, pp. 289–300 (2006)

    Chapter  Google Scholar 

  24. Lee, L.K., Ting, H.F.: Maintaining significant stream statistics over sliding windows. In: 17th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 724–732 (2006)

    Chapter  Google Scholar 

  25. Lee, L.K., Ting, H.F.: A simpler and more efficient deterministic scheme for finding frequent items over sliding windows. In: Proceedings of the 25th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 290–297 (2006)

    Google Scholar 

  26. Manjhi, A., Shkapenyuk, V., Dhamdhere, K., Olston, C.: Finding (recently) frequent items in distributed data streams. In: Proceedings of the 14th Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 767–778 (2005)

    Google Scholar 

  27. Mouratidis, K., Bakiras, S., Papadias, D.: Continuous monitoring of top-k queries over sliding windows. In: Proceedings of the 25th ACM SIGMOD International Conference on Management of Data, pp. 635–646 (2006)

    Chapter  Google Scholar 

  28. Muthukrishnan, S.: Data Streams: Algorithms and Applications. Now Publishers, Hanover (2005)

    MATH  Google Scholar 

  29. Olston, C., Jiang, J., Widom, J.: Adaptive filters for continuous queries over distributed data streams. In: Proceedings of the 22nd ACM SIGMOD International Conference on Management of Data, pp. 563–574 (2003)

    Google Scholar 

  30. Sharfman, I., Schuster, A., Keren, D.: A geometric approach to monitoring threshold functions over distributed data streams. ACM Transactions on Database Systems 32(4) (2007)

  31. Yi, K., Zhang, Q.: Optimal tracking of distributed heavy hitters and quantiles. In: Proceedings of the 28th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 167–174 (2009)

    Google Scholar 

  32. Yi, K., Zhang, Q.: Private communication

  33. Yi, K., Zhang, Q.: Multi-dimensional online tracking. In: Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1098–1107 (2009)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Tak-Wah Lam.

Additional information

T.W. Lam is partially supported by the GRF Grant HKU-713909E.

H.F. Ting is partially supported by the GRF Grant HKU-716307E.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Chan, HL., Lam, TW., Lee, LK. et al. Continuous Monitoring of Distributed Data Streams over a Time-Based Sliding Window. Algorithmica 62, 1088–1111 (2012). https://doi.org/10.1007/s00453-011-9506-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00453-011-9506-5

Keywords

Navigation