skip to main content
10.1145/564870.564880acmconferencesArticle/Chapter ViewAbstractPublication PagesspaaConference Proceedingsconference-collections
Article

Distributed streams algorithms for sliding windows

Published:10 August 2002Publication History

ABSTRACT

This paper presents algorithms for estimating aggregate functions over a "sliding window" of the N most recent data items in one or more streams. Our results include

  • For a single stream, we present the first ε-approximation scheme for the number of 1's in a sliding window that is optimal in both worst case time and space. We also present the first ε for the sum of integers in [0..R] in a sliding window that is optimal in both worst case time and space (assuming R is at most polynomial in N). Both algorithms are deterministic and use only logarithmic memory words.

  • In contrast, we show that an deterministic algorithm that estimates, to within a small constant relative error, the number of 1's (or the sum of integers) in a sliding window over the union of distributed streams requires Ω(N) space.

  • We present the first randomized (ε,δ)-approximation scheme for the number of 1's in a sliding window over the union of distributed streams that uses only logarithmic memory words. We also present the first (ε,δ)-approximation scheme for the number of distinct values in a sliding window over distributed streams that uses only logarithmic memory words.

Our results are obtained using a novel family of synopsis data structures.

References

  1. N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. J. of Computer and System Sciences, 58:137--147, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. B. Babcock, M. Datar, and R. Motwani. Sampling from a moving window over streaming data. In Proc. 13th ACM-SIAM Symp. on Discrete Algorithms, pages 633--634, Jan. 2002. Short paper.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In Proc. 13th ACM-SIAM Symp. on Discrete Algorithms, pages 623--632, Jan. 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. In Proc. 13th ACM-SIAM Symp. on Discrete Algorithms, pages 635--644, Jan. 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. J. Feigenbaum, S. Kannan, M. Strauss, and M. Viswanathan. An approximate L1-difference algorithm for massive data streams. In Proc. 40th IEEE Symp. on Foundations of Computer Science, pages 501--511, Oct. 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. J. Feigenbaum, S. Kannan, M. Strauss, and M. Viswanathan. Testing and spot-checking of data streams. In Proc. 11th ACM-SIAM Symp. on Discrete Algorithms, pages 165--174, Jan. 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. J. Computer and System Sciences, 31:182--209, 1985.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. Fong and M. Strauss. An approximate Lp-difference algorithm for massive data streams. In Proc. 17th Symp. on Theoretical Aspects of Computer Science, LNCS 1770, pages 193--204. Springer, Feb. 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. J. Gehrke, F. Korn, and D. Srivastava. On computing correlated aggregates over continual data streams. In Proc. ACM SIGMOD International Conf. on Management of Data, pages 13--24, May 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. P. B. Gibbons. Distinct sampling for highly-accurate answers to distinct values queries and event reports. In Proc. 27th International Conf. on Very Large Data Bases, pages 541--550, Sept. 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. P. B. Gibbons and Y. Matias. New sampling-based summary statistics for improving approximate query answers. In Proc. ACM SIGMOD International Conf. on Management of Data, pages 331--342, June 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. P. B. Gibbons and Y. Matias. Synopsis data structures for massive data sets. In J. M. Abello and J. S. Vitter, editors, External Memory Algorithms, pages 39--70. AMS, 1999. DIMACS: Series in Discrete Mathematics and Theoretical Computer Science, Vol. 50. A two page summary appeared as a short paper in SODA'99.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. P. B. Gibbons and S. Tirthapura. Estimating simple functions on the union of data streams. In Proc. 13th ACM Symp. on Parallel Algorithms and Architectures, pages 281--290, July 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. A. C. Gilbert, Y. Kotidis, and M. J. Strauss. Surfing wavelets on streams: One-pass summaries for approximate aggregate queries. In Proc. 27th International Conf. on Very Large Data Bases, pages 79--88, Sept. 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In Proc. ACM SIGMOD International Conf. on Management of Data, pages 58--66, May 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. S. Guha, N. Koudas, and K. Shim. Data-streams and histograms. In Proc. 33rd ACM Symp. on Theory of Computing, pages 471--475, July 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams. In Proc. 41st IEEE Symp. on Foundations of Computer Science, pages 359--366, Nov. 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. Technical report, Digital Systems Research Center, Palo Alto, CA, May 1998.]]Google ScholarGoogle Scholar
  19. P. Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation. In Proc. 41st IEEE Symp. on Foundations of Computer Science, pages 189--197, Nov. 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. P. Indyk. Personal communication, 2002.]]Google ScholarGoogle Scholar
  21. I. Kremer, N. Nisan, and D. Ron. On randomized one-round communication complexity. Computational Complexity, 8(1):21--49, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, Cambridge, UK, 1997.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. I. Newman. Private vs. common random bits in communication complexity. Information Processing Letters, 39:67--71, 1991.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. I. Newman and M. Szegedy. Public vs. private coin flips in one round communication games. In Proc. 28th ACM Symp. on the Theory of Computing, pages 561--570, May 1996.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. L. Trevisan. A note on counting distinct elements in the streaming model. Unpublished manuscript, April 2001.]]Google ScholarGoogle Scholar

Index Terms

  1. Distributed streams algorithms for sliding windows

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          SPAA '02: Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
          August 2002
          302 pages
          ISBN:1581135297
          DOI:10.1145/564870

          Copyright © 2002 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 10 August 2002

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • Article

          Acceptance Rates

          Overall Acceptance Rate447of1,461submissions,31%

          Upcoming Conference

          SPAA '24

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader