skip to main content
10.1145/1064092.1064116acmconferencesArticle/Chapter ViewAbstractPublication PagessocgConference Proceedingsconference-collections
Article

Sampling in dynamic data streams and applications

Published:06 June 2005Publication History

ABSTRACT

A dynamic geometric data stream is a sequence of m Add/Remove operations of points from a discrete geometric space (1,...,Δ)d [21]. Add(p) inserts a point p from (1,...,Δ)d into the current point set, Remove(p) deletes p from P. We develop low-storage data structures to (i) maintain ε-approximations of range spaces of P with constant VC-dimension and (ii) maintain an ε-approximation of the weight of the Euclidean minimum spanning tree of P. Our data structures use O(log3ε • log3(1/ε) • log(1/ε)/ε2) and O(log (1/δ) • (log Δ/ε)O(d)) bits of memory, respectively (we assume that the dimension d is a constant), and they are correct with probability 1-δ. These results are based on a new data structure that maintains a set of elements chosen (almost) uniformly at random from P.

References

  1. N. Alon, Y. Matias, and M. Szegedy. The Space Complexity of Approximating the Frequency Moments. Proc 28th STOC, 1996.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. A. Bagchi, A. Chaudhary, D. Eppstein and M. T. Goodrich. Deterministic Sampling and Range Counting in Geometric Data Streams. Proc 20th Annual Symposium on Computational Geometry, pp. 144--151, 2004.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Z. Bar-Yossef, T. S. Jayram, R. Kumar, D. Sivakumar, L. Trevisan. Counting Distinct Elements in a Data Stream. Proc RANDOM, 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. T. M. Chan. Faster Core-Set Constructions and Data Stream Algorithms in Fixed Dimensions. Proc 20th Annual Symposium on Computational Geometry, pp. 152--159, 2004.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. T. M. Chan., B. S. Sadjad Geometric Optimization Problems over Sliding Windows Proc 15th International Symposium on Algorithms and Computation, pp. 246--258, 2004.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. B. Chazelle, R. Rubinfeld, and L. Trevisan. Approximating the minimum spanning tree weight in sublinear time. Proc 28th ICALP, pages 190--200, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. G. Cormode, M. Datar, P. Indyk. Comparing Data Streams using Hamming norms. Proc International Conference on Very Large Databases (VLDB), 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. G. Cormode and S. Muthukrishnan. Radial Histograms for Spatial Streams. DIMACS Technical Report 2003-11, 2003.]]Google ScholarGoogle Scholar
  9. G. Cormode, S. Muthukrishnan, and I. Rozenbaum. Summarizing and Mining Inverse Distributions on Data Streams via Dynamic Inverse Sampling. DIMACS Technical Report 2005-11 , 2005.]]Google ScholarGoogle Scholar
  10. A. Czumaj, F. Ergün, L. Fortnow, A. Magen, I. Newman, R. Rubinfeld, and C. Sohler. Sublinear-time approximation of Euclidean minimum spanning tree. Proc 14th SODA, pages 813--822, 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. A. Czumaj and C. Sohler. Estimating the Weight of Metric Minimum Spanning Trees in Sublinear-Time. Proc 36th STOC, pp. 175--183, 2004.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. Fang, N. Shivakumar, H. Garcia-Molina, R. Motwani, and J. D. Ullman. Computing iceberg queries efficiently. Proc 1998 Intl. Conf. on Very Large Data Bases, pp. 299--310, 1998.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. J. Feigenbaum, S. Kannan, and J. Zhang. Computing Diameter in the Streaming and Sliding Window Models. Technical Report YALEU/DCS/TR-1245, Yale University, 2002.]]Google ScholarGoogle Scholar
  14. G. Frahling, C. Sohler. Coresets in dynamic geometric data streams. Manuscript, 2004.]]Google ScholarGoogle Scholar
  15. P. Flajolet and G. Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31: 182--209, 1985.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. S. Ganguly,M. Garofalakis, R. Rastogi. Processing Set Expressions over Continuous Update Streams. Proc SIGMOD Conference 2003, pp. 265--276, 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. T. Hagerub and C. Rüb. A Guided Tour of Chernoff Bounds. Information Processing Letters, 33:305--308, 1989/90.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. S. Har-Peled, S. Mazumdar. On Coresets for k-Means and k-Median Clustering. Proc 36th STOC, pp. 291--300, 2004.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. J. Hershberger and S. Suri. Convex Hulls and Related Problems in Data Streams. Proceedings of the ACM/DIMACS Workshop on Management and Processing of Data Streams, 2003.]]Google ScholarGoogle Scholar
  20. P. Indyk. Stable Distributions, Pseudorandom Generators, Embeddings and Data Stream Computation. Proc 41st FOCS, 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. P. Indyk. Algorithms for Dynamic Geometric Problems over Data Streams. Proc 36th STOC, pp. 373--380, 2004.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. P. Indyk. Better Algorithms for high-dimensional proximity problems via asymmetric embeddings. Proc 14th SODA, pages 539--545, 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. G. S. Manku, R. Motwani. Approximate Frequency Counts over Data Streams. Proc 2002 Intl. Conf. on Very Large Data Bases, pp. 346--357, 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. S. Muthukrishnan. Data streams: Algorithms and applications (invited talk at SODA'03). Available at http://athos.rutgers.edu/ muthu/stream-1-1.ps, 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. N. Nisan. Pseudorandom generators for Space-Bounded Computation. Proc 22nd STOC, 204--212, 1990.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. S. Suri, C. D. Toth, and Y. Zhou. Range Counting over Multidimensional Data Streams. Proc 20th Annual Symposium on Computational Geometry, pp. 160--169, 2004.]] Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Sampling in dynamic data streams and applications

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SCG '05: Proceedings of the twenty-first annual symposium on Computational geometry
      June 2005
      398 pages
      ISBN:1581139918
      DOI:10.1145/1064092

      Copyright © 2005 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 6 June 2005

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • Article

      Acceptance Rates

      SCG '05 Paper Acceptance Rate41of141submissions,29%Overall Acceptance Rate625of1,685submissions,37%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader