ABSTRACT
A dynamic geometric data stream is a sequence of m Add/Remove operations of points from a discrete geometric space (1,...,Δ)d [21]. Add(p) inserts a point p from (1,...,Δ)d into the current point set, Remove(p) deletes p from P. We develop low-storage data structures to (i) maintain ε-approximations of range spaces of P with constant VC-dimension and (ii) maintain an ε-approximation of the weight of the Euclidean minimum spanning tree of P. Our data structures use O(log3ε • log3(1/ε) • log(1/ε)/ε2) and O(log (1/δ) • (log Δ/ε)O(d)) bits of memory, respectively (we assume that the dimension d is a constant), and they are correct with probability 1-δ. These results are based on a new data structure that maintains a set of elements chosen (almost) uniformly at random from P.
- N. Alon, Y. Matias, and M. Szegedy. The Space Complexity of Approximating the Frequency Moments. Proc 28th STOC, 1996.]] Google ScholarDigital Library
- A. Bagchi, A. Chaudhary, D. Eppstein and M. T. Goodrich. Deterministic Sampling and Range Counting in Geometric Data Streams. Proc 20th Annual Symposium on Computational Geometry, pp. 144--151, 2004.]] Google ScholarDigital Library
- Z. Bar-Yossef, T. S. Jayram, R. Kumar, D. Sivakumar, L. Trevisan. Counting Distinct Elements in a Data Stream. Proc RANDOM, 2002.]] Google ScholarDigital Library
- T. M. Chan. Faster Core-Set Constructions and Data Stream Algorithms in Fixed Dimensions. Proc 20th Annual Symposium on Computational Geometry, pp. 152--159, 2004.]] Google ScholarDigital Library
- T. M. Chan., B. S. Sadjad Geometric Optimization Problems over Sliding Windows Proc 15th International Symposium on Algorithms and Computation, pp. 246--258, 2004.]] Google ScholarDigital Library
- B. Chazelle, R. Rubinfeld, and L. Trevisan. Approximating the minimum spanning tree weight in sublinear time. Proc 28th ICALP, pages 190--200, 2001.]] Google ScholarDigital Library
- G. Cormode, M. Datar, P. Indyk. Comparing Data Streams using Hamming norms. Proc International Conference on Very Large Databases (VLDB), 2002.]] Google ScholarDigital Library
- G. Cormode and S. Muthukrishnan. Radial Histograms for Spatial Streams. DIMACS Technical Report 2003-11, 2003.]]Google Scholar
- G. Cormode, S. Muthukrishnan, and I. Rozenbaum. Summarizing and Mining Inverse Distributions on Data Streams via Dynamic Inverse Sampling. DIMACS Technical Report 2005-11 , 2005.]]Google Scholar
- A. Czumaj, F. Ergün, L. Fortnow, A. Magen, I. Newman, R. Rubinfeld, and C. Sohler. Sublinear-time approximation of Euclidean minimum spanning tree. Proc 14th SODA, pages 813--822, 2003.]] Google ScholarDigital Library
- A. Czumaj and C. Sohler. Estimating the Weight of Metric Minimum Spanning Trees in Sublinear-Time. Proc 36th STOC, pp. 175--183, 2004.]] Google ScholarDigital Library
- M. Fang, N. Shivakumar, H. Garcia-Molina, R. Motwani, and J. D. Ullman. Computing iceberg queries efficiently. Proc 1998 Intl. Conf. on Very Large Data Bases, pp. 299--310, 1998.]] Google ScholarDigital Library
- J. Feigenbaum, S. Kannan, and J. Zhang. Computing Diameter in the Streaming and Sliding Window Models. Technical Report YALEU/DCS/TR-1245, Yale University, 2002.]]Google Scholar
- G. Frahling, C. Sohler. Coresets in dynamic geometric data streams. Manuscript, 2004.]]Google Scholar
- P. Flajolet and G. Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31: 182--209, 1985.]] Google ScholarDigital Library
- S. Ganguly,M. Garofalakis, R. Rastogi. Processing Set Expressions over Continuous Update Streams. Proc SIGMOD Conference 2003, pp. 265--276, 2003.]] Google ScholarDigital Library
- T. Hagerub and C. Rüb. A Guided Tour of Chernoff Bounds. Information Processing Letters, 33:305--308, 1989/90.]] Google ScholarDigital Library
- S. Har-Peled, S. Mazumdar. On Coresets for k-Means and k-Median Clustering. Proc 36th STOC, pp. 291--300, 2004.]] Google ScholarDigital Library
- J. Hershberger and S. Suri. Convex Hulls and Related Problems in Data Streams. Proceedings of the ACM/DIMACS Workshop on Management and Processing of Data Streams, 2003.]]Google Scholar
- P. Indyk. Stable Distributions, Pseudorandom Generators, Embeddings and Data Stream Computation. Proc 41st FOCS, 2000.]] Google ScholarDigital Library
- P. Indyk. Algorithms for Dynamic Geometric Problems over Data Streams. Proc 36th STOC, pp. 373--380, 2004.]] Google ScholarDigital Library
- P. Indyk. Better Algorithms for high-dimensional proximity problems via asymmetric embeddings. Proc 14th SODA, pages 539--545, 2003.]] Google ScholarDigital Library
- G. S. Manku, R. Motwani. Approximate Frequency Counts over Data Streams. Proc 2002 Intl. Conf. on Very Large Data Bases, pp. 346--357, 2002.]] Google ScholarDigital Library
- S. Muthukrishnan. Data streams: Algorithms and applications (invited talk at SODA'03). Available at http://athos.rutgers.edu/ muthu/stream-1-1.ps, 2003.]] Google ScholarDigital Library
- N. Nisan. Pseudorandom generators for Space-Bounded Computation. Proc 22nd STOC, 204--212, 1990.]] Google ScholarDigital Library
- S. Suri, C. D. Toth, and Y. Zhou. Range Counting over Multidimensional Data Streams. Proc 20th Annual Symposium on Computational Geometry, pp. 160--169, 2004.]] Google ScholarDigital Library
Index Terms
- Sampling in dynamic data streams and applications
Recommendations
Tight bounds for Lp samplers, finding duplicates in streams, and related problems
PODS '11: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsIn this paper, we present near-optimal space bounds for Lp-samplers. Given a stream of updates (additions and subtraction) to the coordinates of an underlying vector x in Rn, a perfect Lp sampler outputs the i-th coordinate with probability xipxpp. In ...
Coresets in dynamic geometric data streams
STOC '05: Proceedings of the thirty-seventh annual ACM symposium on Theory of computingA dynamic geometric data stream consists of a sequence of m insert/delete operations of points from the discrete space 1,…,Δd [26]. We develop streaming (1 + ε)-approximation algorithms for k-median, k-means, MaxCut, maximum weighted matching (MaxWM), ...
A data structure for multi-dimensional range reporting
SCG '07: Proceedings of the twenty-third annual symposium on Computational geometryWe present a static data structure for orthogonal range reporting on a U x U x U integer grid. Our data structure supports orthogonal range reporting queries in O((log log n)2 + log log U + k) time and uses O(n log4 n) space, where k is the size of the ...
Comments