ABSTRACT
We consider the wavelet synopsis construction problem for data streams where given n numbers we wish to estimate the data by constructing a synopsis, whose size, say B is much smaller than n. The B numbers are chosen to minimize a suitable error between the original data and the estimate derived from the synopsis.Several good one-pass wavelet construction streaming algorithms minimizing the l2 error exist. For other error measures, the problem is less understood. We provide the first one-pass small space streaming algorithms with provable error guarantees (additive approximation) for minimizing a variety of non-Euclidean error measures including all weighted lp (including l∞) and relative error lp metrics.In several previous works solutions (for weighted l2, l∞ and maximum relative error) where the B synopsis coefficients are restricted to be wavelet coefficients of the data were proposed. This restriction yields suboptimal solutions on even fairly simple examples. Other lines of research, such as probabilistic synopsis, imposed restrictions on how the synopsis was arrived at. To the best of our knowledge this paper is the first paper to address the general problem, without any restriction on how the synopsis is arrived at, as well as provide the first streaming algorithms with guaranteed performance for these classes of error measures.
- K. Chakrabarti, M. N. Garofalakis, R. Rastogi, and K. Shim. Approximate query processing using wavelets. VLDB Conference, 2000. Google ScholarDigital Library
- K. Chakrabarti, E. J. Keogh, S. Mehrotra, and M. J. Pazzani. Locally adaptive dimensionality reduction for indexing large time series databases. ACM TODS, 27(2):188--228, 2002. Google ScholarDigital Library
- I. Daubechies. Ten Lectures on Wavelets. SIAM, 1992. Google ScholarDigital Library
- A. Deligiannakis and N. Roussopoulos. Extended wavelets for multiple measures. SIGMOD Conference, 2003. Google ScholarDigital Library
- M. Garofalakis and A. Kumar. Deterministic wavelet thresholding for maximum error metric. Proc. of PODS, 2004. Google ScholarDigital Library
- M. N. Garofalakis and P. B. Gibbons. Probabilistic wavelet synopses. ACM TODS (See also SIGMOD 2002), 29:43--90, 2004. Google ScholarDigital Library
- A. Gilbert, Y. Kotadis, S. Muthukrishnan, and M. Strauss. Surfing Wavelets on Streams: One Pass Summaries for Approximate Aggregate Queries. VLDB Conference, 2001. Google ScholarDigital Library
- A. C. Gilbert, S. Guha, P. Indyk, Y. Kotidis, S. Muthukrishnan, and Martin Strauss. Fast, small-space algorithms for approximate histogram maintenance. In Proc. of ACM STOC, 2002. Google ScholarDigital Library
- S. Guha. Space efficiency in synopsis construction problems. VLDB Conference, 2005. Google ScholarDigital Library
- S. Guha, P. Indyk, S. Muthukrishnan, and M. Strauss. Histogramming data streams with fast per-item processing. In Proc. of ICALP, 2002. Google ScholarDigital Library
- S. Guha, C. Kim, and K. Shim. XWAVE: Optimal and approximate extended wavelets for streaming data. VLDB Conference, 2004. Google ScholarDigital Library
- S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams. Proc. of FOCS, 2000. Google ScholarDigital Library
- C. E. Jacobs, A. Finkelstein, and D. H. Salesin. Fast multiresolution image querying. Computer Graphics, 29(Annual Conference Series):277--286, 1995. Google ScholarDigital Library
- P. Karras and N. Mamoulis. One pass wavelet synopis for maximum error metrics. VLDB Conference, 2005. Google ScholarDigital Library
- E. Keogh, K. Chakrabati, S. Mehrotra, and M. Pazzani. Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases. Proc. of SIGMOD, 2001. Google ScholarDigital Library
- Y. Matias and D. Urieli. Manuscript, 2004.Google Scholar
- Y. Matias and D. Urieli. Optimal workload-based wavelet synopses. Proc. of ICDT, 2005. Google ScholarDigital Library
- Y. Matias, J. S. Vitter, and M. Wang. Dynamic Maintenance of Wavelet-Based Histograms. VLDB Conference, 2000. Google ScholarDigital Library
- Y. Matias, J. Scott Vitter, and M. Wang. Wavelet-Based Histograms for Selectivity Estimation. Proc. of SIGMOD, 1998. Google ScholarDigital Library
- S. Muthukrishnan. Workload optimal wavelet synopsis. DIMACS TR, 2004.Google Scholar
- G. Strang and T. Nguyen. Wavelets and Filter Banks. Wellesley-Cambridge Press, 1996.Google Scholar
Index Terms
- Wavelet synopsis for data streams: minimizing non-euclidean error
Recommendations
Efficient range-constrained similarity search on wavelet synopses over multiple streams
CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge managementDue to the resource limitation in the data stream environment, it has been reported that answering user queries according to the wavelet synopsis of a stream is an essential ability of a Data Stream Management System (DSMS). In this paper, motivated by ...
Efficient Process of Top-k Range-Sum Queries over Multiple Streams with Minimized Global Error
Due to the resource limitation in the data stream environments, it has been reported that answering user queries according to the wavelet synopsis of a stream is an essential ability of a Data Stream Management System (DSMS). In the literature, recent ...
Clustering data streams using grid-based synopsis
Continually advancing technology has made it feasible to capture data online for onward transmission as a steady flow of newly generated data points, termed as data stream. Continuity and unboundedness of data streams make storage of data and multiple ...
Comments