skip to main content
10.1145/1081870.1081884acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
Article

Wavelet synopsis for data streams: minimizing non-euclidean error

Published:21 August 2005Publication History

ABSTRACT

We consider the wavelet synopsis construction problem for data streams where given n numbers we wish to estimate the data by constructing a synopsis, whose size, say B is much smaller than n. The B numbers are chosen to minimize a suitable error between the original data and the estimate derived from the synopsis.Several good one-pass wavelet construction streaming algorithms minimizing the l2 error exist. For other error measures, the problem is less understood. We provide the first one-pass small space streaming algorithms with provable error guarantees (additive approximation) for minimizing a variety of non-Euclidean error measures including all weighted lp (including l) and relative error lp metrics.In several previous works solutions (for weighted l2, l and maximum relative error) where the B synopsis coefficients are restricted to be wavelet coefficients of the data were proposed. This restriction yields suboptimal solutions on even fairly simple examples. Other lines of research, such as probabilistic synopsis, imposed restrictions on how the synopsis was arrived at. To the best of our knowledge this paper is the first paper to address the general problem, without any restriction on how the synopsis is arrived at, as well as provide the first streaming algorithms with guaranteed performance for these classes of error measures.

References

  1. K. Chakrabarti, M. N. Garofalakis, R. Rastogi, and K. Shim. Approximate query processing using wavelets. VLDB Conference, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. K. Chakrabarti, E. J. Keogh, S. Mehrotra, and M. J. Pazzani. Locally adaptive dimensionality reduction for indexing large time series databases. ACM TODS, 27(2):188--228, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. I. Daubechies. Ten Lectures on Wavelets. SIAM, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. A. Deligiannakis and N. Roussopoulos. Extended wavelets for multiple measures. SIGMOD Conference, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. M. Garofalakis and A. Kumar. Deterministic wavelet thresholding for maximum error metric. Proc. of PODS, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. M. N. Garofalakis and P. B. Gibbons. Probabilistic wavelet synopses. ACM TODS (See also SIGMOD 2002), 29:43--90, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A. Gilbert, Y. Kotadis, S. Muthukrishnan, and M. Strauss. Surfing Wavelets on Streams: One Pass Summaries for Approximate Aggregate Queries. VLDB Conference, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. C. Gilbert, S. Guha, P. Indyk, Y. Kotidis, S. Muthukrishnan, and Martin Strauss. Fast, small-space algorithms for approximate histogram maintenance. In Proc. of ACM STOC, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. S. Guha. Space efficiency in synopsis construction problems. VLDB Conference, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. S. Guha, P. Indyk, S. Muthukrishnan, and M. Strauss. Histogramming data streams with fast per-item processing. In Proc. of ICALP, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. S. Guha, C. Kim, and K. Shim. XWAVE: Optimal and approximate extended wavelets for streaming data. VLDB Conference, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams. Proc. of FOCS, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. C. E. Jacobs, A. Finkelstein, and D. H. Salesin. Fast multiresolution image querying. Computer Graphics, 29(Annual Conference Series):277--286, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. P. Karras and N. Mamoulis. One pass wavelet synopis for maximum error metrics. VLDB Conference, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. E. Keogh, K. Chakrabati, S. Mehrotra, and M. Pazzani. Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases. Proc. of SIGMOD, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Y. Matias and D. Urieli. Manuscript, 2004.Google ScholarGoogle Scholar
  17. Y. Matias and D. Urieli. Optimal workload-based wavelet synopses. Proc. of ICDT, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Y. Matias, J. S. Vitter, and M. Wang. Dynamic Maintenance of Wavelet-Based Histograms. VLDB Conference, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Y. Matias, J. Scott Vitter, and M. Wang. Wavelet-Based Histograms for Selectivity Estimation. Proc. of SIGMOD, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. S. Muthukrishnan. Workload optimal wavelet synopsis. DIMACS TR, 2004.Google ScholarGoogle Scholar
  21. G. Strang and T. Nguyen. Wavelets and Filter Banks. Wellesley-Cambridge Press, 1996.Google ScholarGoogle Scholar

Index Terms

  1. Wavelet synopsis for data streams: minimizing non-euclidean error

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
            August 2005
            844 pages
            ISBN:159593135X
            DOI:10.1145/1081870

            Copyright © 2005 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 21 August 2005

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • Article

            Acceptance Rates

            Overall Acceptance Rate1,133of8,635submissions,13%

            Upcoming Conference

            KDD '24

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader