skip to main content
article

Probabilistic wavelet synopses

Published:01 March 2004Publication History
Skip Abstract Section

Abstract

Recent work has demonstrated the effectiveness of the wavelet decomposition in reducing large amounts of data to compact sets of wavelet coefficients (termed "wavelet synopses") that can be used to provide fast and reasonably accurate approximate query answers. A major shortcoming of these existing wavelet techniques is that the quality of the approximate answers they provide varies widely, even for identical queries on nearly identical values in distinct parts of the data. As a result, users have no way of knowing whether a particular approximate answer is highly-accurate or off by many orders of magnitude. In this article, we introduce Probabilistic Wavelet Synopses, the first wavelet-based data reduction technique optimized for guaranteed accuracy of individual approximate answers. Whereas previous approaches rely on deterministic thresholding for selecting the wavelet coefficients to include in the synopsis, our technique is based on a novel, probabilistic thresholding scheme that assigns each coefficient a probability of being included based on its importance to the reconstruction of individual data values, and then flips coins to select the synopsis. We show how our scheme avoids the above pitfalls of deterministic thresholding, providing unbiased, highly accurate answers for individual data values in a data vector. We propose several novel optimization algorithms for tuning our probabilistic thresholding scheme to minimize desired error metrics. Experimental results on real-world and synthetic data sets evaluate these algorithms, and demonstrate the effectiveness of our probabilistic wavelet synopses in providing fast, highly accurate answers with improved quality guarantees.

References

  1. Acharya, S., Gibbons, P. B., Poosala, V., and Ramaswamy, S. 1999. Join synopses for approximate query answering. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data (Philadelphia, Pa.). ACM, New York, 275--286. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Amsaleg, L., Bonnet, P., Franklin, M. J., Tomasic, A., and Urhan, T. 1997. Improving responsiveness for wide-area data access. IEEE Data Eng. Bull. 20, 3 (Sept.), 3--11. (Special Issue on Improving Query Responsiveness).Google ScholarGoogle Scholar
  3. Anastassiou, G. A. and Yu, X. M. 1992a. Monotone and probabilistic wavelet approximation. Stoch. Anal. Appl. 10, 3, 251--264.Google ScholarGoogle ScholarCross RefCross Ref
  4. Anastassiou, G. A. and Yu, X. M. 1992b. Probabilistic discrete wavelet approximation. Num. Func. Anal. Opt. 13, 117--121.Google ScholarGoogle ScholarCross RefCross Ref
  5. Chakrabarti, K., Garofalakis, M., Rastogi, R., and Shim, K. 2000. Approximate query processing using wavelets. In Proceedings of the 26th International Conference on Very Large Data Bases (Cairo, Egypt). 111--122. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Deshpande, A., Garofalakis, M., and Rastogi, R. 2001. Independence is good: Dependency-based histogram synopses for high-dimensional data. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data (Santa Barbara, Calif.). ACM, New York, 199--210. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. DeVore, R. A. 1998. Nonlinear approximation. Acta Numer. 7, 51--150.Google ScholarGoogle ScholarCross RefCross Ref
  8. Gilbert, A. C., Kotidis, Y., Muthukrishnan, S., and Strauss, M. J. 2001. Surfing wavelets on streams: One-pass summaries for approximate aggregate queries. In Proceedings of the 27th International Conference on Very Large Data Bases (Rome, Italy). 79--88. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Gunopulos, D., Kollios, G., Tsotras, V. J., and Domeniconi, C. 2000. Approximating multi-dimensional aggregate range queries over real attributes. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (Dallas, Tex.). ACM, New York, 463--474. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Haas, P. J. and Swami, A. N. 1992. Sequential sampling procedures for query size estimation. In Proceedings of the 1992 ACM SIGMOD International Conference on Management of Data (San Diego, Calif.). ACM, New York, 341--350. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Hellerstein, J. M., Haas, P. J., and Wang, H. J. 1997. Online aggregation. In Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data (Tucson, Az.). ACM, New York, 171--182. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Information and Computer Science, University of California at Irvine. 2000. ftp://ftp.ics.uci.edu/pub/machine-learning-databases.Google ScholarGoogle Scholar
  13. Jawerth, B. and Sweldens, W. 1994. An overview of wavelet based multiresolution analyses. SIAM Rev. 36, 3, 377--412. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Matias, Y. 1992. Highly parallel randomized algorithmics. Ph.D. dissertation, Tel-Aviv Univ. Tel-Aviv, Israel.Google ScholarGoogle Scholar
  15. Matias, Y., Vitter, J. S., and Wang, M. 1998. Wavelet-based histograms for selectivity estimation. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data (Seattle, Wash.). ACM New York, 448--459. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Matias, Y., Vitter, J. S., and Wang, M. 2000. Dynamic maintenance of wavelet-based histograms. In Proceedings of the 26th International Conference on Very Large Data Bases (Cairo, Egypt). 101--110. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Motwani, R. and Raghavan, P. 1995. Randomized Algorithms. Cambridge University Press, Cambridge, Mass. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Natsev, A., Rastogi, R., and Shim, K. 1999. WALRUS: A similarity retrieval algorithm for image databases. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data (Philadelphia, Pa.). ACM, New York, 395--406. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Raghavan, P. and Thompson, C. 1987. Randomized rounding: A technique for provably good algorithms and algorithmic proofs. Combinatorica 7, 4, 365--374. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Stollnitz, E. J., DeRose, T. D., and Salesin, D. H. 1996. Wavelets for Computer Graphics---Theory and Applications. Morgan-Kaufmann, San Francisco, Calif. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Vazirani, V. V. 2001. Approximation Algorithms. Springer-Verlag, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Vitter, J. S. and Wang, M. 1999. Approximate computation of multidimensional aggregates of sparse data using wavelets. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data (Philadelphia, Pa.). ACM, New York, 193--204. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Probabilistic wavelet synopses

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader