skip to main content
10.1145/564691.564741acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article

Dynamic multidimensional histograms

Published:03 June 2002Publication History

ABSTRACT

Histograms are a concise and flexible way to construct summary structures for large data sets. They have attracted a lot of attention in database research due to their utility in many areas, including query optimization, and approximate query answering. They are also a basic tool for data visualization and analysis.In this paper, we present a formal study of dynamic multidimensional histogram structures over continuous data streams. At the heart of our proposal is the use of a dynamic summary data structure (vastly different from a histogram) maintaining a succinct approximation of the data distribution of the underlying continuous stream. On demand, an accurate histogram is derived from this dynamic data structure. We propose algorithms for extracting such an accurate histogram and we analyze their behavior and tradeoffs. The proposed algorithms are able to provide approximate guarantees about the quality of the estimation of the histograms they extract.We complement our analytical results with a thorough experimental evaluation using real data sets.

References

  1. A. Aboulnaga and S. Chaudhuri. Self Tuning Histograms: Building Histograms Without Looking at Data. Proceedings of ACM SIGMOD, pages 181-192, June 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Acharya, P. Gibbons, V. Poosala, and S. Ramaswamy. The Aqua Approximate Query Answering System. Proceedings of ACM SIGMOD, Philladephia PA, pages 574-578, June 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. B. Babcock, M. Datar, and R. Motwani. Sampling From a Moving Window Over Streaming Data. Proceedings of the Symposium on Discrete Algorithms, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. S. Babu and J. Widom. Contineous Queries Over Data Streams. SIGMOD Record, Sept. 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. N. Bruno, L. Gravano, and S. Chaudhuri. STHoles: A Workload Aware Multidimensional Histogram. Proceedings of ACM SIGMOD, May 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining Stream Statistics over Sliding Windows. Proceedings of the Symposium on Discrete Algorithms, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. P. Gibbons, Y. Mattias, and V. Poosala. Fast Incremental Maintenance of Approximate Histograms. Proceedings of VLDB, Athens Greece, pages 466-475, Aug. 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. Gilbert, S. Guha, P. Indyk, Y. Kotadis, S. Muthukrishnan, and M. Strauss. Fast, small-space algorithms for approximate histogram maintanance. Proc. STOC, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. A. Gilbert, Y. Kotadis, S. Muthukrishnan, and M. Strauss. Quicksand: quick summary and analysis of network data. DIMACS tech report.Google ScholarGoogle Scholar
  10. A. Gilbert, Y. Kotadis, S. Muthukrishnan, and M. Strauss. Surfing Wavelets on Streams: One Pass Summaries for Approximate Aggregate Queries. Proceedings of VLDB, pages 79-88, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. J. Gray, A. Bosworth, A. Leyman, and H. Pirahesh. Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross Tab and Sub Total. Proceedings of ICDE, pages 152-159, May 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. Greenwald and S. Khanna. Space-Efficient Online Computation of Quantile Summaries. Proceedings of ACM SIGMOD, Santa Barbara, May 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. S. Guha and N. Koudas. Approximating a Data Stream for Querying and Estimation: Algorithms and Performance Evaluation. ICDE, Feb. 2002.Google ScholarGoogle ScholarCross RefCross Ref
  14. S. Guha, N. Koudas, and K. Shim. Data Streams and Histograms. Symposium on the Theory of Computing (STOC), July 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. S. Guha, N. Mishra, R. Motwani, and L. O'callahan. Clustering Data Streams. Foundations of Computer Science (FOCS), Sept. 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. D. Gunopulos, G. Kollios, V. Tsotras, and C. Domeniconi. Approximating Multi-Dimensional Aggregate Range Queries Over Real Attributes. Proceedings of ACM SIGMOD, June 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. P. Haas, J. Naughton, S. Seshadri, and L. Stokes. Sampling Based Estimation Of the Number Of Distinct Values Of An Attribute. Proceedings of VLDB, pages 311-322, June 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. P. Haas, J. Naughton, S. Seshadri, and A. Swami. Fixed Precision Estimation Of Join Selectivity. Proceedings of ACM PODS, pages 190-201, June 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. P. Haas and A. Swami. Sequantial Sampling Procedures for Query Size Estimation. Proceedings of ACM SIGMOD, San Diego, CA, pages 341-350, June 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. P. Indyk. Stable Distributions, Pseudorandom Generators, Embeddings and Data Stream Computation". Foundations of Computer Science (FOCS), Sept. 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Y. Ioannidis and V. Poosala. Balancing Histogram Optimality and Practicality for Query Result Size Estimation. Proceedings of ACM SIGMOD, San Hose, CA, pages 233-244, June 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. H. V. Jagadish, N. Koudas, S. Muthukrishnan, V. Poosala, K. C. Sevcik, and T. Suel. Optimal Histograms with Quality Guarantees. Proceedings of VLDB, pages 275-286, Aug. 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. S. Khanna, S. Muthukrishnan, and S. Skiena. Efficient array partitioning. Proc. ICALP, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. R. P. Kooi. The Optimization of Queries in Relational Databases. PhD Thesis, Case Western Reserve University, Sept. 1980. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. J. Lee, D. Kim, and C. Chung. Multidimensional Selectivity Estimation Using Compressed Histogram Information. Proceedings of ACM SIGMOD, pages 205-214, June 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. S. Madden and M. Franklin. Fjording the Stream: An Architecture for Queries Over Streaming Sensor Data. Proceedings of ICDE, Feb. 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Y. Mattias, J. S. Vitter, and M. Wang. Wavelet-Based Histograms for Selectivity Estimation. Proc. of the 1998 ACM SIGMOD Intern. Conf. on Management of Data, June 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Y. Mattias, J. S. Vitter, and M. Wang. Dynamic Maintenance of Wavelet-Based Histograms. Proceedings of the International Conference on Very Large Databases, (VLDB), Cairo, Egypt, pages 101-111, Sept. 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. S. Muthukrishnan, V. Poosala, and T. Suel. Partitioning two dimensional arrays: algorithms, complexity and applications. Proc. Intl Conf. Database Theory, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. V. Poosala and Y. Ioannidis. Selectivity Estimation Without the Attribute Value Independence Assumption. Proceedings of VLDB, Athens Greece, pages 486-495, Aug. 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. V. Poosala, Y. Ioannidis, P. Haas, and E. Shekita. Improved Histograms for Selectivity Estimation of Range Predicates. Proceedings of ACM SIGMOD, Montreal Canada, pages 294-305, June 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. G. Singh, S. Rajagopalan, and B. Lindsay. Random Sampling Techniques For Space Efficient Computation Of Large Datasets. Proceedings of SIGMOD, Philladelphia PA, pages 251-262, June 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. J. Vitter and M. Wang. Approximate computation of multidimensional aggregates on sparse data using wavelets. Proceedings of SIGMOD, pages 193-204, June 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. J. Vitter, M. Wang, and B. R. Iyer. Data Cube Approximation and Histograms via Wavelets. Proc. of the 1998 ACM CIKM Intern. Conf. on Information and Knowledge Management, November 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Y. Wu, D. Agrawal, and A. E. Abbadi. Applying the Golden Rule of Sampling for Selectivity Estimation. Proceedings of ACM SIGMOD, May 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Dynamic multidimensional histograms

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in
            • Published in

              cover image ACM Conferences
              SIGMOD '02: Proceedings of the 2002 ACM SIGMOD international conference on Management of data
              June 2002
              654 pages
              ISBN:1581134975
              DOI:10.1145/564691

              Copyright © 2002 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 3 June 2002

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • Article

              Acceptance Rates

              SIGMOD '02 Paper Acceptance Rate42of240submissions,18%Overall Acceptance Rate785of4,003submissions,20%

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader