skip to main content
article
Free Access

Multi-dimensional selectivity estimation using compressed histogram information

Authors Info & Claims
Published:01 June 1999Publication History
Skip Abstract Section

Abstract

The database query optimizer requires the estimation of the query selectivity to find the most efficient access plan. For queries referencing multiple attributes from the same relation, we need a multi-dimensional selectivity estimation technique when the attributes are dependent each other because the selectivity is determined by the joint data distribution of the attributes. Additionally, for multimedia databases, there are intrinsic requirements for the multi-dimensional selectivity estimation because feature vectors are stored in multi-dimensional indexing trees. In the 1-dimensional case, a histogram is practically the most preferable. In the multi-dimensional case, however, a histogram is not adequate because of high storage overhead and high error rates.

In this paper, we propose a novel approach for the multi-dimensional selectivity estimation. Compressed information from a large number of small-sized histogram buckets is maintained using the discrete cosine transform. This enables low error rates and low storage overheads even in high dimensions. In addition, this approach has the advantage of supporting dynamic data updates by eliminating the overhead for periodical reconstructions of the compressed information. Extensive experimental results show advantages of the proposed approach.

References

  1. AFS93 R. Agrawal, C. Faloutsos, A. Swami. Efficient Similarity Search In Sequence Databases. Foundations of Data Organizations and Algorithms Conference, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. BBK98 S. Berchtold, C. Bohm, H. Kriegel. The Pyramid Technique: Towards Breaking the Curse of Dimensiomdity. ACM SIGMOD Conference, pp.142-153, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. BKK96 S. Berchtold, D. Keim, H. Kriegel. The X-tree: An index Structure for High-Dimensional Data. 22th VLZ)B Conference, pp. 28-39, 1996 Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. BF95 A. Belussi, C. Faloutsos. Estimating the Selectivir.~ of Spatial Queries Using the 'Correlation' Fractal Dimens.ion. 21th VLDB Conference, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. CR94 C. Chen. N. Roussopoulos. Adaptive Selectivity Estimation Using Query Feedback. ACM SIGMOD Conference, pp. 161 - 172, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. CSZS97 W. Chang, G. Sheikholeslami, A Zhang, T. Syeda- Mahmood. Efficient Resource Selection in Distributed Visual Information Systems. A CM Multimedia Conference, pp. 203- 213, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. CG96 S. Chaudhuri, L. Gravano. Optimizing Queries over Multimedia Repositories. A CM SIGMOD Conference pp. 91- 102, 1996,. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Chri83 S. Christodoulakis. Estimating record selectivities. Information Systems Journal, 8(2), pp 105-115, 1983.Google ScholarGoogle Scholar
  9. EKSWX98 M. Ester, H. Kriegel, J. Sander, M. Winner, X. Xu. Incremental Clustering for Mining in Data Warehousing Environment. 24th VLDB Conference, pp. 323-333, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. EKSX96 M. Ester, H. Kriegel, J. Sander, X. Xu. A Density Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proc. 2nd Int. Conf. on Knowledge Discovering and Data Mining, 1996.Google ScholarGoogle Scholar
  11. Fa96 R. Fagin. Combining Fuzzy Information from Multiple Systems. In Proc. of the 5th A CM Symposium on Principles of Database Systems, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Fa98 R. Fagin. Fuzzy Queries in Multimedia Database Systems. In Proc. of the 7th A CM Symposium on Principles of Database Systems, pp. 1-10, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. GRS98 S. Guha, R. Rastogi, K. Shim. CURE: An Efficient Clustering Algorithm for Large Databases. A CM SIGMOD Conference, pp. 73-84, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. HNSS95 EJ. Haas, J.E Naughton, S. Seshadri, and L. Stokes. Sampling based estimation of the number of distinct values of an attribute. 21th VLDB Conference, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Io93 Y. Ioannidis. Universality of Serial Histograms. 19th VLDB Conference, pp. 256-267, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. IP95 Y. Ioannidis, V. Poosala. Balancing Optimality and Practicality for Query Result Size Estimation. A CM SIGMOD Conference, pp. 233-244, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. JKMPSS98 H. Jagadish, N. Kouda, S. Muthukrishnan, V. Poosala, K. Sevcik, T. Suel. Optimal Histograms with Quality Gurantees. 24th VLDB Conference, pp. 275-286, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Lim90 J.S. Lira. Two Dimensional Signal And Image Processing. Prentice Hall, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. MCS98 M.V. Mannino, E Chu, and T. Sager. Statistical profile estimation in database systems. A CM Computing Surveys, 20(3), 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. NH94 R. Ng, J. Han. Efficient and Effective Clustering Methods for Spatial Data Minig. 20th VLDB Conference, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. PSTW93 B. Pagel, H. Six, H. Toben, E Widmayer. Towards an Analysis of Range Query Performance in Spatial Data Structures. In Proc. of the 2nd A CM Symposium on Principles of Database Systems, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. PIHS96 V. Poosala, Y.E. Ioannidis, P.J. Haas, E.J. Shekita. Improved Histograms for Selectivity Estimation of Range Predicates. A CM SIGMOD Conference, pp. 294-305, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. PI97 V. Poosala, Y.E. loannidis. Selectivity Estimation Without the Attribute Value Independence Assumption. 23th VLDB Conference, pp. 486-495, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. RY90 K.R. Rao, P. Yip. Discrete Cosine Transform Algorithms, Advantages, Applications. Academic Press, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. SCZ98 G. Sheikholeslami, W. Chang, A. Zhang. Semantic Clustering and Querying Heterogeneous Features for Visual Data. A CM Multimedia Conference, pp. 3-12, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. SLRD93 W. Sun, Y. Ling, N. Rishe, and Y. Deng. An Instant and accurate size estimation method for joins and selections in a retrieval-intensive environment. A CM SIGMOD Conference, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. WKW94 K.Y. Whang, S.W. Kim, G. Wiederhold. Dynamic Maintenance of Data Distribution for Selectivity Estimation, VLDB Journal. Vol.3, No. 1, pp. 29-51, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. ZRL96 T. Zhang, R. Ramakrishnam, M. Linvry. BIRCH: An Efficient Data Clustering Method for Very Large Databases. A CM SIGMOD Conference, pp. 103-114, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Multi-dimensional selectivity estimation using compressed histogram information

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image ACM SIGMOD Record
            ACM SIGMOD Record  Volume 28, Issue 2
            June 1999
            599 pages
            ISSN:0163-5808
            DOI:10.1145/304181
            Issue’s Table of Contents
            • cover image ACM Conferences
              SIGMOD '99: Proceedings of the 1999 ACM SIGMOD international conference on Management of data
              June 1999
              604 pages
              ISBN:1581130848
              DOI:10.1145/304182

            Copyright © 1999 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 1 June 1999

            Check for updates

            Qualifiers

            • article

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader