Abstract
The database query optimizer requires the estimation of the query selectivity to find the most efficient access plan. For queries referencing multiple attributes from the same relation, we need a multi-dimensional selectivity estimation technique when the attributes are dependent each other because the selectivity is determined by the joint data distribution of the attributes. Additionally, for multimedia databases, there are intrinsic requirements for the multi-dimensional selectivity estimation because feature vectors are stored in multi-dimensional indexing trees. In the 1-dimensional case, a histogram is practically the most preferable. In the multi-dimensional case, however, a histogram is not adequate because of high storage overhead and high error rates.
In this paper, we propose a novel approach for the multi-dimensional selectivity estimation. Compressed information from a large number of small-sized histogram buckets is maintained using the discrete cosine transform. This enables low error rates and low storage overheads even in high dimensions. In addition, this approach has the advantage of supporting dynamic data updates by eliminating the overhead for periodical reconstructions of the compressed information. Extensive experimental results show advantages of the proposed approach.
- AFS93 R. Agrawal, C. Faloutsos, A. Swami. Efficient Similarity Search In Sequence Databases. Foundations of Data Organizations and Algorithms Conference, 1993. Google ScholarDigital Library
- BBK98 S. Berchtold, C. Bohm, H. Kriegel. The Pyramid Technique: Towards Breaking the Curse of Dimensiomdity. ACM SIGMOD Conference, pp.142-153, 1998. Google ScholarDigital Library
- BKK96 S. Berchtold, D. Keim, H. Kriegel. The X-tree: An index Structure for High-Dimensional Data. 22th VLZ)B Conference, pp. 28-39, 1996 Google ScholarDigital Library
- BF95 A. Belussi, C. Faloutsos. Estimating the Selectivir.~ of Spatial Queries Using the 'Correlation' Fractal Dimens.ion. 21th VLDB Conference, 1995. Google ScholarDigital Library
- CR94 C. Chen. N. Roussopoulos. Adaptive Selectivity Estimation Using Query Feedback. ACM SIGMOD Conference, pp. 161 - 172, 1994. Google ScholarDigital Library
- CSZS97 W. Chang, G. Sheikholeslami, A Zhang, T. Syeda- Mahmood. Efficient Resource Selection in Distributed Visual Information Systems. A CM Multimedia Conference, pp. 203- 213, 1997. Google ScholarDigital Library
- CG96 S. Chaudhuri, L. Gravano. Optimizing Queries over Multimedia Repositories. A CM SIGMOD Conference pp. 91- 102, 1996,. Google ScholarDigital Library
- Chri83 S. Christodoulakis. Estimating record selectivities. Information Systems Journal, 8(2), pp 105-115, 1983.Google Scholar
- EKSWX98 M. Ester, H. Kriegel, J. Sander, M. Winner, X. Xu. Incremental Clustering for Mining in Data Warehousing Environment. 24th VLDB Conference, pp. 323-333, 1998. Google ScholarDigital Library
- EKSX96 M. Ester, H. Kriegel, J. Sander, X. Xu. A Density Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proc. 2nd Int. Conf. on Knowledge Discovering and Data Mining, 1996.Google Scholar
- Fa96 R. Fagin. Combining Fuzzy Information from Multiple Systems. In Proc. of the 5th A CM Symposium on Principles of Database Systems, 1996. Google ScholarDigital Library
- Fa98 R. Fagin. Fuzzy Queries in Multimedia Database Systems. In Proc. of the 7th A CM Symposium on Principles of Database Systems, pp. 1-10, 1998. Google ScholarDigital Library
- GRS98 S. Guha, R. Rastogi, K. Shim. CURE: An Efficient Clustering Algorithm for Large Databases. A CM SIGMOD Conference, pp. 73-84, 1998. Google ScholarDigital Library
- HNSS95 EJ. Haas, J.E Naughton, S. Seshadri, and L. Stokes. Sampling based estimation of the number of distinct values of an attribute. 21th VLDB Conference, 1995. Google ScholarDigital Library
- Io93 Y. Ioannidis. Universality of Serial Histograms. 19th VLDB Conference, pp. 256-267, 1993. Google ScholarDigital Library
- IP95 Y. Ioannidis, V. Poosala. Balancing Optimality and Practicality for Query Result Size Estimation. A CM SIGMOD Conference, pp. 233-244, 1995. Google ScholarDigital Library
- JKMPSS98 H. Jagadish, N. Kouda, S. Muthukrishnan, V. Poosala, K. Sevcik, T. Suel. Optimal Histograms with Quality Gurantees. 24th VLDB Conference, pp. 275-286, 1998. Google ScholarDigital Library
- Lim90 J.S. Lira. Two Dimensional Signal And Image Processing. Prentice Hall, 1990. Google ScholarDigital Library
- MCS98 M.V. Mannino, E Chu, and T. Sager. Statistical profile estimation in database systems. A CM Computing Surveys, 20(3), 1988. Google ScholarDigital Library
- NH94 R. Ng, J. Han. Efficient and Effective Clustering Methods for Spatial Data Minig. 20th VLDB Conference, 1994. Google ScholarDigital Library
- PSTW93 B. Pagel, H. Six, H. Toben, E Widmayer. Towards an Analysis of Range Query Performance in Spatial Data Structures. In Proc. of the 2nd A CM Symposium on Principles of Database Systems, 1993. Google ScholarDigital Library
- PIHS96 V. Poosala, Y.E. Ioannidis, P.J. Haas, E.J. Shekita. Improved Histograms for Selectivity Estimation of Range Predicates. A CM SIGMOD Conference, pp. 294-305, 1996. Google ScholarDigital Library
- PI97 V. Poosala, Y.E. loannidis. Selectivity Estimation Without the Attribute Value Independence Assumption. 23th VLDB Conference, pp. 486-495, 1997. Google ScholarDigital Library
- RY90 K.R. Rao, P. Yip. Discrete Cosine Transform Algorithms, Advantages, Applications. Academic Press, 1990. Google ScholarDigital Library
- SCZ98 G. Sheikholeslami, W. Chang, A. Zhang. Semantic Clustering and Querying Heterogeneous Features for Visual Data. A CM Multimedia Conference, pp. 3-12, 1998. Google ScholarDigital Library
- SLRD93 W. Sun, Y. Ling, N. Rishe, and Y. Deng. An Instant and accurate size estimation method for joins and selections in a retrieval-intensive environment. A CM SIGMOD Conference, 1993. Google ScholarDigital Library
- WKW94 K.Y. Whang, S.W. Kim, G. Wiederhold. Dynamic Maintenance of Data Distribution for Selectivity Estimation, VLDB Journal. Vol.3, No. 1, pp. 29-51, 1994. Google ScholarDigital Library
- ZRL96 T. Zhang, R. Ramakrishnam, M. Linvry. BIRCH: An Efficient Data Clustering Method for Very Large Databases. A CM SIGMOD Conference, pp. 103-114, 1996. Google ScholarDigital Library
Index Terms
- Multi-dimensional selectivity estimation using compressed histogram information
Recommendations
Multi-dimensional selectivity estimation using compressed histogram information
SIGMOD '99: Proceedings of the 1999 ACM SIGMOD international conference on Management of dataThe database query optimizer requires the estimation of the query selectivity to find the most efficient access plan. For queries referencing multiple attributes from the same relation, we need a multi-dimensional selectivity estimation technique when ...
A multi-dimensional histogram for selectivity estimation and fast approximate query answering
CASCON '03: Proceedings of the 2003 conference of the Centre for Advanced Studies on Collaborative researchHistograms have been widely used for selectivity estimation in query optimization, as well as for fast approximate query answering in many data mining, OLAP, and data visualization applications. This paper presents a new type of multi-dimensional ...
Selectivity estimation using probabilistic models
Estimating the result size of complex queries that involve selection on multiple attributes and the join of several relations is a difficult but fundamental task in database query processing. It arises in cost-based query optimization, query profiling, ...
Comments