Abstract
Many current database systems use histograms to approximate the frequency distribution of values in the attributes of relations and based on them estimate query result sizes and access plan costs. In choosing among the various histograms, one has to balance between two conflicting goals: optimality, so that generated estimates have the least error, and practicality, so that histograms can be constructed and maintained efficiently. In this paper, we present both theoretical and experimental results on several issues related to this trade-off. Our overall conclusion is that the most effective approach is to focus on the class of histograms that accurately maintain the frequencies of a few attribute values and assume the uniform distribution for the rest, and choose for each relation the histogram in that class that is optimal for a self-join query.
- 1 C. M. Chen and N. Roussopoulos. Adaptive selectivity estimation using query feedback. In Proc. of the 199~ A CM-SIGMOD Conf., pages 161-172, Washington, DC, May 1994. Google ScholarDigital Library
- 2 S. Christodoulakis. Estimating block transfers and join sizes. In Proc. of the 1983 A CM-SIGMOD Con}., pages 40-54, San Jose, CA, May 1983. Google ScholarDigital Library
- 3 S. Christodoulakis. Implications of certain assumptions in database performance evaluation. A CM TODS, 9(2):163-186, June 1984. Google ScholarDigital Library
- 4 S. Christodoulakis On the estimation and use of selectivities in database performance evaluation. Research Report CS-89-24, Dept. of Computer Science, University of Waterloo, June 1989.Google Scholar
- 5 C. Date. A Guide to DB2. Addison Wesley, Reading, MA, 1989. Google ScholarDigital Library
- 6 C. Faloutsos and H. V. Jagadish. On b-tree indices for skewed distributions. In Proc. 18th Int. VLDB Conf., pages 363-374, Vancouver, BC, August 1992. Google ScholarDigital Library
- 7 P. Haas and A. Swami. Sequential sampling procedures for query size estimation. In Proc. of the 1992 ACM- SIGMOD Conf., pages 341-350, San Diego, CA, June 1992. Google ScholarDigital Library
- 8 P. J. Haas and A. Swami. Sampling-based selectivity estimation for joins using augmented frequent value statistics. In Proc. of the 1995 IEEE Con}. on Data Engzneerzng, Taipei, Taiwan, March 1995. Google ScholarDigital Library
- 9 Y. Ioannidis. Universality of serial histograms. In Proc. 19th Int. VLDB Conf., pages 256-267, Dublin, Ireland, August 1993. Google ScholarDigital Library
- 10 Y. Io~nnidis and $. CJhristodoulakis. On the propagation of errors in the size of join results. In Proc. of the 1991 A CM-SIGMOD Conf., pages 268-277, Denver, CO, May 1991. Google ScholarDigital Library
- 11 Y. Ioannidis and S Christodoulakis. Optimal histograms for limiting worst-case error propagation in the estimates of query optimizers, December 1993 Google ScholarDigital Library
- 12 Y. Ioannidis and V. Poosala. Balancing histogram optimality and practicality for query result size estimation. Unpublished manuscript, February 1995.Google Scholar
- 13 N. Kamel and R. King. A model of data distribution based on texture analysis. In Proc. of the 1985 A CM- SIGMOD Conf., pages 319-325. Austin, TX, May 1985. Google ScholarDigital Library
- 14 R. P. Kooi. The Opt~mizatzon of Q, uemes in Relational Databases PhD thesis, Case Western Reserve University, September 1980. Google ScholarDigital Library
- 15 R. J. Lipton, J. F. Naughton, and D. A. Schneider. Practical selectivity estimation through adaptive sampiing. In Proc. of the 1990 A CM-SIGMOD Conf., pages 1-11, Atlantic City, NJ, May 1990. Google ScholarDigital Library
- 16 M. V. Mannino, P. Chu, and T. Sager. Statistical profile estimation in database systems. A CM Cornput~ng Surveys, 20(3):192-221, September 1988. Google ScholarDigital Library
- 17 A. W. Marshall and I. Olkin. Inequalities: Theory of Majorizat~on and Its Applications. Academic Press, New York, NY, 1979.Google Scholar
- 18 T. H. Merrett and E. Otoo. Distribution models of relations. In Proc. 5th Int. VLDB Conf., pages 418- 425, Rio de Janeiro, Brazil, October 1979.Google ScholarCross Ref
- 19 M. Muralikrishna and D. J. DeWltt. Equi-depth histograms for estimating selectivity factors for multidimensional queries. In Proc. of the 1988 A CM- SIGMOD Conf., pages 28-36, Chicago, IL, June 1988. Google ScholarDigital Library
- 20 B. Muthuswamy and L. Kerschberg. A ddsm for relational query optimization. In Proc. A CM Annual Conf., Denver, CO, October 1985. Google ScholarDigital Library
- 21 G. Piatetsky-Shapiro and C. Connell. Accurate estimation of the number of tuptes satisfying a condition. In Proc. o} the 198~ A CM-SIGMOD Conf., pages 256-276, Boston, MA, June 1984. Google ScholarDigital Library
- 22 P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access path selection in a relational database management system. In Proceedings of the A CM-SIGMOD Conf., pages 23-34, Boston, MA, June 1979. Google ScholarDigital Library
- 23 Y. Wang. Experience from a real life query optimizer. In Proc. of the 1992 A CM-SIGMOD Conf., page 286, San Diego, CA, June 1992. Conf. presentation. Google ScholarDigital Library
- 24 G. K. Zipf. Human Behavior and the Pmnc~ple of Least Effort. Addison-Wesley, Reading, MA, 1949.Google Scholar
Index Terms
- Balancing histogram optimality and practicality for query result size estimation
Recommendations
Balancing histogram optimality and practicality for query result size estimation
SIGMOD '95: Proceedings of the 1995 ACM SIGMOD international conference on Management of dataMany current database systems use histograms to approximate the frequency distribution of values in the attributes of relations and based on them estimate query result sizes and access plan costs. In choosing among the various histograms, one has to ...
Query Result Size Estimation Using a Novel Histogram-like Technique: The Rectangular Attribute Cardinality Map
IDEAS '99: Proceedings of the 1999 International Symposium on Database Engineering & ApplicationsCurrent database systems utilize histograms to approximate frequency distributions of attribute values of relations. These are used to efficiently estimate query result sizes and access plan costs. Even though they have been in use for nearly two ...
Improving Range Query Result Size Estimation Based on a New Optimal Histogram
FQAS 2013: Proceedings of the 10th International Conference on Flexible Query Answering Systems - Volume 8132Many commercial relational Data Base Management Systems DBMSs maintain histograms to approximate the distribution of values in the relation attributes and based on them estimate query result sizes. A histogram approximates the distribution by grouping ...
Comments