Abstract
We present a new method for estimating the number of tuples satisfying a condition of the type attribute rel constant, where rel is one of "=", ">", "<", "≥", "≤". Our method gives highly accurate, yet easy to compute, estimates. We store information about attribute values as a list of distribution steps (histograms where buckets, instead of having equal width, have equal height). These distribution steps provide an upper bound on the error when estimating the number of tuples satisfying a condition. The estimation error can be arbitrarily reduced by increasing the number of steps. We analyze desirable conditions that such estimates should satisfy. Based on the distribution steps, we derive a set of estimation formulas which minimize the worst-case error. We also present another set of formulas which reduce the average-case error. Finally, we show how to use sampling to compute a close approximation of the distribution steps very quickly. The major applications of our method are in query optimization and in answering statistical queries.
- {Blasgen 77} Blasgen, M W., and Eswaran, K. P. Storage and access in relational databases IBM Syst. J 16(4), 1977.Google Scholar
- {Christodoulakis 81} Christodoulakis, S Estimating selectivities in data bases. Technical Report CSRG-136, Ph.D. thesis, University of Toronto, 1981. Google ScholarDigital Library
- {Christodoulakis 83} Christodoulakis, S Estimating block transfers and join sizes. In Proc of annual meeting, pages 40--54. ACM-SIGMOD, May, 1983. Google ScholarDigital Library
- {Demolombe 80} Demolombe, R. Estimation of the number of tuples satisying a query expressed in predicate calculus language In Proc of 6th Int. Conf. on Very Large Data Bases, pages 55--63. Montreal, Canada, 1980.Google Scholar
- {Dixon 79} Dixon, W. J., and Massey, F. J. Introduction to statistical analysis McGraw-Hill Book Company, 1979.Google Scholar
- {Luk 83} Luk, W. S. On estimating block accesses in database organizations. Commun. ACM 26(11):945--947, November, 1983. Google ScholarDigital Library
- {Piatetsky 84} Piatetsky-Shapiro, G. I. A Self-Organizing Database System -- a Different Approach to Query Optimization. PhD thesis, Courant Institute, New York University, May, 1984. Google ScholarDigital Library
- {Rowe 83} Rowe, N C. Top-down statistical estimation on a database. In Proc. of annual meeting, pages 135--145. ACM-SIGMOD, May, 1983. Google ScholarDigital Library
- {Samson 83} Samson, W. B. and Bendell, A. Rank order distributions and secondary key indexing, extended abstract. In Proc of 2nd Int Conf. on Databases. Cambridge, England, September, 1983.Google Scholar
- {Selinger 79} Selinger, P. G., et al. Access path selection in a relational database management system. In Proc of ACM-SIGMOD Int Conf. on Manage. of Data. 1979. Google ScholarDigital Library
- {Whang 83} Whang, K.-Y, Wiederhold, G. and Sagalowicz, D. Estimating block accesses in database organizations: a closed noniterative formula. Commun. ACM 26(11):940--944, November, 1983. Google ScholarDigital Library
- {Yao 77} Yao, S. B. Approximating block accesses in database organizations Commun ACM 20(4):260--261, April, 1977 Google ScholarDigital Library
- {Yao 79} Yao, S. B. Optimization of query evaluation algorithms. ACM Trans. Database Syst. 4(2):133--155, September, 1979. Google ScholarDigital Library
- {Yu 78} Yu, C. T., Luk, W. S., and Siu, M. K. On the estimation of the number of desired records with respect to a given query. ACM Trans Database Syst 3(1) 41--56, March, 1978 Google ScholarDigital Library
- {Zipf 49} Zipf, G. K. Human Behaviour and the Principle of Least Effort. Addison-Wesley, Cambridge, Mass., 1949.Google Scholar
Index Terms
- Accurate estimation of the number of tuples satisfying a condition
Recommendations
Accurate estimation of the number of tuples satisfying a condition
SIGMOD '84: Proceedings of the 1984 ACM SIGMOD international conference on Management of dataWe present a new method for estimating the number of tuples satisfying a condition of the type attribute rel constant, where rel is one of "=", ">", "<", "≥", "≤". Our method gives highly accurate, yet easy to compute, estimates. We store information ...
Estimation of the number of tuples satisfying a query expressed in predicate calculus language
VLDB '80: Proceedings of the sixth international conference on Very Large Data Bases - Volume 6We present a statistic model which allows us to estimate the cardinality of queries and sub-queries expressed in Predicate Calculus language. In a first step, we deal with expressions formed only with an AND operator with n operands. We show how we can ...
Discovering queries based on example tuples
SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of DataAn enterprise information worker is often aware of a few example tuples (but not the entire result) that should be present in the output of the query. We study the problem of discovering the minimal project join query that contains the given example ...
Comments