ABSTRACT
Recently we have proposed an adaptive, random sampling algorithm for general query size estimation. In earlier work we analyzed the asymptotic efficiency and accuracy of the algorithm, in this paper we investigate its practicality as applied to selects and joins. First, we extend our previous analysis to provide significantly improved bounds on the amount of sampling necessary for a given level of accuracy. Next, we provide “sanity bounds” to deal with queries for which the underlying data is extremely skewed or the query result is very small. Finally, we report on the performance of the estimation algorithm as implemented in a host language on a commercial relational system. The results are encouraging, even with this loose coupling between the estimation algorithm and the DBMS.
- BDT83.D B~tton, D DeWltt, and C Turbyfill Benchmarkmg database systems A systematic approach In Proc Nznth VLDB, pages 8-19, 1983 Google ScholarDigital Library
- Chr83a.S Chnstodoulakls Estimating block transfers and join sizes In Proc A CM SIGMOD Conference, pages 40-54, May 1983 Google ScholarDigital Library
- Chr83b.S Chrxstodoulakxs Estxmatmg record selectlvltles Informatzon Systems, 8(2) 69- 79, 1983Google Scholar
- Dem80.R Demolombe Estimation of the number of tuples satxsfymg a query expressed in predicate calculus language In Proc Szzth VLDB, pages 55-63, 1980Google Scholar
- Fed84.J Fedorowlcz Database performance evaluation using multiple regresmon techniques In Proc A CM SIGMOD Conference, pages 70-76, June 1984 Google ScholarDigital Library
- Fel68.W Feller An Introduction to Probability Theory and Its Applzcatsons, volume 1 John Wiley and Sons, inc, New York, New York, 1968Google Scholar
- HOT88.W-C Hou, G Ozsoyoglu, and B Tnneja StatlstlcM estimators for relational algebra expressions In Proc A CM PODS, pages 276-287, March 1988 Google ScholarDigital Library
- HOT89.W-C Hou, G Ozsoyoglu, and B Tane3a Processing aggregate relational querms with hard time constraints in Proc ACM SIGMOD Conference, pages 68-77, June 1989 Google ScholarDigital Library
- HTY82.L Herschberg, P D Tmg, and S B Yao Query optimization in star computer networks A CM Transactions on Database Systems, 7(4), December 1982 Google ScholarDigital Library
- KK85.N Kamel and 1~ King A model of data distribution based on texture analysis In Proc A CM SIGMOD Conference, pages 319-325, May 1985 Google ScholarDigital Library
- Koo80.P~ Kool The optzm~zat~on of quemes zn relatzonal database systems PhD thesis, Case Western Umverslty, Cleveland, Ohio, 1980Google Scholar
- LN89.1~ Lipton and J Naughton Estimating the size of generalized transitive closures in Proc Fzfleenth VLDB, pages 165-172, August 1989 Google ScholarDigital Library
- LN90.R Lipton and J Naughton Query size estlmatmn by adaptxve sampling In Proc A CM PODS, March 1990 Google ScholarDigital Library
- LNS90.1~ Lipton and J Naughton and D Schneider Practical Selectlwty Estimation through Adaptive Samphng Umverslty of Wisconsin-Madison Computer Sciences Department Techmcal Report, March 1990Google Scholar
- Lyn88.C Lynch Selectivity estlmatmn and query optlmlzatmn in large databases with highly skewed distributions of column values In Proc Fourteenth VLDB, pages 240-251, August 1988 Google ScholarDigital Library
- MCS88.M Manmno, P Chu, and T Sager Statistical profile estimation m database systems Computsng Surveys, 20(3) 191-221, September 1988 Google ScholarDigital Library
- MD88.M Murahknshna and D DeWltt Equldepth histograms for estimating selectiv- Ity factors for multl-dlmensmnal queries In Proc SIGMOD Conference, pages 28- 36, June 1988 Google ScholarDigital Library
- MDL83.A Montgomery, D D'Souza, and S Lee The cost of relatmnal algebraic operatmns on skewed data Estimates and experiments Information Processing Letters, pages 235-241, 1983Google Scholar
- MK85.B Muthuswamy and G Kerschberg A DDSM for relatmnal query optlmlzatmn Techmcal report, Umverslty of South Carohna, Columbia, 1985 As cited in {MCS88}Google Scholar
- OR86.F Olken and D Rotem Simple random samphng for relational databases In Proc Twelfth VLDB, pages 160-169, August 1986 Google ScholarDigital Library
- OR89.F Olken and D Rotem Random samphng from B+trees In Proc Fzfleenth VLDB, pages 269-278, August 1989 Google ScholarDigital Library
- PSC84.G Pmtetsky-Shaplro and C Connell Accurate estimation of the number of tuples satisfying a condltmn In Proc A CM SIG- MOD Conference, pages 256-276, 3une 1984 Google ScholarDigital Library
- SAC+79.P G Sehnger, M M Astrahan, D D Chamberhn, R A Lone, and T G Price Access path selectmn m a relatmnal database management system In Proc A CM SIGMOD Conference, pages 23-34, 1979 Google ScholarDigital Library
Index Terms
- Practical selectivity estimation through adaptive sampling
Recommendations
Practical selectivity estimation through adaptive sampling
Recently we have proposed an adaptive, random sampling algorithm for general query size estimation. In earlier work we analyzed the asymptotic efficiency and accuracy of the algorithm, in this paper we investigate its practicality as applied to selects ...
Selectivity Estimation for Joins Using Systematic Sampling
DEXA '97: Proceedings of the 8th International Workshop on Database and Expert Systems ApplicationsWe propose a new approach to the estimation of join selectivity. The technique, which we have called ``systematic sampling'', is a novel variant of the sampling-based approach. Systematic sampling works as follows: Given a relation R of N tuples, with a ...
Improved selectivity estimation by combining knowledge from sampling and synopses
Proceedings of the 44th International Conference on Very Large Data Bases, Rio de Janeiro, BrazilEstimating selectivities remains a critical task in query processing. Optimizers rely on the accuracy of selectivities when generating execution plans and, in approximate query answering, estimated selectivities affect the quality of the result. Many ...
Comments