skip to main content
10.1145/93597.93611acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article
Free Access

Practical selectivity estimation through adaptive sampling

Published:01 May 1990Publication History

ABSTRACT

Recently we have proposed an adaptive, random sampling algorithm for general query size estimation. In earlier work we analyzed the asymptotic efficiency and accuracy of the algorithm, in this paper we investigate its practicality as applied to selects and joins. First, we extend our previous analysis to provide significantly improved bounds on the amount of sampling necessary for a given level of accuracy. Next, we provide “sanity bounds” to deal with queries for which the underlying data is extremely skewed or the query result is very small. Finally, we report on the performance of the estimation algorithm as implemented in a host language on a commercial relational system. The results are encouraging, even with this loose coupling between the estimation algorithm and the DBMS.

References

  1. BDT83.D B~tton, D DeWltt, and C Turbyfill Benchmarkmg database systems A systematic approach In Proc Nznth VLDB, pages 8-19, 1983 Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Chr83a.S Chnstodoulakls Estimating block transfers and join sizes In Proc A CM SIGMOD Conference, pages 40-54, May 1983 Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Chr83b.S Chrxstodoulakxs Estxmatmg record selectlvltles Informatzon Systems, 8(2) 69- 79, 1983Google ScholarGoogle Scholar
  4. Dem80.R Demolombe Estimation of the number of tuples satxsfymg a query expressed in predicate calculus language In Proc Szzth VLDB, pages 55-63, 1980Google ScholarGoogle Scholar
  5. Fed84.J Fedorowlcz Database performance evaluation using multiple regresmon techniques In Proc A CM SIGMOD Conference, pages 70-76, June 1984 Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Fel68.W Feller An Introduction to Probability Theory and Its Applzcatsons, volume 1 John Wiley and Sons, inc, New York, New York, 1968Google ScholarGoogle Scholar
  7. HOT88.W-C Hou, G Ozsoyoglu, and B Tnneja StatlstlcM estimators for relational algebra expressions In Proc A CM PODS, pages 276-287, March 1988 Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. HOT89.W-C Hou, G Ozsoyoglu, and B Tane3a Processing aggregate relational querms with hard time constraints in Proc ACM SIGMOD Conference, pages 68-77, June 1989 Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. HTY82.L Herschberg, P D Tmg, and S B Yao Query optimization in star computer networks A CM Transactions on Database Systems, 7(4), December 1982 Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. KK85.N Kamel and 1~ King A model of data distribution based on texture analysis In Proc A CM SIGMOD Conference, pages 319-325, May 1985 Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Koo80.P~ Kool The optzm~zat~on of quemes zn relatzonal database systems PhD thesis, Case Western Umverslty, Cleveland, Ohio, 1980Google ScholarGoogle Scholar
  12. LN89.1~ Lipton and J Naughton Estimating the size of generalized transitive closures in Proc Fzfleenth VLDB, pages 165-172, August 1989 Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. LN90.R Lipton and J Naughton Query size estlmatmn by adaptxve sampling In Proc A CM PODS, March 1990 Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. LNS90.1~ Lipton and J Naughton and D Schneider Practical Selectlwty Estimation through Adaptive Samphng Umverslty of Wisconsin-Madison Computer Sciences Department Techmcal Report, March 1990Google ScholarGoogle Scholar
  15. Lyn88.C Lynch Selectivity estlmatmn and query optlmlzatmn in large databases with highly skewed distributions of column values In Proc Fourteenth VLDB, pages 240-251, August 1988 Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. MCS88.M Manmno, P Chu, and T Sager Statistical profile estimation m database systems Computsng Surveys, 20(3) 191-221, September 1988 Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. MD88.M Murahknshna and D DeWltt Equldepth histograms for estimating selectiv- Ity factors for multl-dlmensmnal queries In Proc SIGMOD Conference, pages 28- 36, June 1988 Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. MDL83.A Montgomery, D D'Souza, and S Lee The cost of relatmnal algebraic operatmns on skewed data Estimates and experiments Information Processing Letters, pages 235-241, 1983Google ScholarGoogle Scholar
  19. MK85.B Muthuswamy and G Kerschberg A DDSM for relatmnal query optlmlzatmn Techmcal report, Umverslty of South Carohna, Columbia, 1985 As cited in {MCS88}Google ScholarGoogle Scholar
  20. OR86.F Olken and D Rotem Simple random samphng for relational databases In Proc Twelfth VLDB, pages 160-169, August 1986 Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. OR89.F Olken and D Rotem Random samphng from B+trees In Proc Fzfleenth VLDB, pages 269-278, August 1989 Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. PSC84.G Pmtetsky-Shaplro and C Connell Accurate estimation of the number of tuples satisfying a condltmn In Proc A CM SIG- MOD Conference, pages 256-276, 3une 1984 Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. SAC+79.P G Sehnger, M M Astrahan, D D Chamberhn, R A Lone, and T G Price Access path selectmn m a relatmnal database management system In Proc A CM SIGMOD Conference, pages 23-34, 1979 Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Practical selectivity estimation through adaptive sampling

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in
              • Published in

                cover image ACM Conferences
                SIGMOD '90: Proceedings of the 1990 ACM SIGMOD international conference on Management of data
                May 1990
                398 pages
                ISBN:0897913655
                DOI:10.1145/93597

                Copyright © 1990 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 1 May 1990

                Permissions

                Request permissions about this article.

                Request Permissions

                Check for updates

                Qualifiers

                • Article

                Acceptance Rates

                Overall Acceptance Rate785of4,003submissions,20%

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader