skip to main content
article
Free Access

Accurate estimation of the number of tuples satisfying a condition

Authors Info & Claims
Published:01 June 1984Publication History
Skip Abstract Section

Abstract

We present a new method for estimating the number of tuples satisfying a condition of the type attribute rel constant, where rel is one of "=", ">", "<", "≥", "≤". Our method gives highly accurate, yet easy to compute, estimates. We store information about attribute values as a list of distribution steps (histograms where buckets, instead of having equal width, have equal height). These distribution steps provide an upper bound on the error when estimating the number of tuples satisfying a condition. The estimation error can be arbitrarily reduced by increasing the number of steps. We analyze desirable conditions that such estimates should satisfy. Based on the distribution steps, we derive a set of estimation formulas which minimize the worst-case error. We also present another set of formulas which reduce the average-case error. Finally, we show how to use sampling to compute a close approximation of the distribution steps very quickly. The major applications of our method are in query optimization and in answering statistical queries.

References

  1. {Blasgen 77} Blasgen, M W., and Eswaran, K. P. Storage and access in relational databases IBM Syst. J 16(4), 1977.Google ScholarGoogle Scholar
  2. {Christodoulakis 81} Christodoulakis, S Estimating selectivities in data bases. Technical Report CSRG-136, Ph.D. thesis, University of Toronto, 1981. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. {Christodoulakis 83} Christodoulakis, S Estimating block transfers and join sizes. In Proc of annual meeting, pages 40--54. ACM-SIGMOD, May, 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. {Demolombe 80} Demolombe, R. Estimation of the number of tuples satisying a query expressed in predicate calculus language In Proc of 6th Int. Conf. on Very Large Data Bases, pages 55--63. Montreal, Canada, 1980.Google ScholarGoogle Scholar
  5. {Dixon 79} Dixon, W. J., and Massey, F. J. Introduction to statistical analysis McGraw-Hill Book Company, 1979.Google ScholarGoogle Scholar
  6. {Luk 83} Luk, W. S. On estimating block accesses in database organizations. Commun. ACM 26(11):945--947, November, 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. {Piatetsky 84} Piatetsky-Shapiro, G. I. A Self-Organizing Database System -- a Different Approach to Query Optimization. PhD thesis, Courant Institute, New York University, May, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. {Rowe 83} Rowe, N C. Top-down statistical estimation on a database. In Proc. of annual meeting, pages 135--145. ACM-SIGMOD, May, 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. {Samson 83} Samson, W. B. and Bendell, A. Rank order distributions and secondary key indexing, extended abstract. In Proc of 2nd Int Conf. on Databases. Cambridge, England, September, 1983.Google ScholarGoogle Scholar
  10. {Selinger 79} Selinger, P. G., et al. Access path selection in a relational database management system. In Proc of ACM-SIGMOD Int Conf. on Manage. of Data. 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. {Whang 83} Whang, K.-Y, Wiederhold, G. and Sagalowicz, D. Estimating block accesses in database organizations: a closed noniterative formula. Commun. ACM 26(11):940--944, November, 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. {Yao 77} Yao, S. B. Approximating block accesses in database organizations Commun ACM 20(4):260--261, April, 1977 Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. {Yao 79} Yao, S. B. Optimization of query evaluation algorithms. ACM Trans. Database Syst. 4(2):133--155, September, 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. {Yu 78} Yu, C. T., Luk, W. S., and Siu, M. K. On the estimation of the number of desired records with respect to a given query. ACM Trans Database Syst 3(1) 41--56, March, 1978 Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. {Zipf 49} Zipf, G. K. Human Behaviour and the Principle of Least Effort. Addison-Wesley, Cambridge, Mass., 1949.Google ScholarGoogle Scholar

Index Terms

  1. Accurate estimation of the number of tuples satisfying a condition
      Index terms have been assigned to the content through auto-classification.

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM SIGMOD Record
        ACM SIGMOD Record  Volume 14, Issue 2
        June 1984
        333 pages
        ISSN:0163-5808
        DOI:10.1145/971697
        Issue’s Table of Contents
        • cover image ACM Conferences
          SIGMOD '84: Proceedings of the 1984 ACM SIGMOD international conference on Management of data
          June 1984
          339 pages
          ISBN:0897911288
          DOI:10.1145/602259

        Copyright © 1984 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 June 1984

        Check for updates

        Qualifiers

        • article

      PDF Format

      View or Download as a PDF file.

      PDF971697.602294.2.pdf

      eReader

      View online with eReader.

      eReader