skip to main content
article
Free Access

Balancing histogram optimality and practicality for query result size estimation

Authors Info & Claims
Published:22 May 1995Publication History
Skip Abstract Section

Abstract

Many current database systems use histograms to approximate the frequency distribution of values in the attributes of relations and based on them estimate query result sizes and access plan costs. In choosing among the various histograms, one has to balance between two conflicting goals: optimality, so that generated estimates have the least error, and practicality, so that histograms can be constructed and maintained efficiently. In this paper, we present both theoretical and experimental results on several issues related to this trade-off. Our overall conclusion is that the most effective approach is to focus on the class of histograms that accurately maintain the frequencies of a few attribute values and assume the uniform distribution for the rest, and choose for each relation the histogram in that class that is optimal for a self-join query.

References

  1. 1 C. M. Chen and N. Roussopoulos. Adaptive selectivity estimation using query feedback. In Proc. of the 199~ A CM-SIGMOD Conf., pages 161-172, Washington, DC, May 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. 2 S. Christodoulakis. Estimating block transfers and join sizes. In Proc. of the 1983 A CM-SIGMOD Con}., pages 40-54, San Jose, CA, May 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. 3 S. Christodoulakis. Implications of certain assumptions in database performance evaluation. A CM TODS, 9(2):163-186, June 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. 4 S. Christodoulakis On the estimation and use of selectivities in database performance evaluation. Research Report CS-89-24, Dept. of Computer Science, University of Waterloo, June 1989.Google ScholarGoogle Scholar
  5. 5 C. Date. A Guide to DB2. Addison Wesley, Reading, MA, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. 6 C. Faloutsos and H. V. Jagadish. On b-tree indices for skewed distributions. In Proc. 18th Int. VLDB Conf., pages 363-374, Vancouver, BC, August 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. 7 P. Haas and A. Swami. Sequential sampling procedures for query size estimation. In Proc. of the 1992 ACM- SIGMOD Conf., pages 341-350, San Diego, CA, June 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. 8 P. J. Haas and A. Swami. Sampling-based selectivity estimation for joins using augmented frequent value statistics. In Proc. of the 1995 IEEE Con}. on Data Engzneerzng, Taipei, Taiwan, March 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. 9 Y. Ioannidis. Universality of serial histograms. In Proc. 19th Int. VLDB Conf., pages 256-267, Dublin, Ireland, August 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. 10 Y. Io~nnidis and $. CJhristodoulakis. On the propagation of errors in the size of join results. In Proc. of the 1991 A CM-SIGMOD Conf., pages 268-277, Denver, CO, May 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. 11 Y. Ioannidis and S Christodoulakis. Optimal histograms for limiting worst-case error propagation in the estimates of query optimizers, December 1993 Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. 12 Y. Ioannidis and V. Poosala. Balancing histogram optimality and practicality for query result size estimation. Unpublished manuscript, February 1995.Google ScholarGoogle Scholar
  13. 13 N. Kamel and R. King. A model of data distribution based on texture analysis. In Proc. of the 1985 A CM- SIGMOD Conf., pages 319-325. Austin, TX, May 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. 14 R. P. Kooi. The Opt~mizatzon of Q, uemes in Relational Databases PhD thesis, Case Western Reserve University, September 1980. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. 15 R. J. Lipton, J. F. Naughton, and D. A. Schneider. Practical selectivity estimation through adaptive sampiing. In Proc. of the 1990 A CM-SIGMOD Conf., pages 1-11, Atlantic City, NJ, May 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. 16 M. V. Mannino, P. Chu, and T. Sager. Statistical profile estimation in database systems. A CM Cornput~ng Surveys, 20(3):192-221, September 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. 17 A. W. Marshall and I. Olkin. Inequalities: Theory of Majorizat~on and Its Applications. Academic Press, New York, NY, 1979.Google ScholarGoogle Scholar
  18. 18 T. H. Merrett and E. Otoo. Distribution models of relations. In Proc. 5th Int. VLDB Conf., pages 418- 425, Rio de Janeiro, Brazil, October 1979.Google ScholarGoogle ScholarCross RefCross Ref
  19. 19 M. Muralikrishna and D. J. DeWltt. Equi-depth histograms for estimating selectivity factors for multidimensional queries. In Proc. of the 1988 A CM- SIGMOD Conf., pages 28-36, Chicago, IL, June 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. 20 B. Muthuswamy and L. Kerschberg. A ddsm for relational query optimization. In Proc. A CM Annual Conf., Denver, CO, October 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. 21 G. Piatetsky-Shapiro and C. Connell. Accurate estimation of the number of tuptes satisfying a condition. In Proc. o} the 198~ A CM-SIGMOD Conf., pages 256-276, Boston, MA, June 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. 22 P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access path selection in a relational database management system. In Proceedings of the A CM-SIGMOD Conf., pages 23-34, Boston, MA, June 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. 23 Y. Wang. Experience from a real life query optimizer. In Proc. of the 1992 A CM-SIGMOD Conf., page 286, San Diego, CA, June 1992. Conf. presentation. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. 24 G. K. Zipf. Human Behavior and the Pmnc~ple of Least Effort. Addison-Wesley, Reading, MA, 1949.Google ScholarGoogle Scholar

Index Terms

  1. Balancing histogram optimality and practicality for query result size estimation

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in

              Full Access

              • Published in

                cover image ACM SIGMOD Record
                ACM SIGMOD Record  Volume 24, Issue 2
                May 1995
                490 pages
                ISSN:0163-5808
                DOI:10.1145/568271
                Issue’s Table of Contents
                • cover image ACM Conferences
                  SIGMOD '95: Proceedings of the 1995 ACM SIGMOD international conference on Management of data
                  June 1995
                  508 pages
                  ISBN:0897917316
                  DOI:10.1145/223784

                Copyright © 1995 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 22 May 1995

                Check for updates

                Qualifiers

                • article

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader