skip to main content
10.1145/182591.182593acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article
Free Access

Beyond uniformity and independence: analysis of R-trees using the concept of fractal dimension

Published:24 May 1994Publication History

ABSTRACT

We propose the concept of fractal dimension of a set of points, in order to quantify the deviation from the uniformity distribution. Using measurements on real data sets (road intersections of U.S. counties, star coordinates from NASA's Infrared-Ultraviolet Explorer etc.) we provide evidence that real data indeed are skewed, and, moreover, we show that they behave as mathematical fractals, with a measurable, non-integer fractal dimension.

Armed with this tool, we then show its practical use in predicting the performance of spatial access methods, and specifically of the R-trees. We provide the first analysis of R-trees for skewed distributions of points: We develop a formula that estimates the number of disk accesses for range queries, given only the fractal dimension of the point set, and its count. Experiments on real data sets show that the formula is very accurate: the relative error is usually below 5%, and it rarely exceeds 10%.

We believe that the fractal dimension will help replace the uniformity and independence assumptions, allowing more accurate analysis for any spatial access method, as well as better estimates for query optimization on multi-attribute queries.

References

  1. ACF+93.Manish Arya, William Cody, Christos Faloutsos, Joel Richardson, and Arthur Toga. Qbism: a prototype 3-d medical image database system. IEEE Data Engineering Bulletin, 16(1):38-42, March 1993.Google ScholarGoogle Scholar
  2. ACF+94.Manish Ary#, William Cody, Christos Faloutsos, Joel Richardson, and Arthur Toga. Qbism: Extending a dbms to support 3d medical images. Tenth Int. Conf. on Data Engineering (ICDE), February 1994. (to appear). Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. AS91.Walid G. Aref and Hanan Samet. Optimization strategies for spatial query processing. Proc. of VLDB (Very Large Data Bases), pages 81-90, September 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. BKSS90.N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The r*-tree: an efficient and robust access method for points and rectangles. A CM SIGMOD, pages 322-331, May 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Chr84.S. Christodoulakis. Implication of certain assumptions in data base performance evaluation. A CM TODS, June 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. FJ92.Christos Faloutsos and H.V. Jagadish. On b-tree indices for skewed distributions. In 18th VLDB Conference, pages 363-374, Vancouver, British Columbia, August 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. FSR87.Analysis of object oriented spatial access methods. Proc. A CM SIGMOD, pages 426-439 426- 439, May 1987. also available as SRC-TR-87-30, UMIACS-TR-86-27, CS-TR-1781. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Gar82.I. Gargantini. An effective way to represent quadtrees.Comm. of A CM (CACM), 25(12):905-910, December 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Gut84.A. Guttman. R-trees: a dynamic index structure for spatial searching. Proc. A CM SIGMOD, pages 47-57, June 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. IC91.Yannis E. Ioannidis and Stavros Christodoulakis. On the propagation of errors in the size of join results. Proc. of A CM SIGMOD, pages 268-277, May 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Jag90.H.V. Jagadish. Spatial search with polyhedta. Proc. Sixth IEEE Int'l Conf. on Data Engineering, February 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. KF93.Ibrahim Kamel and Christos Faloutsos. On packing r-trees. Second Int. Conf. on Information and Knowledge Management (CIKM), November 1993. to appear. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Man77.B. Mandelbrot. Fractal Geometry of Nature. W.H. Freeman, New York, 1977.Google ScholarGoogle Scholar
  14. MD88.M. Muralikrishna and David J. DeWitt. Equidepth histograms for estimating selectivity factors for multi-dimensional queries. In Proc. A CM SIGMOD, pages 28-36, Chicago, IL, June 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. NHS84.J. Nievergelt, H. Hinterberger, and K.C. Sevcik. The grid file: an adaptable, symmetric multikey file structure. ACM TODS, 9(1):38-71, March 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. NS86.R. Nelson and H. Samet. A population analysis of quadtrees with variable node size. Tech. Report CAR-TR-241, also CS-TR-1740, DCR- 86-05557, Computer Science Department, Univ. of Maryland, College Park, December 1986.Google ScholarGoogle Scholar
  17. Ore86.J. Orenstein. Spatial query processing in an object-oriented database system. Proc. A CM SIGMOD, pages 326-336, May 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. PSTW93.B. Pagel, H. Six, H. Toben, and P. Widmayer. Towards an analysis of range query performance. In Proc. PODS 93, pages 214-221, Washington, D.C., May 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. RL85.N. Roussopoulos and D. Leifker. Direct spatial search on pictorial databases using packed rtrees. Proc. A CM SIGMOD, May 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Sam89.H. Samet. The Design and Analysis o} Spatial Data Structures. Addison-Wesley, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Sam90.H. Samet. Applications o} Spatial Data Structures Computer Graphics, Image Processing and GIS. Addison-Wesley, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Sch91.Manfred Schroeder. Fractals, Chaos, Power Laws: Minutes From an Infinite Paradise. W.tt. Freeman and Company, New York, 1991.Google ScholarGoogle Scholar
  23. SRF87.T. Sellis, N. Roussopoulos, and C. Faloutsos. The r+ tree: a dynamic index for multidimensional objects. In Proc. 13th International Conference on VLDB, pages 507-518, England,, September 1987. also available as SRC-TR-87- 32,UMIAGS-TR-8?-3, GS-TR-1795. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Zip49.G.K. Zipf. Human Behavior and Principle o.f Least Effort: an Introduction to Human Ecology. Addison Wesley, Cambridge, Massachusetts, 1949.Google ScholarGoogle Scholar

Index Terms

  1. Beyond uniformity and independence: analysis of R-trees using the concept of fractal dimension

          Recommendations

          Reviews

          Jerzy W. Jaromczyk

          The storage and retrieval of sets of points in multidimensional spaces is critical for a number of database applications, including geographic information systems and medical image databases. A spatial access method (SAM) is a standard approach for storing and accessing such data points; this includes R-trees. To estimate the search effort (range queries) in any particular SAM it is necessary to make assumptions about the distribution of data points. The paper proposes to use the notion of fractal dimension to characterize the deviation of real distributions of data points from the uniform distribution. This notion, based on elementary concepts from the theory of fractals, helps to analyze the search performance of a spatial access method. The authors use it to analyze R-trees for a score of real-life data sets and argue that the analytic predictions based on the fractal dimension agree closely with the experimental results. Several graphs and tables and an elaborated conclusion section contribute to the clarity of the presentation.

          Access critical reviews of Computing literature here

          Become a reviewer for Computing Reviews.

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            PODS '94: Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
            May 1994
            313 pages
            ISBN:0897916425
            DOI:10.1145/182591
            • Chairman:
            • Victor Vianu

            Copyright © 1994 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 24 May 1994

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • Article

            Acceptance Rates

            PODS '94 Paper Acceptance Rate28of117submissions,24%Overall Acceptance Rate642of2,707submissions,24%

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader