ABSTRACT
We propose the concept of fractal dimension of a set of points, in order to quantify the deviation from the uniformity distribution. Using measurements on real data sets (road intersections of U.S. counties, star coordinates from NASA's Infrared-Ultraviolet Explorer etc.) we provide evidence that real data indeed are skewed, and, moreover, we show that they behave as mathematical fractals, with a measurable, non-integer fractal dimension.
Armed with this tool, we then show its practical use in predicting the performance of spatial access methods, and specifically of the R-trees. We provide the first analysis of R-trees for skewed distributions of points: We develop a formula that estimates the number of disk accesses for range queries, given only the fractal dimension of the point set, and its count. Experiments on real data sets show that the formula is very accurate: the relative error is usually below 5%, and it rarely exceeds 10%.
We believe that the fractal dimension will help replace the uniformity and independence assumptions, allowing more accurate analysis for any spatial access method, as well as better estimates for query optimization on multi-attribute queries.
- ACF+93.Manish Arya, William Cody, Christos Faloutsos, Joel Richardson, and Arthur Toga. Qbism: a prototype 3-d medical image database system. IEEE Data Engineering Bulletin, 16(1):38-42, March 1993.Google Scholar
- ACF+94.Manish Ary#, William Cody, Christos Faloutsos, Joel Richardson, and Arthur Toga. Qbism: Extending a dbms to support 3d medical images. Tenth Int. Conf. on Data Engineering (ICDE), February 1994. (to appear). Google ScholarDigital Library
- AS91.Walid G. Aref and Hanan Samet. Optimization strategies for spatial query processing. Proc. of VLDB (Very Large Data Bases), pages 81-90, September 1991. Google ScholarDigital Library
- BKSS90.N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The r*-tree: an efficient and robust access method for points and rectangles. A CM SIGMOD, pages 322-331, May 1990. Google ScholarDigital Library
- Chr84.S. Christodoulakis. Implication of certain assumptions in data base performance evaluation. A CM TODS, June 1984. Google ScholarDigital Library
- FJ92.Christos Faloutsos and H.V. Jagadish. On b-tree indices for skewed distributions. In 18th VLDB Conference, pages 363-374, Vancouver, British Columbia, August 1992. Google ScholarDigital Library
- FSR87.Analysis of object oriented spatial access methods. Proc. A CM SIGMOD, pages 426-439 426- 439, May 1987. also available as SRC-TR-87-30, UMIACS-TR-86-27, CS-TR-1781. Google ScholarDigital Library
- Gar82.I. Gargantini. An effective way to represent quadtrees.Comm. of A CM (CACM), 25(12):905-910, December 1982. Google ScholarDigital Library
- Gut84.A. Guttman. R-trees: a dynamic index structure for spatial searching. Proc. A CM SIGMOD, pages 47-57, June 1984. Google ScholarDigital Library
- IC91.Yannis E. Ioannidis and Stavros Christodoulakis. On the propagation of errors in the size of join results. Proc. of A CM SIGMOD, pages 268-277, May 1991. Google ScholarDigital Library
- Jag90.H.V. Jagadish. Spatial search with polyhedta. Proc. Sixth IEEE Int'l Conf. on Data Engineering, February 1990. Google ScholarDigital Library
- KF93.Ibrahim Kamel and Christos Faloutsos. On packing r-trees. Second Int. Conf. on Information and Knowledge Management (CIKM), November 1993. to appear. Google ScholarDigital Library
- Man77.B. Mandelbrot. Fractal Geometry of Nature. W.H. Freeman, New York, 1977.Google Scholar
- MD88.M. Muralikrishna and David J. DeWitt. Equidepth histograms for estimating selectivity factors for multi-dimensional queries. In Proc. A CM SIGMOD, pages 28-36, Chicago, IL, June 1988. Google ScholarDigital Library
- NHS84.J. Nievergelt, H. Hinterberger, and K.C. Sevcik. The grid file: an adaptable, symmetric multikey file structure. ACM TODS, 9(1):38-71, March 1984. Google ScholarDigital Library
- NS86.R. Nelson and H. Samet. A population analysis of quadtrees with variable node size. Tech. Report CAR-TR-241, also CS-TR-1740, DCR- 86-05557, Computer Science Department, Univ. of Maryland, College Park, December 1986.Google Scholar
- Ore86.J. Orenstein. Spatial query processing in an object-oriented database system. Proc. A CM SIGMOD, pages 326-336, May 1986. Google ScholarDigital Library
- PSTW93.B. Pagel, H. Six, H. Toben, and P. Widmayer. Towards an analysis of range query performance. In Proc. PODS 93, pages 214-221, Washington, D.C., May 1993. Google ScholarDigital Library
- RL85.N. Roussopoulos and D. Leifker. Direct spatial search on pictorial databases using packed rtrees. Proc. A CM SIGMOD, May 1985. Google ScholarDigital Library
- Sam89.H. Samet. The Design and Analysis o} Spatial Data Structures. Addison-Wesley, 1989. Google ScholarDigital Library
- Sam90.H. Samet. Applications o} Spatial Data Structures Computer Graphics, Image Processing and GIS. Addison-Wesley, 1990. Google ScholarDigital Library
- Sch91.Manfred Schroeder. Fractals, Chaos, Power Laws: Minutes From an Infinite Paradise. W.tt. Freeman and Company, New York, 1991.Google Scholar
- SRF87.T. Sellis, N. Roussopoulos, and C. Faloutsos. The r+ tree: a dynamic index for multidimensional objects. In Proc. 13th International Conference on VLDB, pages 507-518, England,, September 1987. also available as SRC-TR-87- 32,UMIAGS-TR-8?-3, GS-TR-1795. Google ScholarDigital Library
- Zip49.G.K. Zipf. Human Behavior and Principle o.f Least Effort: an Introduction to Human Ecology. Addison Wesley, Cambridge, Massachusetts, 1949.Google Scholar
Index Terms
- Beyond uniformity and independence: analysis of R-trees using the concept of fractal dimension
Recommendations
Relaxing the Uniformity and Independence Assumptions Using the Concept of Fractal Dimension
Special issue on principles of database systemsWe propose the concept of fractal dimension of a set of points, in order to quantify the deviation from the uniformity distribution. Using measurements on real data sets (road intersections of U.S. counties, population versus area of different nations, ...
Almost k-wise independence versus k-wise independence
We say that a distribution over {0,1}n is (ε,k)-wise independent if its restriction to every k coordinates results in a distribution that is ε-close to the uniform distribution. A natural question regarding (ε, k)-wise independent distributions is how ...
Comments