Skip to main content
Erschienen in: The VLDB Journal 4/2014

01.08.2014 | Regular Paper

Instance-level worst-case query bounds on R-trees

verfasst von: Yufei Tao, Yi Yang, Xiaocheng Hu, Cheng Sheng, Shuigeng Zhou

Erschienen in: The VLDB Journal | Ausgabe 4/2014

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Even with its significant impacts on the database area, the R-tree is often criticized by its lack of good worst-case guarantees. For example, in range search (where we want to report all the data points in a query rectangle), it is known that on adversely designed datasets and queries, an R-tree can be as slow as a sequential scan that simply reads all the data points. Nevertheless, R-trees work so well on real data that they have been widely implemented in commercial systems. This stark contrast has caused long-term controversy between practitioners and theoreticians as to whether this structure deserves its fame. This paper provides theoretical evidence that, somewhat surprisingly, R-trees are efficient in the worst case for range search on many real datasets. Given any integer \(K\), we explain how to obtain an upper bound on the cost of answering all (i.e., infinitely many) range queries retrieving at most \(K\) objects. On practical data, the upper bound is only a fraction of the overhead of sequential scan (unless, apparently, \(K\) is at the same order as the dataset size). Our upper bounds are tight up to a constant factor, namely they cannot be lowered by more than \(O(1)\) times while still capturing the most expensive queries. Our upper bounds can be calculated in constant time by remembering only three integers. These integers, in turn, are generated from only the leaf MBRs of an R-tree, but not the leaf nodes themselves. In practice, the internal nodes are often buffered in memory, so that the integers aforementioned can be efficiently maintained along with the data updates and made available to a query optimizer at any time. Furthermore, our analytical framework introduces instance-level query bound as a new technique for evaluating the efficiency of heuristic structures in a theory-flavored manner (previously, experimentation was the dominant assessment method).

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
All rectangles in this paper are axis-parallel.
 
2
If the rectangle is covered by \((-\infty , x] \times (-\infty , y]\), it is not counted by \(\mathrm{downquad}(p, S)\).
 
3
Such a query always exists for a generic \(P\). To see this, move a vertical sweep line \(\ell \) rightward starting from \(x = -\infty \), and stop as soon as \(\tau \) points are on or to the left of \(\ell \). The swept region is a range query with size \(\tau \).
 
Literatur
1.
Zurück zum Zitat Arge, L., de Berg, M., Haverkort, H.J., Yi, K.: The priority R-tree: a practically efficient and worst-case optimal R-tree. In: Proceedings of ACM Management of Data (SIGMOD), pp. 347–358 (2004) Arge, L., de Berg, M., Haverkort, H.J., Yi, K.: The priority R-tree: a practically efficient and worst-case optimal R-tree. In: Proceedings of ACM Management of Data (SIGMOD), pp. 347–358 (2004)
2.
Zurück zum Zitat Beckmann, N., Kriegel, H., Schneider, R., Seeger, B.: The R*-tree: an efficient and robust access method for points and rectangles. In: Proceedings of ACM Management of Data (SIGMOD), pp. 322–331 (1990) Beckmann, N., Kriegel, H., Schneider, R., Seeger, B.: The R*-tree: an efficient and robust access method for points and rectangles. In: Proceedings of ACM Management of Data (SIGMOD), pp. 322–331 (1990)
3.
Zurück zum Zitat Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM (CACM) 18(9), 509–517 (1975)CrossRefMATH Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM (CACM) 18(9), 509–517 (1975)CrossRefMATH
4.
Zurück zum Zitat Faloutsos, C., Kamel, I.: Beyond uniformity and independence: analysis of R-trees using the concept of fractal dimension. In: Proceedings of ACM Symposium on Principles of Database Systems (PODS), pp. 4–13 (1994) Faloutsos, C., Kamel, I.: Beyond uniformity and independence: analysis of R-trees using the concept of fractal dimension. In: Proceedings of ACM Symposium on Principles of Database Systems (PODS), pp. 4–13 (1994)
5.
Zurück zum Zitat Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: Proceedings of ACM Management of Data (SIGMOD), pp. 47–57 (1984) Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: Proceedings of ACM Management of Data (SIGMOD), pp. 47–57 (1984)
6.
Zurück zum Zitat Hellerstein, J.M., Koutsoupias, E., Miranker, D.P., Papadimitriou, C.H., Samoladas, V.: On a model of indexability and its bounds for range queries. JACM 49(1), 35–55 (2002) Hellerstein, J.M., Koutsoupias, E., Miranker, D.P., Papadimitriou, C.H., Samoladas, V.: On a model of indexability and its bounds for range queries. JACM 49(1), 35–55 (2002)
7.
Zurück zum Zitat Hjaltason, G.R., Samet, H.: Distance browsing in spatial databases. ACM Trans. Datab. Syst. 24(2), 265–318 (1999)CrossRef Hjaltason, G.R., Samet, H.: Distance browsing in spatial databases. ACM Trans. Datab. Syst. 24(2), 265–318 (1999)CrossRef
8.
Zurück zum Zitat Kamel, I., Faloutsos, C.: Hilbert R-tree: an improved R-tree using fractals. In: Proceedings of Very Large Data Bases (VLDB), pp. 500–509 (1994) Kamel, I., Faloutsos, C.: Hilbert R-tree: an improved R-tree using fractals. In: Proceedings of Very Large Data Bases (VLDB), pp. 500–509 (1994)
9.
Zurück zum Zitat Kanth, K.V.R., Singh, A.K.: Optimal dynamic range searching in non-replicating index structures. In: Proceedings of International Conference on Database Theory (ICDT), pp. 257–276 (1999) Kanth, K.V.R., Singh, A.K.: Optimal dynamic range searching in non-replicating index structures. In: Proceedings of International Conference on Database Theory (ICDT), pp. 257–276 (1999)
10.
Zurück zum Zitat Leutenegger, S.T., Edgington, J.M., Lopez, M.A.: STR: A simple and efficient algorithm for R-tree packing. In: Proceedings of International Conference on Data Engineering (ICDE), pp. 497–506 (1997) Leutenegger, S.T., Edgington, J.M., Lopez, M.A.: STR: A simple and efficient algorithm for R-tree packing. In: Proceedings of International Conference on Data Engineering (ICDE), pp. 497–506 (1997)
11.
Zurück zum Zitat Pagel, B.-U., Six, H.-W., Toben, H., Widmayer, P.: Towards an analysis of range query performance in spatial data structures. In: Proceedings of ACM Symposium on Principles of Database Systems (PODS), pp. 214–221 (1993) Pagel, B.-U., Six, H.-W., Toben, H., Widmayer, P.: Towards an analysis of range query performance in spatial data structures. In: Proceedings of ACM Symposium on Principles of Database Systems (PODS), pp. 214–221 (1993)
12.
Zurück zum Zitat Papadias, D., Tao, Y., Fu, G., Seeger, B.: Progressive skyline computation in database systems. ACM Trans. Datab. Syst. 30(1), 41–82 (2005)CrossRef Papadias, D., Tao, Y., Fu, G., Seeger, B.: Progressive skyline computation in database systems. ACM Trans. Datab. Syst. 30(1), 41–82 (2005)CrossRef
13.
Zurück zum Zitat Procopiuc, O., Agarwal, P.K., Arge, L., Vitter, J.S.: Bkd-tree: A dynamic scalable kd-tree. In: Proceedings of Symposium on Advances in Spatial and Temporal Databases (SSTD), pp. 46–65 (2003) Procopiuc, O., Agarwal, P.K., Arge, L., Vitter, J.S.: Bkd-tree: A dynamic scalable kd-tree. In: Proceedings of Symposium on Advances in Spatial and Temporal Databases (SSTD), pp. 46–65 (2003)
14.
Zurück zum Zitat Robinson, J.T.: The K-D-B-tree: a search structure for large multidimensional dynamic indexes. In: Proceedings of ACM Management of Data (SIGMOD), pp. 10–18 (1981) Robinson, J.T.: The K-D-B-tree: a search structure for large multidimensional dynamic indexes. In: Proceedings of ACM Management of Data (SIGMOD), pp. 10–18 (1981)
15.
Zurück zum Zitat Roussopoulos, N., Kelley, S., Vincent, F.: Nearest neighbor queries. In: Proceedings of ACM Management of Data (SIGMOD), pp. 71–79 (1995) Roussopoulos, N., Kelley, S., Vincent, F.: Nearest neighbor queries. In: Proceedings of ACM Management of Data (SIGMOD), pp. 71–79 (1995)
16.
Zurück zum Zitat Sellis, T.K., Roussopoulos, N., Faloutsos, C.: The R+-tree: A dynamic index for multi-dimensional objects. In: Proceedings of Very Large Data Bases (VLDB), pp. 507–518 (1987) Sellis, T.K., Roussopoulos, N., Faloutsos, C.: The R+-tree: A dynamic index for multi-dimensional objects. In: Proceedings of Very Large Data Bases (VLDB), pp. 507–518 (1987)
17.
Zurück zum Zitat Sleator, D.D., Tarjan, R.E.: Amortized efficiency of list update and paging rules. CACM 28(2), 202–208 (1985)CrossRefMathSciNet Sleator, D.D., Tarjan, R.E.: Amortized efficiency of list update and paging rules. CACM 28(2), 202–208 (1985)CrossRefMathSciNet
18.
Zurück zum Zitat Tao, Y., Papadias, D.: Performance analysis of R*-trees with arbitrary node extents. IEEE Trans. Knowl. Data Eng. 16(6), 653–668 (2004)CrossRef Tao, Y., Papadias, D.: Performance analysis of R*-trees with arbitrary node extents. IEEE Trans. Knowl. Data Eng. 16(6), 653–668 (2004)CrossRef
19.
Zurück zum Zitat Theodoridis, Y., Sellis, T.K.: A model for the prediction of R-tree performance. In: Proceedings of ACM Symposium on Principles of Database Systems (PODS), pp. 161–171 (1996) Theodoridis, Y., Sellis, T.K.: A model for the prediction of R-tree performance. In: Proceedings of ACM Symposium on Principles of Database Systems (PODS), pp. 161–171 (1996)
Metadaten
Titel
Instance-level worst-case query bounds on R-trees
verfasst von
Yufei Tao
Yi Yang
Xiaocheng Hu
Cheng Sheng
Shuigeng Zhou
Publikationsdatum
01.08.2014
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 4/2014
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-013-0339-5

Weitere Artikel der Ausgabe 4/2014

The VLDB Journal 4/2014 Zur Ausgabe