Skip to main content

2017 | OriginalPaper | Buchkapitel

Indexing the Pickup and Drop-Off Locations of NYC Taxi Trips in PostgreSQL – Lessons from the Road

verfasst von : Jia Yu, Mohamed Sarwat

Erschienen in: Advances in Spatial and Temporal Databases

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In this paper, we present our experience in indexing the drop-off and pick-up locations of taxi trips in New York City. The paper presents a comprehensive experimental analysis of classic and state-of-the-art spatial database indexing schemes. The paper evaluates a popular spatial tree indexing scheme (i.e., GIST-Spatial), a Block Range Index (BRIN-Spatial) provided by PostgreSQL as well as a new indexing scheme, namely Hippo-Spatial. In the experiments, the paper considers five evaluation metrics to compare and contrast the performance of the three indexing schemes: storage overhead, index initialization time, query response time, maintenance overhead, and throughput. Furthermore, the benchmark takes into account parameters that affect the index performance, which include but is not limited to: data size, spatial query selectivity, and spatial area density, The paper finally analyzes the experimental evaluation results and highlights the key insights and lessons learned. The results emphasize the fact that there is no one size that fits all when it comes to indexing massive-scale spatial data. The results also prove that modern database systems can maintain a lightweight index (in terms of storage and maintenance overhead) that is also fast enough for spatial data analytics applications. The source code for the experiments presented in the paper is available here: https://​github.​com/​DataSystemsLab/​hippo-postgresql.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
2.
Zurück zum Zitat Aoki, P.M.: Generalizing “search” in generalized search trees. In: Proceedings of the 14th International Conference on Data Engineering, pp. 380–389. IEEE (1998) Aoki, P.M.: Generalizing “search” in generalized search trees. In: Proceedings of the 14th International Conference on Data Engineering, pp. 380–389. IEEE (1998)
6.
Zurück zum Zitat Bontempo, C., Zagelow, G.: The IBM data warehouse architecture. Commun. ACM 41(9), 38–48 (1998)CrossRef Bontempo, C., Zagelow, G.: The IBM data warehouse architecture. Commun. ACM 41(9), 38–48 (1998)CrossRef
9.
Zurück zum Zitat Corral, A., Vassilakopoulos, M., Manolopoulos, Y.: Algorithms for joining R-trees and linear region quadtrees. In: Güting, R.H., Papadias, D., Lochovsky, F. (eds.) SSD 1999. LNCS, vol. 1651, pp. 251–269. Springer, Heidelberg (1999). doi:10.1007/3-540-48482-5_16 CrossRef Corral, A., Vassilakopoulos, M., Manolopoulos, Y.: Algorithms for joining R-trees and linear region quadtrees. In: Güting, R.H., Papadias, D., Lochovsky, F. (eds.) SSD 1999. LNCS, vol. 1651, pp. 251–269. Springer, Heidelberg (1999). doi:10.​1007/​3-540-48482-5_​16 CrossRef
10.
Zurück zum Zitat Finkel, R.A., Bentley, J.L.: Quad trees: a data structure for retrieval of composite keys. Acta Inf. 4(1), 1–9 (1974)CrossRefMATH Finkel, R.A., Bentley, J.L.: Quad trees: a data structure for retrieval of composite keys. Acta Inf. 4(1), 1–9 (1974)CrossRefMATH
11.
Zurück zum Zitat Fusco, F., Stoecklin, M.P., Vlachos, M.: Net-fli: on-the-fly compression, archiving and indexing of streaming network traffic. VLDB J. 3(1–2), 1382–1393 (2010) Fusco, F., Stoecklin, M.P., Vlachos, M.: Net-fli: on-the-fly compression, archiving and indexing of streaming network traffic. VLDB J. 3(1–2), 1382–1393 (2010)
12.
Zurück zum Zitat Goldstein, J., Ramakrishnan, R., Shaft, U.: Compressing relations and indexes. In: Proceedings of the International Conference on Data Engineering, ICDE, pp. 370–379. IEEE (1998) Goldstein, J., Ramakrishnan, R., Shaft, U.: Compressing relations and indexes. In: Proceedings of the International Conference on Data Engineering, ICDE, pp. 370–379. IEEE (1998)
13.
Zurück zum Zitat Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: Proceedings of the ACM International Conference on Management of Data, SIGMOD, pp. 47–57. ACM (1984) Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: Proceedings of the ACM International Conference on Management of Data, SIGMOD, pp. 47–57. ACM (1984)
14.
Zurück zum Zitat Hellerstein, J.M.: Generalized search tree. In: Liu, L., Tamer Özsu, M. (eds.) Encyclopedia of Database Systems, pp. 1222–1224. Springer, US (2009) Hellerstein, J.M.: Generalized search tree. In: Liu, L., Tamer Özsu, M. (eds.) Encyclopedia of Database Systems, pp. 1222–1224. Springer, US (2009)
15.
Zurück zum Zitat Hellerstein, J.M., Naughton, J.F., Pfeffer, A.: Generalized search trees for database systems, September 1995 Hellerstein, J.M., Naughton, J.F., Pfeffer, A.: Generalized search trees for database systems, September 1995
16.
Zurück zum Zitat Kamel, I., Faloutsos, C.: Hilbert R-tree: an improved R-tree using fractals. In: Proceedings of the International Conference on Very Large Data Bases, VLDB, September 1994 Kamel, I., Faloutsos, C.: Hilbert R-tree: an improved R-tree using fractals. In: Proceedings of the International Conference on Very Large Data Bases, VLDB, September 1994
17.
Zurück zum Zitat Kamel, I., Khalil, M., Kouramajian, V.: Bulk insertion in dynamic R-trees. In: Proceedings of the International Symposium on Spatial Data Handling, SDH, pp. 31–42 (1996) Kamel, I., Khalil, M., Kouramajian, V.: Bulk insertion in dynamic R-trees. In: Proceedings of the International Symposium on Spatial Data Handling, SDH, pp. 31–42 (1996)
18.
Zurück zum Zitat Kornacker, M., Mohan, C., Hellerstein, J.M.: Concurrency and recovery in generalized search trees. ACM SIGMOD Rec. 26, 62–72 (1997). ACMCrossRef Kornacker, M., Mohan, C., Hellerstein, J.M.: Concurrency and recovery in generalized search trees. ACM SIGMOD Rec. 26, 62–72 (1997). ACMCrossRef
19.
Zurück zum Zitat Lee, M.-L., Hsu, W., Jensen, C.S., Cui, B., Teo, K.L.: Supporting frequent updates in R-trees: a bottom-up approach. In: Proceedings of the International Conference on Very Large Data Bases, VLDB, pp. 608–619, September 2003 Lee, M.-L., Hsu, W., Jensen, C.S., Cui, B., Teo, K.L.: Supporting frequent updates in R-trees: a bottom-up approach. In: Proceedings of the International Conference on Very Large Data Bases, VLDB, pp. 608–619, September 2003
20.
Zurück zum Zitat Samet, H., Webber, R.E.: Storing a collection of polygons using quadtrees. ACM Trans. Graph. TOG 4(3), 182–222 (1985)CrossRef Samet, H., Webber, R.E.: Storing a collection of polygons using quadtrees. ACM Trans. Graph. TOG 4(3), 182–222 (1985)CrossRef
21.
Zurück zum Zitat Sidirourgos, L., Kersten, M.L.: Column imprints: a secondary index structure. In: Proceedings of the ACM International Conference on Management of Data, SIGMOD, pp. 893–904. ACM (2013) Sidirourgos, L., Kersten, M.L.: Column imprints: a secondary index structure. In: Proceedings of the ACM International Conference on Management of Data, SIGMOD, pp. 893–904. ACM (2013)
22.
Zurück zum Zitat Ślezak, D., Eastwood, V.: Data warehouse technology by infobright. In: Proceedings of the ACM International Conference on Management of Data, SIGMOD, pp. 841–846. ACM (2009) Ślezak, D., Eastwood, V.: Data warehouse technology by infobright. In: Proceedings of the ACM International Conference on Management of Data, SIGMOD, pp. 841–846. ACM (2009)
23.
Zurück zum Zitat Stonebraker, M., Rowe, L.A.: The design of postgres. In: Proceedings of the ACM International Conference on Management of Data, SIGMOD, pp. 340–355. ACM (1986) Stonebraker, M., Rowe, L.A.: The design of postgres. In: Proceedings of the ACM International Conference on Management of Data, SIGMOD, pp. 340–355. ACM (1986)
24.
Zurück zum Zitat Tayeb, J., Ulusoy, Ö., Wolfson, O.: A quadtree-based dynamic attribute indexing method. Comput. J. 41(3), 185–200 (1998)CrossRefMATH Tayeb, J., Ulusoy, Ö., Wolfson, O.: A quadtree-based dynamic attribute indexing method. Comput. J. 41(3), 185–200 (1998)CrossRefMATH
25.
Zurück zum Zitat Weiss, R.: A technical overview of the oracle exadata database machine and exadata storage server. Oracle White Paper. Oracle Corporation, Redwood Shores (2012) Weiss, R.: A technical overview of the oracle exadata database machine and exadata storage server. Oracle White Paper. Oracle Corporation, Redwood Shores (2012)
26.
Zurück zum Zitat Xu, X., Han, J., Lu, W.: RT-tree: an improved R-tree indexing structure for temporal spatial databases. In: Proceeding of the International Symposium on Spatial Data Handling, SSDH, pp. 1040–1049, July 1990 Xu, X., Han, J., Lu, W.: RT-tree: an improved R-tree indexing structure for temporal spatial databases. In: Proceeding of the International Symposium on Spatial Data Handling, SSDH, pp. 1040–1049, July 1990
27.
Zurück zum Zitat Yu, J., Moraffah, R., Sarwat, M.: Hippo in action: scalable indexing of a billion New York city taxi trips and beyond. In: Proceedings of the International Conference on Data Engineering, ICDE. IEEE (2017) Yu, J., Moraffah, R., Sarwat, M.: Hippo in action: scalable indexing of a billion New York city taxi trips and beyond. In: Proceedings of the International Conference on Data Engineering, ICDE. IEEE (2017)
28.
Zurück zum Zitat Jia, Y., Sarwat, M.: Two birds, one stone: a fast, yet lightweight, indexing scheme for modern database systems. Proc. VLDB Endowment 10(4), 385–396 (2016)CrossRef Jia, Y., Sarwat, M.: Two birds, one stone: a fast, yet lightweight, indexing scheme for modern database systems. Proc. VLDB Endowment 10(4), 385–396 (2016)CrossRef
29.
Zurück zum Zitat Zukowski, M., Heman, S., Nes, N., Boncz, P.: Super-scalar RAM-CPU cache compression. In: Proceedings of the International Conference on Data Engineering, ICDE, pp. 59–59. IEEE (2006) Zukowski, M., Heman, S., Nes, N., Boncz, P.: Super-scalar RAM-CPU cache compression. In: Proceedings of the International Conference on Data Engineering, ICDE, pp. 59–59. IEEE (2006)
Metadaten
Titel
Indexing the Pickup and Drop-Off Locations of NYC Taxi Trips in PostgreSQL – Lessons from the Road
verfasst von
Jia Yu
Mohamed Sarwat
Copyright-Jahr
2017
DOI
https://doi.org/10.1007/978-3-319-64367-0_8

Premium Partner