Skip to main content
Erschienen in: The VLDB Journal 4/2023

19.01.2023 | Regular Paper

SQUID: subtrajectory query in trillion-scale GPS database

verfasst von: Dongxiang Zhang, Zhihao Chang, Dingyu Yang, Dongsheng Li, Kian-Lee Tan, Ke Chen, Gang Chen

Erschienen in: The VLDB Journal | Ausgabe 4/2023

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Subtrajectory query has been a fundamental operator in mobility data management and useful in the applications of trajectory clustering, co-movement pattern mining and contact tracing in epidemiology. In this paper, we make the first attempt to study subtrajectory query in trillion-scale GPS databases, so as to support applications with urban-scale moving users and weeks-long historical data. We develop SQUID as a distributed subtrajectory query processing engine on Spark, with threefold technical contributions. First, we propose compact index and storage layers to handle massive trajectory datasets with trillion-scale GPS points. Second, we leverage hybrid partitioning, together with local indexes that are disk I/O friendly, to facilitate pruning. Third, we devise a novel filter-and-refine query processing framework to effectively reduce the number of trajectories for verification. Our experiments are conducted on huge trajectory datasets with up to 520 billion GPS points. The results validate the compactness of the storage mechanism and the scalability of the distributed query processing framework.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Agarwal, P. K., Fox, K., Munagala, K., Nath, A., Pan, J., Taylor, E.: Subtrajectory clustering: models and algorithms. In Van den Bussche, J. and Arenas, M. (eds) PODS, 75–87 ACM, (2018) Agarwal, P. K., Fox, K., Munagala, K., Nath, A., Pan, J., Taylor, E.: Subtrajectory clustering: models and algorithms. In Van den Bussche, J. and Arenas, M. (eds) PODS, 75–87 ACM, (2018)
2.
Zurück zum Zitat Aji, A., Wang, F., Vo, H., Lee, R., Liu, Q., Zhang, X., Saltz, J.H.: Hadoop-gis: a high performance spatial data warehousing system over mapreduce. Proc. VLDB Endow. 6(11), 1009–1020 (2013)CrossRef Aji, A., Wang, F., Vo, H., Lee, R., Liu, Q., Zhang, X., Saltz, J.H.: Hadoop-gis: a high performance spatial data warehousing system over mapreduce. Proc. VLDB Endow. 6(11), 1009–1020 (2013)CrossRef
3.
Zurück zum Zitat Armbrust, M., Xin, R. S., Lian, C., Huai, Y., Liu, D., Bradley, J. K., Meng, X., Kaftan, T., Franklin, M. J., Ghodsi, A., Zaharia, M.: Spark SQL: relational data processing in spark. In SIGMOD Conference, 1383–1394 ACM, (2015) Armbrust, M., Xin, R. S., Lian, C., Huai, Y., Liu, D., Bradley, J. K., Meng, X., Kaftan, T., Franklin, M. J., Ghodsi, A., Zaharia, M.: Spark SQL: relational data processing in spark. In SIGMOD Conference, 1383–1394 ACM, (2015)
4.
Zurück zum Zitat Bakalov, P., Hadjieleftheriou, M., Keogh, E. J, Tsotras, V. J.: Efficient trajectory joins using symbolic representations Bakalov, P., Hadjieleftheriou, M., Keogh, E. J, Tsotras, V. J.: Efficient trajectory joins using symbolic representations
5.
Zurück zum Zitat Bakalov, P., Hadjieleftheriou, M., Tsotras, V. J.: Time relaxed spatiotemporal trajectory joins. In GIS 182–191 (2005) Bakalov, P., Hadjieleftheriou, M., Tsotras, V. J.: Time relaxed spatiotemporal trajectory joins. In GIS 182–191 (2005)
6.
Zurück zum Zitat Bakalov, P., Tsotras, V. J.: Continuous spatiotemporal trajectory joins. In GSN, 109–128 (2006) Bakalov, P., Tsotras, V. J.: Continuous spatiotemporal trajectory joins. In GSN, 109–128 (2006)
7.
Zurück zum Zitat Chen, L., Özsu, M. T., Oria, V.: Robust and fast similarity search for moving object trajectories. In SIGMOD Conference, pp. 491–502. ACM, (2005) Chen, L., Özsu, M. T., Oria, V.: Robust and fast similarity search for moving object trajectories. In SIGMOD Conference, pp. 491–502. ACM, (2005)
8.
Zurück zum Zitat Chen, L., Gao, Y., Fang, Z., Miao, X., Jensen, C.S., Guo, C.: Real-time distributed co-movement pattern detection on streaming trajectories. Proc. VLDB Endow. 12(10), 1208–1220 (2019)CrossRef Chen, L., Gao, Y., Fang, Z., Miao, X., Jensen, C.S., Guo, C.: Real-time distributed co-movement pattern detection on streaming trajectories. Proc. VLDB Endow. 12(10), 1208–1220 (2019)CrossRef
9.
Zurück zum Zitat Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. In OSDI, 137–150, USENIX Association, (2004) Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. In OSDI, 137–150, USENIX Association, (2004)
10.
Zurück zum Zitat Eldawy, A., Mokbel, M. F.: Spatialhadoop: a mapreduce framework for spatial data. In ICDE, 1352–1363 (2015) Eldawy, A., Mokbel, M. F.: Spatialhadoop: a mapreduce framework for spatial data. In ICDE, 1352–1363 (2015)
11.
Zurück zum Zitat Ester, M., Kriegel, H.-P., Sander, J., Xiaowei, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, 226–231 (1996) Ester, M., Kriegel, H.-P., Sander, J., Xiaowei, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, 226–231 (1996)
12.
Zurück zum Zitat Fan, Q., Zhang, D., Huayu, W., Tan, K.-L.: A general and parallel platform for mining co-movement patterns over large-scale trajectories. Proc. VLDB Endow. 10(4), 313–324 (2016)CrossRef Fan, Q., Zhang, D., Huayu, W., Tan, K.-L.: A general and parallel platform for mining co-movement patterns over large-scale trajectories. Proc. VLDB Endow. 10(4), 313–324 (2016)CrossRef
13.
Zurück zum Zitat Fang, Z., Yunjun G. L., Chen, P. L., Miao, X., Jensen, C. S.: Coming: a real-time co-movement mining system for streaming trajectories. In SIGMOD, 2777–2780 (2020) Fang, Z., Yunjun G. L., Chen, P. L., Miao, X., Jensen, C. S.: Coming: a real-time co-movement mining system for streaming trajectories. In SIGMOD, 2777–2780 (2020)
14.
Zurück zum Zitat Gudmundsson, J., van Kreveld, M. J., Computing longest duration flocks in trajectory data. In GIS, pages 35–42, (2006) Gudmundsson, J., van Kreveld, M. J., Computing longest duration flocks in trajectory data. In GIS, pages 35–42, (2006)
15.
Zurück zum Zitat Hadoop. In Encyclopedia of GIS, page 837 (2017) Hadoop. In Encyclopedia of GIS, page 837 (2017)
16.
Zurück zum Zitat Hu, G., Shao, J., Liu, F., Wang, Y., Shen, H.T.: If-matching: towards accurate map-matching with information fusion. IEEE Trans. Knowl. Data Eng. 29(1), 114–127 (2017)CrossRef Hu, G., Shao, J., Liu, F., Wang, Y., Shen, H.T.: If-matching: towards accurate map-matching with information fusion. IEEE Trans. Knowl. Data Eng. 29(1), 114–127 (2017)CrossRef
17.
Zurück zum Zitat Jeung, H., Yiu, M.L., Zhou, X., Jensen, C.S., Shen, H.T.: Discovery of convoys in trajectory databases. Proc. VLDB Endow. 1(1), 1068–1080 (2008)CrossRef Jeung, H., Yiu, M.L., Zhou, X., Jensen, C.S., Shen, H.T.: Discovery of convoys in trajectory databases. Proc. VLDB Endow. 1(1), 1068–1080 (2008)CrossRef
18.
Zurück zum Zitat Keogh, E. J., Pazzani, M. J.: A simple dimensionality reduction technique for fast similarity search in large time series databases. In PADKK, pp. 122–133, (2000) Keogh, E. J., Pazzani, M. J.: A simple dimensionality reduction technique for fast similarity search in large time series databases. In PADKK, pp. 122–133, (2000)
19.
Zurück zum Zitat Li, Y., Li, Y., Gunopulos, D., Guibas, L. J.: Knowledge-based trajectory completion from sparse GPS samples. In GIS, 33:1–33:10 (2016) Li, Y., Li, Y., Gunopulos, D., Guibas, L. J.: Knowledge-based trajectory completion from sparse GPS samples. In GIS, 33:1–33:10 (2016)
20.
Zurück zum Zitat Li, Y., Bailey, J., Kulik, L.: Efficient mining of platoon patterns in trajectory databases. Data Knowl. Eng. 100, 167–187 (2015)CrossRef Li, Y., Bailey, J., Kulik, L.: Efficient mining of platoon patterns in trajectory databases. Data Knowl. Eng. 100, 167–187 (2015)CrossRef
21.
Zurück zum Zitat Li, Z., Ding, B., Han, J., Kays, R.: Swarm: mining relaxed temporal moving object clusters. Proc. VLDB Endow. 3(1), 723–734 (2010)CrossRef Li, Z., Ding, B., Han, J., Kays, R.: Swarm: mining relaxed temporal moving object clusters. Proc. VLDB Endow. 3(1), 723–734 (2010)CrossRef
22.
Zurück zum Zitat Liu, S., Liu, C., Luo, Q., Ni, L. M., Krishnan, R.: Calibrating large scale vehicle trajectory data. In MDM, pp. 222–231. IEEE Computer Society, (2012) Liu, S., Liu, C., Luo, Q., Ni, L. M., Krishnan, R.: Calibrating large scale vehicle trajectory data. In MDM, pp. 222–231. IEEE Computer Society, (2012)
23.
Zurück zum Zitat Lou, Y., Zhang, C., Zheng, Y., Xie, X., Wang, W., Huang, Y.: Map-matching for low-sampling-rate GPS trajectories. In GIS, pp. 352–361. ACM, (2009) Lou, Y., Zhang, C., Zheng, Y., Xie, X., Wang, W., Huang, Y.: Map-matching for low-sampling-rate GPS trajectories. In GIS, pp. 352–361. ACM, (2009)
24.
Zurück zum Zitat Nibali, A., He, Z.: Trajic: an effective compression system for trajectory data. IEEE Trans. Knowl. Data Eng. 27(11), 3138–3151 (2015)CrossRef Nibali, A., He, Z.: Trajic: an effective compression system for trajectory data. IEEE Trans. Knowl. Data Eng. 27(11), 3138–3151 (2015)CrossRef
25.
Zurück zum Zitat Nutanong, S., Jacox, E.H., Samet, H.: An incremental hausdorff distance calculation algorithm. PVLDB 4(8), 506–517 (2011) Nutanong, S., Jacox, E.H., Samet, H.: An incremental hausdorff distance calculation algorithm. PVLDB 4(8), 506–517 (2011)
26.
Zurück zum Zitat Qi, J., Tao, Y., Chang, Y., Zhang, R.: Packing r-trees with space-filling curves: theoretical optimality, empirical efficiency, and bulk-loading parallelizability. ACM Trans. Database Syst. 45(3), 14:1-14:47 (2020)MathSciNetCrossRef Qi, J., Tao, Y., Chang, Y., Zhang, R.: Packing r-trees with space-filling curves: theoretical optimality, empirical efficiency, and bulk-loading parallelizability. ACM Trans. Database Syst. 45(3), 14:1-14:47 (2020)MathSciNetCrossRef
27.
Zurück zum Zitat Raskar, R., Schunemann, I., Barbar, R., Vilcans, K., Gray, J., Vepakomma, P., Kapa, S., Nuzzo, A., Gupta, R., Berke, A., Greenwood, D., Keegan, C., Kanaparti, S., Beaudry, R., Stansbury, D., Beatriz B. A., Rishank K., Vitor P., Benedetti, F. M., Alina C., Riddhiman D., Kaushal J., Khahlil L., Greg N., Vitor P., Steve P., Yasaman R., Abhishek S., Greg S., and John W.: Maintaining personal privacy in an epidemic, Apps gone rogue (2020) Raskar, R., Schunemann, I., Barbar, R., Vilcans, K., Gray, J., Vepakomma, P., Kapa, S., Nuzzo, A., Gupta, R., Berke, A., Greenwood, D., Keegan, C., Kanaparti, S., Beaudry, R., Stansbury, D., Beatriz B. A., Rishank K., Vitor P., Benedetti, F. M., Alina C., Riddhiman D., Kaushal J., Khahlil L., Greg N., Vitor P., Steve P., Yasaman R., Abhishek S., Greg S., and John W.: Maintaining personal privacy in an epidemic, Apps gone rogue (2020)
28.
Zurück zum Zitat Sacharidis, D., Skoutas, D., Skoumas, G.: Continuous monitoring of nearest trajectories. In GIS, 361–370 (2014) Sacharidis, D., Skoutas, D., Skoumas, G.: Continuous monitoring of nearest trajectories. In GIS, 361–370 (2014)
29.
Zurück zum Zitat Shang, Zeyuan, Li, Guoliang, Bao, Zhifeng: DITA: distributed in-memory trajectory analytics. In SIGMOD Conference, 725–740 ACM,(2018) Shang, Zeyuan, Li, Guoliang, Bao, Zhifeng: DITA: distributed in-memory trajectory analytics. In SIGMOD Conference, 725–740 ACM,(2018)
30.
Zurück zum Zitat Stonebraker, M., Abadi, D. J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O’Neil, E. J., O’Neil, P. E., Rasin, A., Tran, N., Zdonik, S. B.: C-store: a column-oriented DBMS. In VLDB 553–564 ACM, (2005) Stonebraker, M., Abadi, D. J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O’Neil, E. J., O’Neil, P. E., Rasin, A., Tran, N., Zdonik, S. B.: C-store: a column-oriented DBMS. In VLDB 553–564 ACM, (2005)
31.
Zurück zum Zitat Su, H., Zheng, K., Wang, H., Huang, J., Zhou, X.: Calibrating trajectory data for similarity-based analysis. In SIGMOD Conference 833–844 ACM, (2013) Su, H., Zheng, K., Wang, H., Huang, J., Zhou, X.: Calibrating trajectory data for similarity-based analysis. In SIGMOD Conference 833–844 ACM, (2013)
32.
Zurück zum Zitat Tampakis, P., Doulkeridis, C., Pelekis, N., Theodoridis, Y.: Distributed subtrajectory join on massive datasets. ACM Trans. Spatial Algorithms Syst. 6(2), 8:1-8:29 (2020)CrossRef Tampakis, P., Doulkeridis, C., Pelekis, N., Theodoridis, Y.: Distributed subtrajectory join on massive datasets. ACM Trans. Spatial Algorithms Syst. 6(2), 8:1-8:29 (2020)CrossRef
33.
Zurück zum Zitat Tampakis, P., Pelekis, N., Doulkeridis, C., Theodoridis, Y.: Scalable distributed subtrajectory clustering. In IEEE International Conference on Big Data, 950–959 IEEE, (2019) Tampakis, P., Pelekis, N., Doulkeridis, C., Theodoridis, Y.: Scalable distributed subtrajectory clustering. In IEEE International Conference on Big Data, 950–959 IEEE, (2019)
34.
Zurück zum Zitat Tang, B., Yiu, M. L., Mouratidis, K., Wang, K.: Efficient motif discovery in spatial trajectories using discrete fréchet distance. In EDBT, pp. 378–389. OpenProceedings.org, (2017) Tang, B., Yiu, M. L., Mouratidis, K., Wang, K.: Efficient motif discovery in spatial trajectories using discrete fréchet distance. In EDBT, pp. 378–389. OpenProceedings.org, (2017)
35.
Zurück zum Zitat Wang, S., Ferhatosmanoglu, H.: Ppq-trajectory: spatio-temporal quantization for querying in large trajectory repositories. Proc. VLDB Endow. 14(2), 215–227 (2020)CrossRef Wang, S., Ferhatosmanoglu, H.: Ppq-trajectory: spatio-temporal quantization for querying in large trajectory repositories. Proc. VLDB Endow. 14(2), 215–227 (2020)CrossRef
36.
Zurück zum Zitat Wang, Y., Lim, E.-P., Hwang, S.-Y.: Efficient mining of group patterns from user movement data. Data Knowl. Eng. 57(3), 240–282 (2006)CrossRef Wang, Y., Lim, E.-P., Hwang, S.-Y.: Efficient mining of group patterns from user movement data. Data Knowl. Eng. 57(3), 240–282 (2006)CrossRef
37.
Zurück zum Zitat Xie, D., Li, F., Phillips, J.M.: Distributed trajectory similarity search. Proc. VLDB Endow. 10(11), 1478–1489 (2017)CrossRef Xie, D., Li, F., Phillips, J.M.: Distributed trajectory similarity search. Proc. VLDB Endow. 10(11), 1478–1489 (2017)CrossRef
38.
Zurück zum Zitat Xie, D., Li, F., Yao, B., Li, G., Zhou, L., Guo, M.: Simba: efficient in-memory spatial analytics. In SIGMOD Conference, 1071–1085 ACM, (2016) Xie, D., Li, F., Yao, B., Li, G., Zhou, L., Guo, M.: Simba: efficient in-memory spatial analytics. In SIGMOD Conference, 1071–1085 ACM, (2016)
39.
Zurück zum Zitat Yi, B.-K., Jagadish, H. V., Faloutsos, C.: Efficient retrieval of similar time sequences under time warping. In ICDE, pp. 201–208. IEEE Computer Society, (1998) Yi, B.-K., Jagadish, H. V., Faloutsos, C.: Efficient retrieval of similar time sequences under time warping. In ICDE, pp. 201–208. IEEE Computer Society, (1998)
40.
Zurück zum Zitat You, S., Zhang, J., Gruenwald, L.: Large-scale spatial join query processing in cloud. In ICDE Workshops, 34–41 IEEE Computer Society, (2015) You, S., Zhang, J., Gruenwald, L.: Large-scale spatial join query processing in cloud. In ICDE Workshops, 34–41 IEEE Computer Society, (2015)
41.
Zurück zum Zitat Yu, J., Zhang, Z., Sarwat, M.: Spatial data management in apache spark: the geospark perspective and beyond. GeoInformatica 23(1), 37–78 (2019)CrossRef Yu, J., Zhang, Z., Sarwat, M.: Spatial data management in apache spark: the geospark perspective and beyond. GeoInformatica 23(1), 37–78 (2019)CrossRef
42.
Zurück zum Zitat Yuan, H., Li, G.: Distributed in-memory trajectory similarity search and join on road network. In ICDE, 1262–1273 (2019) Yuan, H., Li, G.: Distributed in-memory trajectory similarity search and join on road network. In ICDE, 1262–1273 (2019)
43.
Zurück zum Zitat Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauly, M., Franklin, M. J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In NSDI, 15–28 USENIX Association,(2012) Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauly, M., Franklin, M. J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In NSDI, 15–28 USENIX Association,(2012)
44.
Zurück zum Zitat Zhang, D., Chang, Z., Wu, S., Yuan, Y., Tan, K.L., Chen, G.: Continuous trajectory similarity search for online outlier detection. IEEE Transactions on Knowledge and Data Engineering, 1 (2020) Zhang, D., Chang, Z., Wu, S., Yuan, Y., Tan, K.L., Chen, G.: Continuous trajectory similarity search for online outlier detection. IEEE Transactions on Knowledge and Data Engineering, 1 (2020)
45.
Zurück zum Zitat Zhang, D., Chan, C.-Y., Tan, K.-L.: Processing spatial keyword query as a top-k aggregation query. In SIGIR, 355–364 ACM, (2014) Zhang, D., Chan, C.-Y., Tan, K.-L.: Processing spatial keyword query as a top-k aggregation query. In SIGIR, 355–364 ACM, (2014)
46.
Zurück zum Zitat Zhang, D., Ding, M., Yang, D., Liu, Y., Fan, J.: Trajectory simplification: an experimental study and quality analysis. Proc. VLDB Endow. 11(9), 934–946 (2018)CrossRef Zhang, D., Ding, M., Yang, D., Liu, Y., Fan, J.: Trajectory simplification: an experimental study and quality analysis. Proc. VLDB Endow. 11(9), 934–946 (2018)CrossRef
47.
Zurück zum Zitat Zhou, Z., Dou, W., Jia, G., Chunhua, H., Xiaolong, X., Xiaotong, W., Pan, J.: A method for real-time trajectory monitoring to improve taxi service using GPS big data. Inf. Manag. 53(8), 964–977 (2016)CrossRef Zhou, Z., Dou, W., Jia, G., Chunhua, H., Xiaolong, X., Xiaotong, W., Pan, J.: A method for real-time trajectory monitoring to improve taxi service using GPS big data. Inf. Manag. 53(8), 964–977 (2016)CrossRef
Metadaten
Titel
SQUID: subtrajectory query in trillion-scale GPS database
verfasst von
Dongxiang Zhang
Zhihao Chang
Dingyu Yang
Dongsheng Li
Kian-Lee Tan
Ke Chen
Gang Chen
Publikationsdatum
19.01.2023
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 4/2023
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-022-00777-7

Weitere Artikel der Ausgabe 4/2023

The VLDB Journal 4/2023 Zur Ausgabe