Skip to main content
Top
Published in: The VLDB Journal 4/2023

19-01-2023 | Regular Paper

SQUID: subtrajectory query in trillion-scale GPS database

Authors: Dongxiang Zhang, Zhihao Chang, Dingyu Yang, Dongsheng Li, Kian-Lee Tan, Ke Chen, Gang Chen

Published in: The VLDB Journal | Issue 4/2023

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Subtrajectory query has been a fundamental operator in mobility data management and useful in the applications of trajectory clustering, co-movement pattern mining and contact tracing in epidemiology. In this paper, we make the first attempt to study subtrajectory query in trillion-scale GPS databases, so as to support applications with urban-scale moving users and weeks-long historical data. We develop SQUID as a distributed subtrajectory query processing engine on Spark, with threefold technical contributions. First, we propose compact index and storage layers to handle massive trajectory datasets with trillion-scale GPS points. Second, we leverage hybrid partitioning, together with local indexes that are disk I/O friendly, to facilitate pruning. Third, we devise a novel filter-and-refine query processing framework to effectively reduce the number of trajectories for verification. Our experiments are conducted on huge trajectory datasets with up to 520 billion GPS points. The results validate the compactness of the storage mechanism and the scalability of the distributed query processing framework.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Agarwal, P. K., Fox, K., Munagala, K., Nath, A., Pan, J., Taylor, E.: Subtrajectory clustering: models and algorithms. In Van den Bussche, J. and Arenas, M. (eds) PODS, 75–87 ACM, (2018) Agarwal, P. K., Fox, K., Munagala, K., Nath, A., Pan, J., Taylor, E.: Subtrajectory clustering: models and algorithms. In Van den Bussche, J. and Arenas, M. (eds) PODS, 75–87 ACM, (2018)
2.
go back to reference Aji, A., Wang, F., Vo, H., Lee, R., Liu, Q., Zhang, X., Saltz, J.H.: Hadoop-gis: a high performance spatial data warehousing system over mapreduce. Proc. VLDB Endow. 6(11), 1009–1020 (2013)CrossRef Aji, A., Wang, F., Vo, H., Lee, R., Liu, Q., Zhang, X., Saltz, J.H.: Hadoop-gis: a high performance spatial data warehousing system over mapreduce. Proc. VLDB Endow. 6(11), 1009–1020 (2013)CrossRef
3.
go back to reference Armbrust, M., Xin, R. S., Lian, C., Huai, Y., Liu, D., Bradley, J. K., Meng, X., Kaftan, T., Franklin, M. J., Ghodsi, A., Zaharia, M.: Spark SQL: relational data processing in spark. In SIGMOD Conference, 1383–1394 ACM, (2015) Armbrust, M., Xin, R. S., Lian, C., Huai, Y., Liu, D., Bradley, J. K., Meng, X., Kaftan, T., Franklin, M. J., Ghodsi, A., Zaharia, M.: Spark SQL: relational data processing in spark. In SIGMOD Conference, 1383–1394 ACM, (2015)
4.
go back to reference Bakalov, P., Hadjieleftheriou, M., Keogh, E. J, Tsotras, V. J.: Efficient trajectory joins using symbolic representations Bakalov, P., Hadjieleftheriou, M., Keogh, E. J, Tsotras, V. J.: Efficient trajectory joins using symbolic representations
5.
go back to reference Bakalov, P., Hadjieleftheriou, M., Tsotras, V. J.: Time relaxed spatiotemporal trajectory joins. In GIS 182–191 (2005) Bakalov, P., Hadjieleftheriou, M., Tsotras, V. J.: Time relaxed spatiotemporal trajectory joins. In GIS 182–191 (2005)
6.
go back to reference Bakalov, P., Tsotras, V. J.: Continuous spatiotemporal trajectory joins. In GSN, 109–128 (2006) Bakalov, P., Tsotras, V. J.: Continuous spatiotemporal trajectory joins. In GSN, 109–128 (2006)
7.
go back to reference Chen, L., Özsu, M. T., Oria, V.: Robust and fast similarity search for moving object trajectories. In SIGMOD Conference, pp. 491–502. ACM, (2005) Chen, L., Özsu, M. T., Oria, V.: Robust and fast similarity search for moving object trajectories. In SIGMOD Conference, pp. 491–502. ACM, (2005)
8.
go back to reference Chen, L., Gao, Y., Fang, Z., Miao, X., Jensen, C.S., Guo, C.: Real-time distributed co-movement pattern detection on streaming trajectories. Proc. VLDB Endow. 12(10), 1208–1220 (2019)CrossRef Chen, L., Gao, Y., Fang, Z., Miao, X., Jensen, C.S., Guo, C.: Real-time distributed co-movement pattern detection on streaming trajectories. Proc. VLDB Endow. 12(10), 1208–1220 (2019)CrossRef
9.
go back to reference Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. In OSDI, 137–150, USENIX Association, (2004) Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. In OSDI, 137–150, USENIX Association, (2004)
10.
go back to reference Eldawy, A., Mokbel, M. F.: Spatialhadoop: a mapreduce framework for spatial data. In ICDE, 1352–1363 (2015) Eldawy, A., Mokbel, M. F.: Spatialhadoop: a mapreduce framework for spatial data. In ICDE, 1352–1363 (2015)
11.
go back to reference Ester, M., Kriegel, H.-P., Sander, J., Xiaowei, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, 226–231 (1996) Ester, M., Kriegel, H.-P., Sander, J., Xiaowei, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, 226–231 (1996)
12.
go back to reference Fan, Q., Zhang, D., Huayu, W., Tan, K.-L.: A general and parallel platform for mining co-movement patterns over large-scale trajectories. Proc. VLDB Endow. 10(4), 313–324 (2016)CrossRef Fan, Q., Zhang, D., Huayu, W., Tan, K.-L.: A general and parallel platform for mining co-movement patterns over large-scale trajectories. Proc. VLDB Endow. 10(4), 313–324 (2016)CrossRef
13.
go back to reference Fang, Z., Yunjun G. L., Chen, P. L., Miao, X., Jensen, C. S.: Coming: a real-time co-movement mining system for streaming trajectories. In SIGMOD, 2777–2780 (2020) Fang, Z., Yunjun G. L., Chen, P. L., Miao, X., Jensen, C. S.: Coming: a real-time co-movement mining system for streaming trajectories. In SIGMOD, 2777–2780 (2020)
14.
go back to reference Gudmundsson, J., van Kreveld, M. J., Computing longest duration flocks in trajectory data. In GIS, pages 35–42, (2006) Gudmundsson, J., van Kreveld, M. J., Computing longest duration flocks in trajectory data. In GIS, pages 35–42, (2006)
15.
go back to reference Hadoop. In Encyclopedia of GIS, page 837 (2017) Hadoop. In Encyclopedia of GIS, page 837 (2017)
16.
go back to reference Hu, G., Shao, J., Liu, F., Wang, Y., Shen, H.T.: If-matching: towards accurate map-matching with information fusion. IEEE Trans. Knowl. Data Eng. 29(1), 114–127 (2017)CrossRef Hu, G., Shao, J., Liu, F., Wang, Y., Shen, H.T.: If-matching: towards accurate map-matching with information fusion. IEEE Trans. Knowl. Data Eng. 29(1), 114–127 (2017)CrossRef
17.
go back to reference Jeung, H., Yiu, M.L., Zhou, X., Jensen, C.S., Shen, H.T.: Discovery of convoys in trajectory databases. Proc. VLDB Endow. 1(1), 1068–1080 (2008)CrossRef Jeung, H., Yiu, M.L., Zhou, X., Jensen, C.S., Shen, H.T.: Discovery of convoys in trajectory databases. Proc. VLDB Endow. 1(1), 1068–1080 (2008)CrossRef
18.
go back to reference Keogh, E. J., Pazzani, M. J.: A simple dimensionality reduction technique for fast similarity search in large time series databases. In PADKK, pp. 122–133, (2000) Keogh, E. J., Pazzani, M. J.: A simple dimensionality reduction technique for fast similarity search in large time series databases. In PADKK, pp. 122–133, (2000)
19.
go back to reference Li, Y., Li, Y., Gunopulos, D., Guibas, L. J.: Knowledge-based trajectory completion from sparse GPS samples. In GIS, 33:1–33:10 (2016) Li, Y., Li, Y., Gunopulos, D., Guibas, L. J.: Knowledge-based trajectory completion from sparse GPS samples. In GIS, 33:1–33:10 (2016)
20.
go back to reference Li, Y., Bailey, J., Kulik, L.: Efficient mining of platoon patterns in trajectory databases. Data Knowl. Eng. 100, 167–187 (2015)CrossRef Li, Y., Bailey, J., Kulik, L.: Efficient mining of platoon patterns in trajectory databases. Data Knowl. Eng. 100, 167–187 (2015)CrossRef
21.
go back to reference Li, Z., Ding, B., Han, J., Kays, R.: Swarm: mining relaxed temporal moving object clusters. Proc. VLDB Endow. 3(1), 723–734 (2010)CrossRef Li, Z., Ding, B., Han, J., Kays, R.: Swarm: mining relaxed temporal moving object clusters. Proc. VLDB Endow. 3(1), 723–734 (2010)CrossRef
22.
go back to reference Liu, S., Liu, C., Luo, Q., Ni, L. M., Krishnan, R.: Calibrating large scale vehicle trajectory data. In MDM, pp. 222–231. IEEE Computer Society, (2012) Liu, S., Liu, C., Luo, Q., Ni, L. M., Krishnan, R.: Calibrating large scale vehicle trajectory data. In MDM, pp. 222–231. IEEE Computer Society, (2012)
23.
go back to reference Lou, Y., Zhang, C., Zheng, Y., Xie, X., Wang, W., Huang, Y.: Map-matching for low-sampling-rate GPS trajectories. In GIS, pp. 352–361. ACM, (2009) Lou, Y., Zhang, C., Zheng, Y., Xie, X., Wang, W., Huang, Y.: Map-matching for low-sampling-rate GPS trajectories. In GIS, pp. 352–361. ACM, (2009)
24.
go back to reference Nibali, A., He, Z.: Trajic: an effective compression system for trajectory data. IEEE Trans. Knowl. Data Eng. 27(11), 3138–3151 (2015)CrossRef Nibali, A., He, Z.: Trajic: an effective compression system for trajectory data. IEEE Trans. Knowl. Data Eng. 27(11), 3138–3151 (2015)CrossRef
25.
go back to reference Nutanong, S., Jacox, E.H., Samet, H.: An incremental hausdorff distance calculation algorithm. PVLDB 4(8), 506–517 (2011) Nutanong, S., Jacox, E.H., Samet, H.: An incremental hausdorff distance calculation algorithm. PVLDB 4(8), 506–517 (2011)
26.
go back to reference Qi, J., Tao, Y., Chang, Y., Zhang, R.: Packing r-trees with space-filling curves: theoretical optimality, empirical efficiency, and bulk-loading parallelizability. ACM Trans. Database Syst. 45(3), 14:1-14:47 (2020)MathSciNetCrossRef Qi, J., Tao, Y., Chang, Y., Zhang, R.: Packing r-trees with space-filling curves: theoretical optimality, empirical efficiency, and bulk-loading parallelizability. ACM Trans. Database Syst. 45(3), 14:1-14:47 (2020)MathSciNetCrossRef
27.
go back to reference Raskar, R., Schunemann, I., Barbar, R., Vilcans, K., Gray, J., Vepakomma, P., Kapa, S., Nuzzo, A., Gupta, R., Berke, A., Greenwood, D., Keegan, C., Kanaparti, S., Beaudry, R., Stansbury, D., Beatriz B. A., Rishank K., Vitor P., Benedetti, F. M., Alina C., Riddhiman D., Kaushal J., Khahlil L., Greg N., Vitor P., Steve P., Yasaman R., Abhishek S., Greg S., and John W.: Maintaining personal privacy in an epidemic, Apps gone rogue (2020) Raskar, R., Schunemann, I., Barbar, R., Vilcans, K., Gray, J., Vepakomma, P., Kapa, S., Nuzzo, A., Gupta, R., Berke, A., Greenwood, D., Keegan, C., Kanaparti, S., Beaudry, R., Stansbury, D., Beatriz B. A., Rishank K., Vitor P., Benedetti, F. M., Alina C., Riddhiman D., Kaushal J., Khahlil L., Greg N., Vitor P., Steve P., Yasaman R., Abhishek S., Greg S., and John W.: Maintaining personal privacy in an epidemic, Apps gone rogue (2020)
28.
go back to reference Sacharidis, D., Skoutas, D., Skoumas, G.: Continuous monitoring of nearest trajectories. In GIS, 361–370 (2014) Sacharidis, D., Skoutas, D., Skoumas, G.: Continuous monitoring of nearest trajectories. In GIS, 361–370 (2014)
29.
go back to reference Shang, Zeyuan, Li, Guoliang, Bao, Zhifeng: DITA: distributed in-memory trajectory analytics. In SIGMOD Conference, 725–740 ACM,(2018) Shang, Zeyuan, Li, Guoliang, Bao, Zhifeng: DITA: distributed in-memory trajectory analytics. In SIGMOD Conference, 725–740 ACM,(2018)
30.
go back to reference Stonebraker, M., Abadi, D. J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O’Neil, E. J., O’Neil, P. E., Rasin, A., Tran, N., Zdonik, S. B.: C-store: a column-oriented DBMS. In VLDB 553–564 ACM, (2005) Stonebraker, M., Abadi, D. J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O’Neil, E. J., O’Neil, P. E., Rasin, A., Tran, N., Zdonik, S. B.: C-store: a column-oriented DBMS. In VLDB 553–564 ACM, (2005)
31.
go back to reference Su, H., Zheng, K., Wang, H., Huang, J., Zhou, X.: Calibrating trajectory data for similarity-based analysis. In SIGMOD Conference 833–844 ACM, (2013) Su, H., Zheng, K., Wang, H., Huang, J., Zhou, X.: Calibrating trajectory data for similarity-based analysis. In SIGMOD Conference 833–844 ACM, (2013)
32.
go back to reference Tampakis, P., Doulkeridis, C., Pelekis, N., Theodoridis, Y.: Distributed subtrajectory join on massive datasets. ACM Trans. Spatial Algorithms Syst. 6(2), 8:1-8:29 (2020)CrossRef Tampakis, P., Doulkeridis, C., Pelekis, N., Theodoridis, Y.: Distributed subtrajectory join on massive datasets. ACM Trans. Spatial Algorithms Syst. 6(2), 8:1-8:29 (2020)CrossRef
33.
go back to reference Tampakis, P., Pelekis, N., Doulkeridis, C., Theodoridis, Y.: Scalable distributed subtrajectory clustering. In IEEE International Conference on Big Data, 950–959 IEEE, (2019) Tampakis, P., Pelekis, N., Doulkeridis, C., Theodoridis, Y.: Scalable distributed subtrajectory clustering. In IEEE International Conference on Big Data, 950–959 IEEE, (2019)
34.
go back to reference Tang, B., Yiu, M. L., Mouratidis, K., Wang, K.: Efficient motif discovery in spatial trajectories using discrete fréchet distance. In EDBT, pp. 378–389. OpenProceedings.org, (2017) Tang, B., Yiu, M. L., Mouratidis, K., Wang, K.: Efficient motif discovery in spatial trajectories using discrete fréchet distance. In EDBT, pp. 378–389. OpenProceedings.org, (2017)
35.
go back to reference Wang, S., Ferhatosmanoglu, H.: Ppq-trajectory: spatio-temporal quantization for querying in large trajectory repositories. Proc. VLDB Endow. 14(2), 215–227 (2020)CrossRef Wang, S., Ferhatosmanoglu, H.: Ppq-trajectory: spatio-temporal quantization for querying in large trajectory repositories. Proc. VLDB Endow. 14(2), 215–227 (2020)CrossRef
36.
go back to reference Wang, Y., Lim, E.-P., Hwang, S.-Y.: Efficient mining of group patterns from user movement data. Data Knowl. Eng. 57(3), 240–282 (2006)CrossRef Wang, Y., Lim, E.-P., Hwang, S.-Y.: Efficient mining of group patterns from user movement data. Data Knowl. Eng. 57(3), 240–282 (2006)CrossRef
37.
go back to reference Xie, D., Li, F., Phillips, J.M.: Distributed trajectory similarity search. Proc. VLDB Endow. 10(11), 1478–1489 (2017)CrossRef Xie, D., Li, F., Phillips, J.M.: Distributed trajectory similarity search. Proc. VLDB Endow. 10(11), 1478–1489 (2017)CrossRef
38.
go back to reference Xie, D., Li, F., Yao, B., Li, G., Zhou, L., Guo, M.: Simba: efficient in-memory spatial analytics. In SIGMOD Conference, 1071–1085 ACM, (2016) Xie, D., Li, F., Yao, B., Li, G., Zhou, L., Guo, M.: Simba: efficient in-memory spatial analytics. In SIGMOD Conference, 1071–1085 ACM, (2016)
39.
go back to reference Yi, B.-K., Jagadish, H. V., Faloutsos, C.: Efficient retrieval of similar time sequences under time warping. In ICDE, pp. 201–208. IEEE Computer Society, (1998) Yi, B.-K., Jagadish, H. V., Faloutsos, C.: Efficient retrieval of similar time sequences under time warping. In ICDE, pp. 201–208. IEEE Computer Society, (1998)
40.
go back to reference You, S., Zhang, J., Gruenwald, L.: Large-scale spatial join query processing in cloud. In ICDE Workshops, 34–41 IEEE Computer Society, (2015) You, S., Zhang, J., Gruenwald, L.: Large-scale spatial join query processing in cloud. In ICDE Workshops, 34–41 IEEE Computer Society, (2015)
41.
go back to reference Yu, J., Zhang, Z., Sarwat, M.: Spatial data management in apache spark: the geospark perspective and beyond. GeoInformatica 23(1), 37–78 (2019)CrossRef Yu, J., Zhang, Z., Sarwat, M.: Spatial data management in apache spark: the geospark perspective and beyond. GeoInformatica 23(1), 37–78 (2019)CrossRef
42.
go back to reference Yuan, H., Li, G.: Distributed in-memory trajectory similarity search and join on road network. In ICDE, 1262–1273 (2019) Yuan, H., Li, G.: Distributed in-memory trajectory similarity search and join on road network. In ICDE, 1262–1273 (2019)
43.
go back to reference Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauly, M., Franklin, M. J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In NSDI, 15–28 USENIX Association,(2012) Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauly, M., Franklin, M. J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In NSDI, 15–28 USENIX Association,(2012)
44.
go back to reference Zhang, D., Chang, Z., Wu, S., Yuan, Y., Tan, K.L., Chen, G.: Continuous trajectory similarity search for online outlier detection. IEEE Transactions on Knowledge and Data Engineering, 1 (2020) Zhang, D., Chang, Z., Wu, S., Yuan, Y., Tan, K.L., Chen, G.: Continuous trajectory similarity search for online outlier detection. IEEE Transactions on Knowledge and Data Engineering, 1 (2020)
45.
go back to reference Zhang, D., Chan, C.-Y., Tan, K.-L.: Processing spatial keyword query as a top-k aggregation query. In SIGIR, 355–364 ACM, (2014) Zhang, D., Chan, C.-Y., Tan, K.-L.: Processing spatial keyword query as a top-k aggregation query. In SIGIR, 355–364 ACM, (2014)
46.
go back to reference Zhang, D., Ding, M., Yang, D., Liu, Y., Fan, J.: Trajectory simplification: an experimental study and quality analysis. Proc. VLDB Endow. 11(9), 934–946 (2018)CrossRef Zhang, D., Ding, M., Yang, D., Liu, Y., Fan, J.: Trajectory simplification: an experimental study and quality analysis. Proc. VLDB Endow. 11(9), 934–946 (2018)CrossRef
47.
go back to reference Zhou, Z., Dou, W., Jia, G., Chunhua, H., Xiaolong, X., Xiaotong, W., Pan, J.: A method for real-time trajectory monitoring to improve taxi service using GPS big data. Inf. Manag. 53(8), 964–977 (2016)CrossRef Zhou, Z., Dou, W., Jia, G., Chunhua, H., Xiaolong, X., Xiaotong, W., Pan, J.: A method for real-time trajectory monitoring to improve taxi service using GPS big data. Inf. Manag. 53(8), 964–977 (2016)CrossRef
Metadata
Title
SQUID: subtrajectory query in trillion-scale GPS database
Authors
Dongxiang Zhang
Zhihao Chang
Dingyu Yang
Dongsheng Li
Kian-Lee Tan
Ke Chen
Gang Chen
Publication date
19-01-2023
Publisher
Springer Berlin Heidelberg
Published in
The VLDB Journal / Issue 4/2023
Print ISSN: 1066-8888
Electronic ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-022-00777-7

Other articles of this Issue 4/2023

The VLDB Journal 4/2023 Go to the issue

Premium Partner