Skip to main content
Erschienen in: World Wide Web 4/2021

28.05.2021

Clustering Enhanced Error-tolerant Top-k Spatio-textual Search

verfasst von: Yong Zhang, Yu Chen, Junye Yang, Jin Wang, Huiqi Hu, Chunxiao Xing, Xiaofang Zhou

Erschienen in: World Wide Web | Ausgabe 4/2021

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

There are a large amount of Location-Based Services widely available on a variety of portable electronic devices. It is critical for them to efficiently support top-kquery considering both spatial and textual relevance. Considering both the errors in user input and the spatial databases, it is necessary to support error-tolerant spatio-textual search for end-users. Previous researches mainly focused on set-based textual relevance, which makes it difficult for them to find reasonable results when the input tokens are not exactly matched with those from the records in spatial database. We design a novel framework to support top-kspatio-textual search with fuzzy token matching. A hierarchical index is proposed to capture signatures of both spatial and textual relevance. Based on it, we devise two algorithms to preferentially access the nodes with more similar objects while those with dissimilar ones can be pruned. We further propose a clustering based approach to construct the index by leveraging textual information. We conduct extensive experiments on real world POI datasets, and the results show that our framework outperforms state-of-the-art methods by a significant margin.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Arasu, A., Ganti, V., and Kaushik, R.. Efficient exact set-similarity joins. In VLDB, pages 918–929, (2006) Arasu, A., Ganti, V., and Kaushik, R.. Efficient exact set-similarity joins. In VLDB, pages 918–929, (2006)
2.
Zurück zum Zitat Bouros, P., Ge, S., Mamoulis, N.: Spatio-textual similarity joins. PVLDB. 6(1), 1–12 (2012) Bouros, P., Ge, S., Mamoulis, N.: Spatio-textual similarity joins. PVLDB. 6(1), 1–12 (2012)
3.
Zurück zum Zitat Chaudhuri, S., Ganti, V., and Kaushik, R.. A primitive operator for similarity joins in data cleaning. In ICDE, page 5, (2006) Chaudhuri, S., Ganti, V., and Kaushik, R.. A primitive operator for similarity joins in data cleaning. In ICDE, page 5, (2006)
4.
Zurück zum Zitat Chen, L. and Cong, G.. Diversity-aware top-k publish/subscribe for text stream. In SIGMOD, pages 347–362, (2015) Chen, L. and Cong, G.. Diversity-aware top-k publish/subscribe for text stream. In SIGMOD, pages 347–362, (2015)
5.
Zurück zum Zitat Chen, L., Cong, G., Cao, X.. An efficient query indexing mechanism for filtering geo-textual data. In SIGMOD, pages 749–760, (2013) Chen, L., Cong, G., Cao, X.. An efficient query indexing mechanism for filtering geo-textual data. In SIGMOD, pages 749–760, (2013)
6.
Zurück zum Zitat Chen, L., Cong, G., Jensen, C.S., Wu, D.: Spatial keyword query processing: An experimental evaluation. PVLDB. 6(3), 217–228 (2013) Chen, L., Cong, G., Jensen, C.S., Wu, D.: Spatial keyword query processing: An experimental evaluation. PVLDB. 6(3), 217–228 (2013)
7.
Zurück zum Zitat Chen, L., Cong, G., Cao, X., Tan, K.. Temporal spatial-keyword top-k publish/subscribe. In ICDE, pages 255–266, (2015) Chen, L., Cong, G., Cao, X., Tan, K.. Temporal spatial-keyword top-k publish/subscribe. In ICDE, pages 255–266, (2015)
8.
Zurück zum Zitat Chen, L., Shang, S., Zheng, K., and Kalnis, P.. Cluster-based subscription matching for geo-textual data streams. In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pages 890–901, (2019) Chen, L., Shang, S., Zheng, K., and Kalnis, P.. Cluster-based subscription matching for geo-textual data streams. In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pages 890–901, (2019)
9.
Zurück zum Zitat Chen, X., Xu, J., Zhou, R., Zhao, P., Liu, C., Fang, J., Zhao, L.. S2r-tree: a pivot-based indexing structure for semantic-aware spatial keyword search. GeoInformatica, 07 (2019 Chen, X., Xu, J., Zhou, R., Zhao, P., Liu, C., Fang, J., Zhao, L.. S2r-tree: a pivot-based indexing structure for semantic-aware spatial keyword search. GeoInformatica, 07 (2019
10.
Zurück zum Zitat Christoforaki, M., He, J., Dimopoulos, C., Markowetz, A., and Suel, T.. Text vs. space: efficient geo-search query processing. In CIKM, pages 423–432, (2011) Christoforaki, M., He, J., Dimopoulos, C., Markowetz, A., and Suel, T.. Text vs. space: efficient geo-search query processing. In CIKM, pages 423–432, (2011)
11.
Zurück zum Zitat Cong, G., Jensen, C.S.. Querying geo-textual data: Spatial keyword queries and beyond. In SIGMOD, pages 2207–2212, (2016) Cong, G., Jensen, C.S.. Querying geo-textual data: Spatial keyword queries and beyond. In SIGMOD, pages 2207–2212, (2016)
12.
Zurück zum Zitat Cong, G., Jensen, C.S., Wu, D.: Efficient retrieval of the top-k most relevant spatial web objects. PVLDB. 2(1), 337–348 (2009) Cong, G., Jensen, C.S., Wu, D.: Efficient retrieval of the top-k most relevant spatial web objects. PVLDB. 2(1), 337–348 (2009)
13.
Zurück zum Zitat Deng, D., Li, G., Feng, J., and Li, W.. Top-k string similarity search with edit-distance constraints. In ICDE, pages 925–936, (2013) Deng, D., Li, G., Feng, J., and Li, W.. Top-k string similarity search with edit-distance constraints. In ICDE, pages 925–936, (2013)
14.
Zurück zum Zitat Felipe, I.D., Hristidis, V., Rishe, N.. Keyword search on spatial databases. In ICDE, pages 656–665, (2008) Felipe, I.D., Hristidis, V., Rishe, N.. Keyword search on spatial databases. In ICDE, pages 656–665, (2008)
15.
Zurück zum Zitat Feng, K., Cong, G., Bhowmick, S.S., Peng, W., Miao, C.. Towards best region search for data exploration. In SIGMOD, pages 1055–1070, (2016) Feng, K., Cong, G., Bhowmick, S.S., Peng, W., Miao, C.. Towards best region search for data exploration. In SIGMOD, pages 1055–1070, (2016)
16.
Zurück zum Zitat Guo, T., Feng, K., Cong, G., Bao, Z.. Efficient selection of geospatial data on maps for interactive and visualized exploration. In SIGMOD, pages 567–582, (2018) Guo, T., Feng, K., Cong, G., Bao, Z.. Efficient selection of geospatial data on maps for interactive and visualized exploration. In SIGMOD, pages 567–582, (2018)
17.
Zurück zum Zitat Hu, H., Liu, Y., Li, G., Feng, J., Tan, K.. A location-aware publish/subscribe framework for parameterized spatio-textual subscriptions. In ICDE, pages 711–722, (2015) Hu, H., Liu, Y., Li, G., Feng, J., Tan, K.. A location-aware publish/subscribe framework for parameterized spatio-textual subscriptions. In ICDE, pages 711–722, (2015)
18.
Zurück zum Zitat Hu, H., Li, G., Bao, Z., Feng, J., Wu, Y., Gong, Z., Xu, Y.: Top-k spatio-textual similarity join. IEEE Trans. Knowl. Data Eng. 28(2), 551–565 (2016)CrossRef Hu, H., Li, G., Bao, Z., Feng, J., Wu, Y., Gong, Z., Xu, Y.: Top-k spatio-textual similarity join. IEEE Trans. Knowl. Data Eng. 28(2), 551–565 (2016)CrossRef
19.
Zurück zum Zitat Hu, S., Xiao, C., Ishikawa, Y.: An efficient algorithm for location-aware query autocompletion. IEICE Trans. Inf. Syst. 101-D(1), 181–192 (2018)CrossRef Hu, S., Xiao, C., Ishikawa, Y.: An efficient algorithm for location-aware query autocompletion. IEICE Trans. Inf. Syst. 101-D(1), 181–192 (2018)CrossRef
20.
Zurück zum Zitat Li, C., Lu, J., and Lu, Y.. Efficient merging and filtering algorithms for approximate string searches. In ICDE, pages 257–266, (2008) Li, C., Lu, J., and Lu, Y.. Efficient merging and filtering algorithms for approximate string searches. In ICDE, pages 257–266, (2008)
21.
Zurück zum Zitat Li, G., Deng, D., Wang, J., Feng, J.: PASS-JOIN: A partition-based method for similarity joins. PVLDB. 5(3), 253–264 (2011) Li, G., Deng, D., Wang, J., Feng, J.: PASS-JOIN: A partition-based method for similarity joins. PVLDB. 5(3), 253–264 (2011)
22.
Zurück zum Zitat Li, Z., Lee, K.C.K., Zheng, B., Lee, W., Lee, D.L., Wang, X.: Ir-tree: An efficient index for geographic document search. IEEE Trans. Knowl. Data Eng. 23(4), 585–599 (2011)CrossRef Li, Z., Lee, K.C.K., Zheng, B., Lee, W., Lee, D.L., Wang, X.: Ir-tree: An efficient index for geographic document search. IEEE Trans. Knowl. Data Eng. 23(4), 585–599 (2011)CrossRef
23.
Zurück zum Zitat Li, G., Wang, Y., Wang, T., Feng, J.. Location-aware publish/subscribe. In KDD, pages 802–810, (2013) Li, G., Wang, Y., Wang, T., Feng, J.. Location-aware publish/subscribe. In KDD, pages 802–810, (2013)
24.
Zurück zum Zitat Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. 33(1), 31–88 (2001)CrossRef Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. 33(1), 31–88 (2001)CrossRef
25.
Zurück zum Zitat Rocha-Junior, J.B., Gkorgkas, O., Jonassen, S., Nørvag, K.. Efficient processing of top-k spatial keyword queries. In SSTD, pages 205–222, (2011) Rocha-Junior, J.B., Gkorgkas, O., Jonassen, S., Nørvag, K.. Efficient processing of top-k spatial keyword queries. In SSTD, pages 205–222, (2011)
26.
Zurück zum Zitat Song, T., Xu, K., Li, J., Li, Y., and Tong, Y.. Multi-skill aware task assignment in real-time spatial crowdsourcing. GeoInformatica, 24, 04 (2019) Song, T., Xu, K., Li, J., Li, Y., and Tong, Y.. Multi-skill aware task assignment in real-time spatial crowdsourcing. GeoInformatica, 24, 04 (2019)
27.
Zurück zum Zitat Wang, J. and Lin, C.. Fast error-tolerant location-aware query autocompletion. In 36th IEEE International Conference on Data Engineering, ICDE 2020, Dallas, TX, USA, April 20–24, 2020, pages 1998–2001. IEEE, (2020) Wang, J. and Lin, C.. Fast error-tolerant location-aware query autocompletion. In 36th IEEE International Conference on Data Engineering, ICDE 2020, Dallas, TX, USA, April 20–24, 2020, pages 1998–2001. IEEE, (2020)
28.
Zurück zum Zitat Wang, J., Li, G., Feng, J.. Fast-join: An efficient method for fuzzy token matching based string similarity join. In ICDE, pages 458–469, (2011) Wang, J., Li, G., Feng, J.. Fast-join: An efficient method for fuzzy token matching based string similarity join. In ICDE, pages 458–469, (2011)
29.
Zurück zum Zitat Wang, X., Ding, X., Tung, A.K.H., Zhang, Z.: Efficient and effective KNN sequence search with approximate n-grams. PVLDB. 7(1), 1–12 (2013) Wang, X., Ding, X., Tung, A.K.H., Zhang, Z.: Efficient and effective KNN sequence search with approximate n-grams. PVLDB. 7(1), 1–12 (2013)
30.
Zurück zum Zitat Wang, J., Li, G., Deng, D., Zhang, Y., and Feng, J.. Two birds with one stone: An efficient hierarchical framework for top-k and threshold-based string similarity search. In ICDE, pages 519–530, (2015) Wang, J., Li, G., Deng, D., Zhang, Y., and Feng, J.. Two birds with one stone: An efficient hierarchical framework for top-k and threshold-based string similarity search. In ICDE, pages 519–530, (2015)
31.
Zurück zum Zitat Wang, X., Zhang, Y., Zhang, W., Lin, X., Huang, Z.: SKYPE: top-k spatial-keyword publish/subscribe over sliding window. PVLDB. 9(7), 588–599 (2016) Wang, X., Zhang, Y., Zhang, W., Lin, X., Huang, Z.: SKYPE: top-k spatial-keyword publish/subscribe over sliding window. PVLDB. 9(7), 588–599 (2016)
32.
Zurück zum Zitat Wang, J., Lin, C., Li, M., and Zaniolo, C.. An efficient sliding window approach for approximate entity extraction with synonyms. In EDBT, pages 109–120, (2019). Wang, J., Lin, C., Li, M., and Zaniolo, C.. An efficient sliding window approach for approximate entity extraction with synonyms. In EDBT, pages 109–120, (2019).
33.
Zurück zum Zitat Wang, J., Lin, C., Zaniolo, C.. Mf-join: Efficient fuzzy string similarity join with multi-level filtering. In ICDE, pages 386–397, (2019) Wang, J., Lin, C., Zaniolo, C.. Mf-join: Efficient fuzzy string similarity join with multi-level filtering. In ICDE, pages 386–397, (2019)
34.
Zurück zum Zitat Wu, D., Yiu, M.L., Cong, G., Jensen, C.S.: Joint top-k spatial keyword query processing. IEEE Trans. Knowl. Data Eng. 24(10), 1889–1903 (2012)CrossRef Wu, D., Yiu, M.L., Cong, G., Jensen, C.S.: Joint top-k spatial keyword query processing. IEEE Trans. Knowl. Data Eng. 24(10), 1889–1903 (2012)CrossRef
35.
Zurück zum Zitat Xiao, C., Wang, W., Lin, X.: Ed-join: an efficient algorithm for similarity joins with edit distance constraints. PVLDB. 1(1), 933–944 (2008)MathSciNet Xiao, C., Wang, W., Lin, X.: Ed-join: an efficient algorithm for similarity joins with edit distance constraints. PVLDB. 1(1), 933–944 (2008)MathSciNet
36.
Zurück zum Zitat Xiao, C., Wang, W., Lin, X., Yu, J.X.. Efficient similarity joins for near duplicate detection. In WWW, pages 131–140, (2008) Xiao, C., Wang, W., Lin, X., Yu, J.X.. Efficient similarity joins for near duplicate detection. In WWW, pages 131–140, (2008)
37.
Zurück zum Zitat Yang, Z., Yu, J., Kitsuregawa, M.. Fast algorithms for top-k approximate string matching. In AAAI, (2010) Yang, Z., Yu, J., Kitsuregawa, M.. Fast algorithms for top-k approximate string matching. In AAAI, (2010)
38.
Zurück zum Zitat Yang, C., Chen, L., Shang, S., Zhu, F., Liu, L., Shao, L.. Toward efficient navigation of massive-scale geo-textual streams. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pages 4838–4845. International Joint Conferences on Artificial Intelligence Organization, 72, (2019) Yang, C., Chen, L., Shang, S., Zhu, F., Liu, L., Shao, L.. Toward efficient navigation of massive-scale geo-textual streams. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pages 4838–4845. International Joint Conferences on Artificial Intelligence Organization, 72, (2019)
39.
Zurück zum Zitat Yang, J., Zhang, Y., Zhou, X., Wang, J., Hu, H., and Xing, C.. A hierarchical framework for top-k location-aware error-tolerant keyword search. In ICDE, pages 986–997, (2019) Yang, J., Zhang, Y., Zhou, X., Wang, J., Hu, H., and Xing, C.. A hierarchical framework for top-k location-aware error-tolerant keyword search. In ICDE, pages 986–997, (2019)
40.
Zurück zum Zitat Yao, B., Li, F., Hadjieleftheriou, M., Hou, K.. Approximate string search in spatial databases. In ICDE, pages 545–556, (2010) Yao, B., Li, F., Hadjieleftheriou, M., Hou, K.. Approximate string search in spatial databases. In ICDE, pages 545–556, (2010)
41.
Zurück zum Zitat C. Zhang, Y. Zhang, W. Zhang, and X. Lin. Inverted linear quadtree: Efficient top k spatial keyword search. In ICDE, pages 901–912, 2013. C. Zhang, Y. Zhang, W. Zhang, and X. Lin. Inverted linear quadtree: Efficient top k spatial keyword search. In ICDE, pages 901–912, 2013.
42.
Zurück zum Zitat Zhang, D., Tan, K.-L., Tung, A.K.H.. Scalable top-k spatial keyword search. In EDBT, pages 359–370, 2013. Zhang, D., Tan, K.-L., Tung, A.K.H.. Scalable top-k spatial keyword search. In EDBT, pages 359–370, 2013.
43.
Zurück zum Zitat Zhang, Y., Li, X., Wang, J., Zhang, Y., Xing, C., and Yuan, X.. An efficient framework for exact set similarity search using tree structure indexes. In ICDE, pages 759–770, 2017. Zhang, Y., Li, X., Wang, J., Zhang, Y., Xing, C., and Yuan, X.. An efficient framework for exact set similarity search using tree structure indexes. In ICDE, pages 759–770, 2017.
44.
Zurück zum Zitat Zhang, Y., Wu, J., Wang, J., Xing, C.. A transformation-based framework for knn set similarity search. IEEE Trans. Knowl. Data Eng., (2019) Zhang, Y., Wu, J., Wang, J., Xing, C.. A transformation-based framework for knn set similarity search. IEEE Trans. Knowl. Data Eng., (2019)
45.
Zurück zum Zitat Zhao, J, Gao, Y., Chen, G., Jensen, C.S., Chen, R., Cai, D.. Reverse top-k geo-social keyword queries in road networks. In ICDE, pages 387–398, (2017) Zhao, J, Gao, Y., Chen, G., Jensen, C.S., Chen, R., Cai, D.. Reverse top-k geo-social keyword queries in road networks. In ICDE, pages 387–398, (2017)
46.
Zurück zum Zitat Zheng, K., Su, H., Zheng, B., Shang, S., Xu, J., Liu, J., and X. Zhou. Interactive top-k spatial keyword queries. In ICDE, pages 423–434, (2015) Zheng, K., Su, H., Zheng, B., Shang, S., Xu, J., Liu, J., and X. Zhou. Interactive top-k spatial keyword queries. In ICDE, pages 423–434, (2015)
47.
Zurück zum Zitat Zheng, Y., Bao, Z., Shou, L., Tung, A.K.H.: INSPIRE: A framework for incremental spatial prefix query relaxation. IEEE Trans. Knowl. Data Eng. 27(7), 1949–1963 (2015)CrossRef Zheng, Y., Bao, Z., Shou, L., Tung, A.K.H.: INSPIRE: A framework for incremental spatial prefix query relaxation. IEEE Trans. Knowl. Data Eng. 27(7), 1949–1963 (2015)CrossRef
48.
Zurück zum Zitat B. Zheng, K. Zheng, X. Xiao, H. Su, H. Yin, X. Zhou, and G. Li. Keyword-aware continuous knn query on road networks. In ICDE, pages 871–882, 2016. B. Zheng, K. Zheng, X. Xiao, H. Su, H. Yin, X. Zhou, and G. Li. Keyword-aware continuous knn query on road networks. In ICDE, pages 871–882, 2016.
Metadaten
Titel
Clustering Enhanced Error-tolerant Top-k Spatio-textual Search
verfasst von
Yong Zhang
Yu Chen
Junye Yang
Jin Wang
Huiqi Hu
Chunxiao Xing
Xiaofang Zhou
Publikationsdatum
28.05.2021
Verlag
Springer US
Erschienen in
World Wide Web / Ausgabe 4/2021
Print ISSN: 1386-145X
Elektronische ISSN: 1573-1413
DOI
https://doi.org/10.1007/s11280-021-00883-6

Weitere Artikel der Ausgabe 4/2021

World Wide Web 4/2021 Zur Ausgabe