Published in: World Wide Web, Issue 3/2020

28.02.2020

Geographical address representation learning for address matching

Authors: Shuangli Shan, Zhixu Li, Qiang Yang, An Liu, Lei Zhao, Guanfeng Liu, Zhigang Chen

Abstract

Address matching is a crucial task in location-based businesses such as take-out services and express delivery; it aims to identify addresses in address databases that refer to the same location. The task is challenging because a single location can be expressed in many ways, especially in Chinese. Traditional approaches, which rely on string similarity and learned matching rules, can hardly handle redundant, incomplete, or unusual address expressions. In this paper, to learn geographical semantic representations for address strings, we propose obtaining rich contexts for addresses from the Web through search engines, which substantially enriches the semantic meaning that can be learned for each address. We also propose a two-stage geographical address representation learning model for address matching. In the first stage, an encoder-decoder architecture learns a semantic vector representation for each address string, with an up-sampling and sub-sampling strategy applied to handle address redundancy and incompleteness. An attention mechanism is also applied to highlight important address features in the semantic representations. In the second stage, we construct a single large graph from the corpus, with address elements and addresses as nodes and edges built from word co-occurrence information, and learn embedding representations for all nodes of the graph. Our empirical study on two real-world address datasets shows that our approach improves on state-of-the-art methods in both precision (by up to 8%) and recall (by up to 12%).
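The graph described for the second stage can be illustrated with a minimal sketch. The abstract does not specify how edges are weighted, so the code below assumes raw co-occurrence counts; the function name `build_address_graph`, the node tagging scheme, and the toy addresses are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def build_address_graph(addresses):
    """Build a weighted, undirected graph over addresses and their elements.

    `addresses` maps an address id to its list of already-segmented elements.
    Returns a dict mapping an edge (a pair of tagged nodes) to its weight.
    Assumption: weights are raw co-occurrence counts, not the paper's scheme.
    """
    edges = Counter()
    for addr_id, elements in addresses.items():
        unique = sorted(set(elements))
        addr_node = ("addr", addr_id)
        # Address-element edge: the element occurs in this address.
        for element in unique:
            edges[(addr_node, ("elem", element))] += 1
        # Element-element edge: the two elements co-occur in one address.
        for left, right in combinations(unique, 2):
            edges[(("elem", left), ("elem", right))] += 1
    return dict(edges)


# Hypothetical, pre-segmented toy addresses (not from the paper's datasets).
toy = {
    "a1": ["Suzhou", "Ganjiang Road", "No. 1"],
    "a2": ["Suzhou", "Ganjiang Road", "No. 8"],
}
graph = build_address_graph(toy)
```

In this toy input, the two addresses share the elements "Suzhou" and "Ganjiang Road", so the edge between those two element nodes accumulates a higher weight than edges involving the house numbers. A node embedding method would then be trained on such a graph; that step is outside the scope of this sketch.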

Metadata
Title
Geographical address representation learning for address matching
Authors
Shuangli Shan
Zhixu Li
Qiang Yang
An Liu
Lei Zhao
Guanfeng Liu
Zhigang Chen
Publication date
28.02.2020
Publisher
Springer US
Published in
World Wide Web / Issue 3/2020
Print ISSN: 1386-145X
Electronic ISSN: 1573-1413
DOI
https://doi.org/10.1007/s11280-020-00782-2
