Published in: Journal of Geographical Systems 2/2022

18-02-2022 | Original Article

Chinese toponym recognition with variant neural structures from social media messages based on BERT methods

Authors: Kai Ma, YongJian Tan, Zhong Xie, Qinjun Qiu, Siqiong Chen

Abstract

Many natural language tasks in geographic information retrieval (GIR) require toponym recognition, and identifying Chinese toponyms in social media messages so that real-time information can be shared is critical for many practical applications, such as natural disaster response and geolocation. In this article, we focus on toponym recognition in Chinese social media messages. Although existing off-the-shelf Chinese named entity recognition (NER) tools can be applied to identify toponyms, they cannot handle the variety of language irregularities found in social media messages, including location name abbreviations, informal sentence structures and combination toponyms. We present a deep neural network named BERT-BiLSTM-CRF, which extends a basic bidirectional long short-term memory (BiLSTM) recurrent network with pretrained Bidirectional Encoder Representations from Transformers (BERT) representations to handle toponym recognition in Chinese text. On three datasets taken from lists of alternative location names, the experimental results show that the proposed model significantly outperforms previous Chinese NER models and algorithms as well as a set of state-of-the-art deep learning models.
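
To make the task concrete, the sketch below shows how the per-character labels produced by a sequence tagger such as BERT-BiLSTM-CRF could be turned into toponym spans. It is an illustration only: the abstract does not state the tagging scheme, so the BIO labels (`B-LOC`, `I-LOC`, `O`), the function name `extract_toponyms`, and the example sentence with its tags are all assumptions, and no model is run here.

```python
from typing import List, Tuple


def extract_toponyms(chars: List[str], tags: List[str]) -> List[Tuple[str, int, int]]:
    """Collect (text, start, end) toponym spans from BIO tags; end is exclusive.

    Assumes a hypothetical character-level scheme with B-LOC, I-LOC and O.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    spans: List[Tuple[str, int, int]] = []
    start = None  # index where the currently open span began, if any
    for i, tag in enumerate(tags):
        if tag == "B-LOC":
            # A new toponym begins; close the previous one if it is still open.
            if start is not None:
                spans.append(("".join(chars[start:i]), start, i))
            start = i
        elif tag == "I-LOC" and start is not None:
            # Still inside the open toponym.
            continue
        else:
            # "O", or a stray "I-LOC" with no open span: close any open span.
            if start is not None:
                spans.append(("".join(chars[start:i]), start, i))
            start = None
    if start is not None:
        spans.append(("".join(chars[start:]), start, len(chars)))
    return spans


# Hypothetical message and hand-written tags (not model output).
text = "武汉市洪山区发生内涝"
tags = ["B-LOC", "I-LOC", "I-LOC", "B-LOC", "I-LOC", "I-LOC", "O", "O", "O", "O"]
toponyms = extract_toponyms(list(text), tags)
print(toponyms)
```

In this example the two adjacent place names are tagged as separate spans. Whether a combination toponym like this should be labeled as one span or two is an annotation decision that the abstract does not specify.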

Metadata
Title
Chinese toponym recognition with variant neural structures from social media messages based on BERT methods
Authors
Kai Ma
YongJian Tan
Zhong Xie
Qinjun Qiu
Siqiong Chen
Publication date
18-02-2022
Publisher
Springer Berlin Heidelberg
Published in
Journal of Geographical Systems / Issue 2/2022
Print ISSN: 1435-5930
Electronic ISSN: 1435-5949
DOI
https://doi.org/10.1007/s10109-022-00375-9
