Skip to main content
Erschienen in: Journal of Geographical Systems 2/2022

18.02.2022 | Original Article

Chinese toponym recognition with variant neural structures from social media messages based on BERT methods

verfasst von: Kai Ma, YongJian Tan, Zhong Xie, Qinjun Qiu, Siqiong Chen

Erschienen in: Journal of Geographical Systems | Ausgabe 2/2022

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Many natural language tasks related to geographic information retrieval (GIR) require toponym recognition, and identifying Chinese toponyms from social media messages to share real-time information is a critical problem for many practical applications, such as natural disaster response and geolocating. In this article, we focused on toponym recognition from social media messages in Chinese. While existing off-the-shelf Chinese named entity recognition (NER) tools could be applied to identify toponyms, these approaches cannot address a variety of language irregularities taken from social media messages, including location name abbreviations, informal sentence structures and combination toponyms. We present a deep neural network named BERT-BiLSTM-CRF, which extends a basic bidirectional recurrent neural network model (BiLSTM) with the pretraining bidirectional encoder representation from transformers (BERT) representation to handle the toponym recognition task in Chinese text. Using three datasets taken from lists of alternative location names, the experimental results showed that the proposed model can significantly outperform previous Chinese NER models/algorithms and a set of state-of-the-art deep learning models.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Alex B, Byrne K, Grover C, Tobin R (2015) Adapting the Edinburgh geoparser for historical georeferencing. Int J Humanities Arts Comput 9(1):15–35CrossRef Alex B, Byrne K, Grover C, Tobin R (2015) Adapting the Edinburgh geoparser for historical georeferencing. Int J Humanities Arts Comput 9(1):15–35CrossRef
Zurück zum Zitat Arribas-Bel D, Green M, Rowe F, Singleton A (2021) Open data products-A framework for creating valuable analysis ready data. J Geogr Syst 23(4):497–514CrossRef Arribas-Bel D, Green M, Rowe F, Singleton A (2021) Open data products-A framework for creating valuable analysis ready data. J Geogr Syst 23(4):497–514CrossRef
Zurück zum Zitat Borges KA, Davis CA Jr, Laender AH, Medeiros CB (2011) Ontology-driven discovery of geospatial evidence in web pages. GeoInformatica 15:609–631CrossRef Borges KA, Davis CA Jr, Laender AH, Medeiros CB (2011) Ontology-driven discovery of geospatial evidence in web pages. GeoInformatica 15:609–631CrossRef
Zurück zum Zitat Brunsdon C, Comber A (2020) Big issues for big data: challenges for critical spatial data analytics. arXiv preprint arXiv:2007.11281 Brunsdon C, Comber A (2020) Big issues for big data: challenges for critical spatial data analytics. arXiv preprint arXiv:​2007.​11281
Zurück zum Zitat Chen Y, Ouyang Y, Li WJ, Zheng DQ, Zhao TJ (2010) Using deep belief nets for Chinese named entity categorization. In: Proceedings of the 2010 Named Entities Workshop, Uppsala, Sweden, 16 July 2010; pp. 102–109. Chen Y, Ouyang Y, Li WJ, Zheng DQ, Zhao TJ (2010) Using deep belief nets for Chinese named entity categorization. In: Proceedings of the 2010 Named Entities Workshop, Uppsala, Sweden, 16 July 2010; pp. 102–109.
Zurück zum Zitat Delboni TM, Borges KAV, Laender AHF, Davis CA Jr (2007) Semantic expansion of geographic web queries based on natural language positioning expressions. Trans GIS 11:377–397CrossRef Delboni TM, Borges KAV, Laender AHF, Davis CA Jr (2007) Semantic expansion of geographic web queries based on natural language positioning expressions. Trans GIS 11:377–397CrossRef
Zurück zum Zitat DeLozier G, Baldridge J, London L (2015) Gazetteer-independent toponym resolution using geographic word profiles. In: Twenty-Ninth AAAI conference on artificial intelligence DeLozier G, Baldridge J, London L (2015) Gazetteer-independent toponym resolution using geographic word profiles. In: Twenty-Ninth AAAI conference on artificial intelligence
Zurück zum Zitat Devlin J, Chang M-W, Lee K, Toutanova K (2018) Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 Devlin J, Chang M-W, Lee K, Toutanova K (2018) Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:​1810.​04805
Zurück zum Zitat Gelernter J, Balaji S (2013) An algorithm for local geoparsing of microtext. GeoInformatica 17(4):635–667CrossRef Gelernter J, Balaji S (2013) An algorithm for local geoparsing of microtext. GeoInformatica 17(4):635–667CrossRef
Zurück zum Zitat Gelernter J, Mushegian N (2011) Geo-parsing messages from microtext. Trans GIS 15(6):753–773CrossRef Gelernter J, Mushegian N (2011) Geo-parsing messages from microtext. Trans GIS 15(6):753–773CrossRef
Zurück zum Zitat Gritta M, Pilehvar MT, Collier N (2018) Which Melbourne? Augmenting geocoding with maps. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia (Vol. 1: Long Papers, pp. 1285–1296). Stroudsburg, PA: ACL. Gritta M, Pilehvar MT, Collier N (2018) Which Melbourne? Augmenting geocoding with maps. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia (Vol. 1: Long Papers, pp. 1285–1296). Stroudsburg, PA: ACL.
Zurück zum Zitat Hill, LL (2000) Core elements of digital gazetteers: placenames, categories, and footprints. In: Borbinha J, Baker T (eds) Research and advanced technology for digital libraries. Springer Lecture Notes in Computer Science, Germany, Berlin, Vol. 1923, pp. 280–290 Hill, LL (2000) Core elements of digital gazetteers: placenames, categories, and footprints. In: Borbinha J, Baker T (eds) Research and advanced technology for digital libraries. Springer Lecture Notes in Computer Science, Germany, Berlin, Vol. 1923, pp. 280–290
Zurück zum Zitat Hu Y, Mao H, McKenzie G (2019) A natural language processing and geospatial clustering framework for harvesting local place names from geotagged housing advertisements. Int J Geogr Inf Sci 33(4):714–738CrossRef Hu Y, Mao H, McKenzie G (2019) A natural language processing and geospatial clustering framework for harvesting local place names from geotagged housing advertisements. Int J Geogr Inf Sci 33(4):714–738CrossRef
Zurück zum Zitat Ju Y, Adams B, Janowicz K, Hu Y, Yan B, McKenzie G (2016) Things and strings: improving place name disambiguation from short texts by combining entity co-occurrence with topic modeling. In: Blomqvist E, Ciancarini P, Poggi F, Vitali F (eds) Knowledge engineering and knowledge management: EKAW 2016 (Lecture notes in computer science, vol 10024. Springer, Cham, Switzerland, pp 353–367CrossRef Ju Y, Adams B, Janowicz K, Hu Y, Yan B, McKenzie G (2016) Things and strings: improving place name disambiguation from short texts by combining entity co-occurrence with topic modeling. In: Blomqvist E, Ciancarini P, Poggi F, Vitali F (eds) Knowledge engineering and knowledge management: EKAW 2016 (Lecture notes in computer science, vol 10024. Springer, Cham, Switzerland, pp 353–367CrossRef
Zurück zum Zitat Karimzadeh M, Huang W, Banerjee S, Wallgrün JO, Hardisty F, Pezanowski S, MacEachren AM (2013) GeoTxt: a web API to leverage place references in text. In: Proceedings of the Seventh Workshop on Geographic Information Retrieval, Orlando, FL (pp. 72–73). New York, NY: ACM. Karimzadeh M, Huang W, Banerjee S, Wallgrün JO, Hardisty F, Pezanowski S, MacEachren AM (2013) GeoTxt: a web API to leverage place references in text. In: Proceedings of the Seventh Workshop on Geographic Information Retrieval, Orlando, FL (pp. 72–73). New York, NY: ACM.
Zurück zum Zitat Karimzadeh M, Pezanowski S, MacEachren AM, Wallgrün JO (2019) GeoTxt: A scalable geoparsing system for unstructured text geolocation. Trans GIS 23(1):118–136CrossRef Karimzadeh M, Pezanowski S, MacEachren AM, Wallgrün JO (2019) GeoTxt: A scalable geoparsing system for unstructured text geolocation. Trans GIS 23(1):118–136CrossRef
Zurück zum Zitat Levow G-A (2006) The third international Chinese language processing bakeoff: word segmentation and named entity recognition. In: Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, pp 108–117 Levow G-A (2006) The third international Chinese language processing bakeoff: word segmentation and named entity recognition. In: Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, pp 108–117
Zurück zum Zitat Li H, Wang M, Baldwin T, Tomko M, Vasardani M (2019) UniMelb at SemEval-2019 Task 12: multi-model combination for toponym resolution. In: Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, MN (pp. 1313–1318). Stroudsburg, PA: ACL. Li H, Wang M, Baldwin T, Tomko M, Vasardani M (2019) UniMelb at SemEval-2019 Task 12: multi-model combination for toponym resolution. In: Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, MN (pp. 1313–1318). Stroudsburg, PA: ACL.
Zurück zum Zitat Lovelace R (2021) Open source tools for geographic analysis in transport planning. J Geogr Syst 23(4):547–578CrossRef Lovelace R (2021) Open source tools for geographic analysis in transport planning. J Geogr Syst 23(4):547–578CrossRef
Zurück zum Zitat Mcdonough K, Moncla L, Camp MVD (2019) Named entity recognition goes to old regime france: geographic text analysis for early modern french corpora. Int J Geograph Inf ence(1). Mcdonough K, Moncla L, Camp MVD (2019) Named entity recognition goes to old regime france: geographic text analysis for early modern french corpora. Int J Geograph Inf ence(1).
Zurück zum Zitat McCurley KS (2001) Geospatial mapping and navigation of the web. In: Proceedings of the 10th International Conference on World Wide Web, Hong Kong, China, 221–229. McCurley KS (2001) Geospatial mapping and navigation of the web. In: Proceedings of the 10th International Conference on World Wide Web, Hong Kong, China, 221–229.
Zurück zum Zitat Mikolov T, Karafiát M, Burget L, Černocký J, Khudanpur S (2010) Recurrent neural network based language model. In: Interspeech, vol 2, No. 3, pp 1045–1048 Mikolov T, Karafiát M, Burget L, Černocký J, Khudanpur S (2010) Recurrent neural network based language model. In: Interspeech, vol 2, No. 3, pp 1045–1048
Zurück zum Zitat Mikolov T, Deoras A, Povey D, Burget L, Černocký J (2011) Strategies for training large scale neural network language models. In: 2011 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE. pp. 196–201. Mikolov T, Deoras A, Povey D, Burget L, Černocký J (2011) Strategies for training large scale neural network language models. In: 2011 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE. pp. 196–201.
Zurück zum Zitat Moura TH VM, Davis CA Jr, Fonseca FT (2017) Reference data enhancement for geographic information retrieval using linked data. Trans GIS 21(4):683–700CrossRef Moura TH VM, Davis CA Jr, Fonseca FT (2017) Reference data enhancement for geographic information retrieval using linked data. Trans GIS 21(4):683–700CrossRef
Zurück zum Zitat Pennington J, Socher R, Manning CD (2014) GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar (pp. 1532–1543). Stroudsburg, PA: ACL. Pennington J, Socher R, Manning CD (2014) GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar (pp. 1532–1543). Stroudsburg, PA: ACL.
Zurück zum Zitat Purves RS, Clough P, Jones CB, Hall MH, Murdock V (2018) Geographic information retrieval: progress and challenges in spatial search of text. Found Trends Inf Retr 12(2&3):164–318CrossRef Purves RS, Clough P, Jones CB, Hall MH, Murdock V (2018) Geographic information retrieval: progress and challenges in spatial search of text. Found Trends Inf Retr 12(2&3):164–318CrossRef
Zurück zum Zitat Qiu Q, Xie Z, Wu L, Li W (2019) Geoscience keyphrase extraction algorithm using enhanced word embedding. Expert Syst Appl 125:157–169CrossRef Qiu Q, Xie Z, Wu L, Li W (2019) Geoscience keyphrase extraction algorithm using enhanced word embedding. Expert Syst Appl 125:157–169CrossRef
Zurück zum Zitat Quercini G, Samet H (2014) Uncovering the spatial relatedness in Wikipedia. In: Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Dallas, Texas, pp 153–162 Quercini G, Samet H (2014) Uncovering the spatial relatedness in Wikipedia. In: Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Dallas, Texas, pp 153–162
Zurück zum Zitat Santos R, Murrieta-Flores P, Calado P, Martins B (2018) Toponym matching through deep neural networks. Int J Geogr Inf Sci 32(2):324–348CrossRef Santos R, Murrieta-Flores P, Calado P, Martins B (2018) Toponym matching through deep neural networks. Int J Geogr Inf Sci 32(2):324–348CrossRef
Zurück zum Zitat Speriosu M, Baldridge J (2013) Text-driven toponym resolution using indirect supervision. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria (pp. 1466–1476). Stroudsburg, PA: ACL Speriosu M, Baldridge J (2013) Text-driven toponym resolution using indirect supervision. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria (pp. 1466–1476). Stroudsburg, PA: ACL
Zurück zum Zitat Wang J, Hu Y (2019) Enhancing spatial and textual analysis with EUPEG: an extensible and unified platform for evaluating geoparsers. Trans GIS 23(6):1393–1419CrossRef Wang J, Hu Y (2019) Enhancing spatial and textual analysis with EUPEG: an extensible and unified platform for evaluating geoparsers. Trans GIS 23(6):1393–1419CrossRef
Zurück zum Zitat Wang S, Zhang X, Ye P, Du M (2018) Deep belief networks based toponym recognition for Chinese text. ISPRS Int J Geo-Inf 7(6):217CrossRef Wang S, Zhang X, Ye P, Du M (2018) Deep belief networks based toponym recognition for Chinese text. ISPRS Int J Geo-Inf 7(6):217CrossRef
Zurück zum Zitat Weissenbacher D, Magge A, O’Connor K, Scotch M, Gonzalez G (2019) SemEval-2019 Task 12: Toponym resolution in scientific papers. In: Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, MN (pp. 907–916). Stroudsburg, PA: ACL Weissenbacher D, Magge A, O’Connor K, Scotch M, Gonzalez G (2019) SemEval-2019 Task 12: Toponym resolution in scientific papers. In: Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, MN (pp. 907–916). Stroudsburg, PA: ACL
Zurück zum Zitat Yadav V, Laparra E, Wang T-T, Surdeanu M, Bethard S (2019) University of Arizona at SemEval-2019 Task 12: Deep-affix named entity recognition of geolocation entities. In: Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, MN (pp. 1319–1323). Stroudsburg, PA: ACL Yadav V, Laparra E, Wang T-T, Surdeanu M, Bethard S (2019) University of Arizona at SemEval-2019 Task 12: Deep-affix named entity recognition of geolocation entities. In: Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, MN (pp. 1319–1323). Stroudsburg, PA: ACL
Zurück zum Zitat Yi X, Raghavan H, Leggetter C (2009) Discovering users’ specific geo intention in web search. In: Proceedings of the 18th International Conference on World Wide Web, Madrid, Spain, 481–490 Yi X, Raghavan H, Leggetter C (2009) Discovering users’ specific geo intention in web search. In: Proceedings of the 18th International Conference on World Wide Web, Madrid, Spain, 481–490
Zurück zum Zitat Zhou D, Qian M, Hua M, Liu D, Tang X (2011) Structural analysis and computation of Chinese toponyms. Int J Knowl Lang Process 2(3):36–47 Zhou D, Qian M, Hua M, Liu D, Tang X (2011) Structural analysis and computation of Chinese toponyms. Int J Knowl Lang Process 2(3):36–47
Metadaten
Titel
Chinese toponym recognition with variant neural structures from social media messages based on BERT methods
verfasst von
Kai Ma
YongJian Tan
Zhong Xie
Qinjun Qiu
Siqiong Chen
Publikationsdatum
18.02.2022
Verlag
Springer Berlin Heidelberg
Erschienen in
Journal of Geographical Systems / Ausgabe 2/2022
Print ISSN: 1435-5930
Elektronische ISSN: 1435-5949
DOI
https://doi.org/10.1007/s10109-022-00375-9

Weitere Artikel der Ausgabe 2/2022

Journal of Geographical Systems 2/2022 Zur Ausgabe

Premium Partner