2020 | OriginalPaper | Chapter

MTNER: A Corpus for Mongolian Tourism Named Entity Recognition

Authors: Xiao Cheng, Weihua Wang, Feilong Bao, Guanglai Gao

Published in: Machine Translation

Publisher: Springer Singapore

Abstract

Named Entity Recognition is an essential tool for machine translation. Traditional Named Entity Recognition focuses on person, location and organization names. However, there is still a lack of data for identifying travel-related named entities, especially in Mongolian. In this paper, we introduce a new corpus for Mongolian Tourism Named Entity Recognition (MTNER), consisting of 16,000 sentences annotated with 18 entity types. We trained in-domain BERT representations on a 10 GB unannotated Mongolian corpus, and trained a NER model based on the BERT tagging model with the new corpus. The model achieves an overall F1 score of 82.09 on Mongolian Tourism Named Entity Recognition, an absolute increase of 3.54 F1 points over the traditional CRF Named Entity Recognition method.
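The abstract reports results as F1 scores. As a rough illustration of how such a score is commonly computed for sequence tagging, the sketch below calculates entity-level F1 from BIO-tagged sentences. It is only a minimal sketch under stated assumptions: the abstract does not say which tagging scheme or matching criterion the authors used, and the entity type names `LOC` and `FOOD` are hypothetical placeholders, not taken from the 18 MTNER types.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one sentence's BIO tags.

    An I- tag that does not continue a span of the same type is ignored.
    """
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exact span matches (boundaries and type)."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the prediction finds one of the two gold entities.
gold = [["B-LOC", "I-LOC", "O", "B-FOOD"]]
pred = [["B-LOC", "I-LOC", "O", "O"]]
score = entity_f1(gold, pred)  # precision 1.0, recall 0.5
```

Exact-match scoring of this kind counts a predicted entity as correct only when both its boundaries and its type agree with the annotation, so a partially recognized name earns no credit.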

Metadata
Title: MTNER: A Corpus for Mongolian Tourism Named Entity Recognition
Authors: Xiao Cheng, Weihua Wang, Feilong Bao, Guanglai Gao
Copyright Year: 2020
Publisher: Springer Singapore
DOI: https://doi.org/10.1007/978-981-33-6162-1_2