
2019 | OriginalPaper | Chapter

A Context-Free Spelling Correction Method for Classical Mongolian

Authors: Min Lu, Feilong Bao, Guanglai Gao

Published in: Natural Language Processing and Chinese Computing

Publisher: Springer International Publishing

Abstract

Spelling errors in classical Mongolian text are mainly caused by the misuse of polyphonic letters that take the same shape in certain positions within a word. About half to three-quarters of classical Mongolian words are misspellings that look correct but carry the wrong codes. In this paper, we encode Mongolian words with glyph codes so that each word maps one-to-one to its shape. In addition, we propose correcting out-of-vocabulary (OOV) words with the Evolved Transformer, formalizing the correction task as a translation from misspellings to target spellings. The experimental results show that this approach achieves new state-of-the-art performance.
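The glyph-coding idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy alphabet uses Latin letters as stand-ins for Mongolian code points, and the `SHARED_SHAPES` table, the function names, and the position labels are all assumptions made for illustration. The sketch only shows how two differently coded words that look the same can collapse to a single glyph code, which can then be used to look up the intended spelling in a lexicon.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy table: Latin letters stand in for Mongolian code points.
# Assumption: "u" looks like "o" in every non-isolated position, and
# "d" looks like "t" only in medial position. The real inventory differs.
SHARED_SHAPES: Dict[Tuple[str, str], str] = {
    ("u", "initial"): "o",
    ("u", "medial"): "o",
    ("u", "final"): "o",
    ("d", "medial"): "t",
}


def position(i: int, n: int) -> str:
    """Return the positional form of the i-th letter in a word of length n."""
    if n == 1:
        return "isolated"
    if i == 0:
        return "initial"
    if i == n - 1:
        return "final"
    return "medial"


def glyph_code(word: str) -> Tuple[str, ...]:
    """Encode a word by what it looks like rather than by its letter codes."""
    n = len(word)
    codes = []
    for i, ch in enumerate(word):
        pos = position(i, n)
        shape = SHARED_SHAPES.get((ch, pos), ch)  # fall back to the letter itself
        codes.append(f"{shape}:{pos}")
    return tuple(codes)


def build_index(lexicon: List[str]) -> Dict[Tuple[str, ...], str]:
    """Map each glyph code to a correctly coded word.

    If two lexicon words share a shape, this sketch keeps the first one.
    """
    index: Dict[Tuple[str, ...], str] = {}
    for w in lexicon:
        index.setdefault(glyph_code(w), w)
    return index


def correct(word: str, index: Dict[Tuple[str, ...], str]) -> Optional[str]:
    """Return the lexicon spelling with the same shape, or None if OOV."""
    return index.get(glyph_code(word))


index = build_index(["todo", "bat"])
print(correct("tudo", index))  # same shape as "todo" under the toy table
print(correct("dodo", index))  # no lexicon word with this shape
```

In this sketch, a word whose glyph code is missing from the index is treated as out-of-vocabulary. The abstract states that such words are handled by a separate sequence-to-sequence model, which is not shown here.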

Footnotes
1
If the black-pixel area of the line and the other ones between them can completely cover the black pixels of the current candidate, the line is considered the dependent line of the current candidate.
 
2
Control characters are used in conjunction with Mongolian letters to control the word shapes. They mainly refer to the three Mongolian Free Variation Selectors: U+180B, U+180C, and U+180D.
 
Metadata
Title
A Context-Free Spelling Correction Method for Classical Mongolian
Authors
Min Lu
Feilong Bao
Guanglai Gao
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-32236-6_50
