2019 | Original Paper | Book chapter

A Context-Free Spelling Correction Method for Classical Mongolian

Authors: Min Lu, Feilong Bao, Guanglai Gao

Published in: Natural Language Processing and Chinese Computing

Publisher: Springer International Publishing

Abstract

Spelling errors in classical Mongolian text are mainly caused by the misuse of polyphonic letters that take the same shape in certain positions within a word. About half to three-quarters of classical Mongolian words are misspellings that have the correct appearance but the wrong codes. In this paper, we encode Mongolian words with glyph codes so that each word maps one-to-one to its shape. In addition, we propose a correction method for out-of-vocabulary (OOV) words based on the Evolved Transformer, formalizing the correction task as a translation from misspellings to target spellings. The experimental results show that this approach achieves new state-of-the-art performance.
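The glyph-code idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the shape table `TOY_SHAPE_OF`, the helper names, and the Latin-letter strings are hypothetical and are not the paper's glyph-code set or data. The sketch only shows how differently coded words that look identical collapse to one key, so a lexicon lookup by shape can recover the intended coding without using sentence context.

```python
from typing import Dict, List, Optional, Tuple

# Toy table (assumed for illustration, NOT the paper's glyph codes):
# each letter maps to a label for the shape it is drawn with.
# Letters that are assumed to look alike share one label.
TOY_SHAPE_OF: Dict[str, str] = {
    "o": "O", "u": "O",  # assumed to render identically
    "t": "T", "d": "T",  # assumed to render identically
    "a": "A", "n": "N", "r": "R",
}

GlyphKey = Tuple[str, ...]


def glyph_key(word: str) -> GlyphKey:
    """Return the sequence of shape labels the word is drawn with."""
    return tuple(TOY_SHAPE_OF[ch] for ch in word)


def build_lexicon(correct_words: List[str]) -> Dict[GlyphKey, str]:
    """Index correctly coded words by glyph key.

    Assumes no two correct words share a shape; a later word
    would otherwise overwrite an earlier one.
    """
    return {glyph_key(w): w for w in correct_words}


def correct(word: str, lexicon: Dict[GlyphKey, str]) -> Optional[str]:
    """Return the lexicon spelling with the same shape, or None if OOV."""
    return lexicon.get(glyph_key(word))


# "nutar" is a made-up string standing in for a correctly coded word.
lexicon = build_lexicon(["nutar"])
print(correct("notar", lexicon))  # same shape, different codes
print(correct("ran", lexicon))    # shape not in the lexicon (OOV)
```

In this sketch an out-of-vocabulary shape simply returns `None`; the abstract states that the paper handles such words with a separate translation-style model, which is not reproduced here.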

Footnotes
1
If the black-pixel area of the line, together with the other lines between them, completely covers the black pixels of the current candidate, the line is considered the dependent line of the current candidate.
 
2
Control characters are used in conjunction with Mongolian letters to control word shapes. Here they mainly refer to the three Mongolian Free Variation Selectors: “U+180B”, “U+180C”, and “U+180D”.
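As a small illustration of this footnote, the sketch below locates and strips the three selectors in a string using only Python's standard library. The function names and the sample sequence are assumptions chosen for illustration; the sample is an arbitrary letter sequence, not a word from the paper.

```python
import unicodedata
from typing import List, Tuple

# The three Mongolian Free Variation Selectors named in the footnote.
FVS = {"\u180b", "\u180c", "\u180d"}


def fvs_positions(text: str) -> List[Tuple[int, str]]:
    """Return (index, Unicode name) for each selector found in text."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(text) if ch in FVS]


def strip_fvs(text: str) -> str:
    """Return text with all three selectors removed."""
    return "".join(ch for ch in text if ch not in FVS)


# Arbitrary sequence for illustration:
# MONGOLIAN LETTER A (U+1820), selector U+180B, MONGOLIAN LETTER NA (U+1828).
sample = "\u1820\u180b\u1828"

for index, name in fvs_positions(sample):
    print(index, name)
print(len(sample), len(strip_fvs(sample)))
```

Stripping a selector changes the code sequence and may change the rendered shape, so this is only useful for inspection, not as a normalization step.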
 
Metadata
Title
A Context-Free Spelling Correction Method for Classical Mongolian
Authors
Min Lu
Feilong Bao
Guanglai Gao
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-32236-6_50