2023 | OriginalPaper | Chapter

BEDSpell: Spelling Error Correction Using BERT-Based Masked Language Model and Edit Distance

Authors: Fatemeh Tohidian, Amin Kashiri, Fariba Lotfi

Published in: Service-Oriented Computing – ICSOC 2022 Workshops

Publisher: Springer Nature Switzerland

Abstract

The spelling correction problem, the task of automatically correcting misspellings in a text, is critical in natural language processing (NLP). Although it can be treated as a standalone task, it is most often a preprocessing step in other NLP tasks, since a dataset with typos can lead to erroneous results. Many earlier automatic spelling correctors are dictionary-based: they look up each word in isolation in a predefined word list and recommend the most similar entry without considering the context. The output of such models may be a correctly spelled word and still be semantically incorrect. Some correctors therefore use language models to take the context into account when correcting typos. However, a language model alone is insufficient, because the corrected word should also be similar to the misspelled word. In our approach, we select a candidate for the typo based on masked language model output, character-level similarities, and edit distance. Combining these three signals helps us recommend candidates that are both similar to the typo and appropriate to the context. We use recall (correction rate) as our evaluation metric, and the results demonstrate a considerable improvement compared with previous studies.
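
The idea of combining a masked language model with edit distance can be illustrated with a minimal Python sketch. It is not the authors' implementation: the scoring rule (model probability multiplied by a normalized similarity), the `max_distance` threshold, the function names, and the candidate probabilities are all assumptions chosen for illustration. In practice the candidate scores would come from a BERT-style model that predicts the masked position of the typo.

```python
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def rank_candidates(typo: str,
                    mlm_scores: Dict[str, float],
                    max_distance: int = 2) -> List[Tuple[str, float]]:
    """Rank context candidates by a hypothetical combined score.

    mlm_scores maps each candidate word to the probability a masked
    language model assigns it at the typo's position (assumed input).
    Candidates further than max_distance edits from the typo are dropped.
    """
    ranked = []
    for word, prob in mlm_scores.items():
        dist = levenshtein(typo, word)
        if dist > max_distance:
            continue
        # Normalized character-level similarity in [0, 1].
        similarity = 1.0 - dist / max(len(typo), len(word))
        ranked.append((word, prob * similarity))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Placeholder probabilities for "I went to the [MASK] to buy bread",
# where the original token was the typo "stoer". Not real model output.
candidates = {"store": 0.40, "market": 0.30, "shop": 0.15, "stove": 0.01}
ranking = rank_candidates("stoer", candidates)
best_word = ranking[0][0]
```

In this example "market" and "shop" fit the context but are filtered out because they are too far from the typo, while "stove" is close to the typo but unlikely in context, so "store" ranks first. A plain Levenshtein distance counts a transposition as two edits; a Damerau-style variant would be one alternative if swapped letters are common.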

Metadata
Copyright Year
2023
DOI
https://doi.org/10.1007/978-3-031-26507-5_1