2021 | OriginalPaper | Chapter

Development and Study of an Approach for Determining Incorrect Words of the Kazakh Language in Semi-structured Data

Authors: Yntymak Abdrazakh, Aliya Turganbayeva, Diana Rakhimova

Published in: Advances in Computational Collective Intelligence

Publisher: Springer International Publishing

Abstract

Research in computational linguistics is relevant because of the rapid growth of natural-language information on the Internet and social networks. The amount of natural-language text that humans and machines create keeps increasing. Information retrieval systems, dialog systems, machine translation, automatic summarization tools, and spell-checking modules all analyze and process natural-language text. Thus, the range of automatic text processing systems is wide and covers a variety of tasks. One of the most important tasks of natural language processing (NLP) is finding errors in texts, including identifying and correcting incorrect words. The article provides an overview of semi-structured data and of methods and technologies for detecting incorrect words in natural languages. An approach for identifying incorrect words in the Kazakh language was developed, and its features and capabilities were analyzed. A comparative analysis of texts from the Internet and social networks, and of technologies that identify incorrect words in natural languages, was carried out.

Metadata
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-88113-9_43
