
2019 | OriginalPaper | Chapter

Phonetic String Matching for Languages with Cyrillic Alphabet


Abstract

Using phonetic similarity to compare text strings and eliminate misprints is a significant issue in philology, and it is widely applied in automatic text checking. Most current phonetic algorithms are designed for processing English words. Their comparison quality may degrade for other languages, especially those that have rich morphology and use non-Latin alphabets, such as the East Slavic languages written in Cyrillic. We propose an approach to the phonetic comparison of Russian words. It is based on detecting letters and letter sequences that are pronounced similarly according to the rules of the language. The resulting phonetic representations of words are encoded with prime numbers. The paper examines the efficiency of the algorithm. The algorithm has also been adapted for phonetic processing of the Mongolian language.
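The idea outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not list the actual substitution rules or say how the prime numbers are combined, so the rule table `CANONICAL`, the set `SILENT`, the helper names, and the choice of multiplying primes are all assumptions made for illustration. Multiplication is used here only because unique factorisation guarantees that two words get the same code exactly when they reduce to the same multiset of canonical letters; it ignores letter order, which a real implementation may not want.

```python
# Hypothetical rule table: letters that often sound alike in Russian
# (voiced/voiceless consonant pairs, reduced vowels). Illustrative only;
# the real rule set is not given in the abstract.
CANONICAL = {
    "б": "п", "в": "ф", "г": "к", "д": "т", "ж": "ш", "з": "с",
    "о": "а", "я": "а", "ю": "у",
    "е": "и", "ё": "и", "э": "и", "ы": "и", "й": "и",
}
SILENT = {"ъ", "ь"}  # assumed: hard and soft signs carry no sound of their own
ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"


def first_primes(n):
    """Return the first n prime numbers by trial division."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def phonetic_form(word):
    """Map each letter to its canonical letter, drop silent signs,
    and collapse runs of identical canonical letters."""
    out = []
    for ch in word.lower():
        if ch in SILENT:
            continue
        ch = CANONICAL.get(ch, ch)
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)


# Assign one prime to every canonical letter that can appear in a phonetic form.
_letters = sorted({CANONICAL.get(c, c) for c in ALPHABET if c not in SILENT})
PRIME_OF = dict(zip(_letters, first_primes(len(_letters))))


def phonetic_code(word):
    """Multiply the primes of the canonical letters (assumed combination rule).
    Characters outside the Cyrillic table are ignored."""
    code = 1
    for ch in phonetic_form(word):
        code *= PRIME_OF.get(ch, 1)
    return code


# A common misspelling and the correct word receive the same code.
print(phonetic_code("молоко") == phonetic_code("малако"))
```

Under these assumed rules, distinct words that differ only in a voiced or voiceless consonant (for example "код" and "кот") also collide, so a table this coarse trades precision for recall.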


Footnotes
1
Yandex keyword statistics service.
 
Metadata
Title
Phonetic String Matching for Languages with Cyrillic Alphabet
Authors
Viacheslav Paramonov
Alexey Shigarov
Gennady Ruzhnikov
Evgeny Cherkashin
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-319-99981-4_28
