
2021 | OriginalPaper | Chapter

Korean-Chinese Bilingual Sentence Alignment Method Based on Character Length

Authors : Qi Wang, Yahui Zhao, Rongyi Cui

Published in: Big Data Analytics for Cyber-Physical System in Smart City

Publisher: Springer Singapore


Abstract

Sentence-level aligned bilingual parallel corpora are an indispensable resource for machine translation, translation knowledge acquisition, and bilingual dictionary compilation. Working from a Korean-Chinese parallel corpus, this paper applies a character-length-based sentence alignment algorithm to align Korean and Chinese sentences automatically, and proposes a method for evaluating sentence alignment. First, the Korean-Chinese corpus is preprocessed and segmented. Second, the mean and variance of the length distribution are calculated from the Korean-Chinese corpus, and a probability score is used within a dynamic programming framework to find the maximum-likelihood sentence alignment. Finally, a sentence alignment judgment method is proposed: the Hanjaja tool converts the Sino-Korean words in each Korean sentence into Chinese words, producing a Korean sentence that contains Chinese words (abbreviated as a C-K sentence); the Jaccard coefficient between the C-K sentence and the Chinese sentence is then computed, and a pair is automatically judged as aligned or not by comparing this coefficient with an appropriate threshold. Experiments show that the length-based method aligns Korean-Chinese sentences well, reaching a sentence alignment accuracy of 88.61%, and that the proposed alignment judgment method is simple and effective.
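As a rough illustration of the length-based step, the sketch below follows the well-known Gale-Church (1993) scheme: a bead's cost is the negative log of a prior times the two-tailed probability of the observed length deviation, and dynamic programming picks the cheapest sequence of beads. The abstract does not give the paper's exact cost function, priors, or variance estimator, so all of these are assumptions: the priors are the values Gale and Church reported for their European-language data, and `estimate_length_params`, `match_cost`, and `align` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Bead types (Korean sentences, Chinese sentences) with prior probabilities.
# ASSUMPTION: values reported by Gale and Church (1993) for their European
# data, one probability per category; the paper's Korean-Chinese priors are
# not stated in the abstract.
PRIORS: Dict[Tuple[int, int], float] = {
    (1, 1): 0.89,
    (1, 0): 0.0099,
    (0, 1): 0.0099,
    (2, 1): 0.089,
    (1, 2): 0.089,
    (2, 2): 0.011,
}


def estimate_length_params(pairs: Sequence[Tuple[int, int]]) -> Tuple[float, float]:
    """Estimate c (Chinese chars per Korean char) and s2 (variance per Korean
    char) from known-aligned (korean_len, chinese_len) pairs.

    Hypothetical estimator: the Chinese length is modelled as having mean
    c * l_ko and variance s2 * l_ko.
    """
    total_ko = sum(k for k, _ in pairs)
    total_zh = sum(z for _, z in pairs)
    c = total_zh / total_ko
    s2 = sum((z - c * k) ** 2 for k, z in pairs) / total_ko
    return c, max(s2, 1e-6)  # floor avoids division by zero later


def match_cost(len_ko: int, len_zh: int, prior: float, c: float, s2: float) -> float:
    """Negative log probability of one bead, Gale-Church style."""
    if len_ko == 0 and len_zh == 0:
        return 0.0
    # Averaging both sides keeps the denominator non-zero for 1-0 / 0-1 beads.
    mean = (len_ko + len_zh / c) / 2
    delta = (c * len_ko - len_zh) / math.sqrt(s2 * mean)
    # Two-tailed probability 2 * (1 - Phi(|delta|)), written via erfc.
    p_delta = math.erfc(abs(delta) / math.sqrt(2))
    return -math.log(max(prior * p_delta, 1e-300))


def align(ko_lens: List[int], zh_lens: List[int], c: float, s2: float
          ) -> List[Tuple[List[int], List[int]]]:
    """Minimum-cost (maximum-likelihood) alignment by dynamic programming.
    Returns beads as (Korean sentence indices, Chinese sentence indices)."""
    n, m = len(ko_lens), len(zh_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for (di, dj), prior in PRIORS.items():
                if di > i or dj > j or cost[i - di][j - dj] == inf:
                    continue
                step = match_cost(sum(ko_lens[i - di:i]), sum(zh_lens[j - dj:j]),
                                  prior, c, s2)
                if cost[i - di][j - dj] + step < cost[i][j]:
                    cost[i][j] = cost[i - di][j - dj] + step
                    back[i][j] = (di, dj)
    # Trace the best path back from (n, m) to (0, 0).
    beads: List[Tuple[List[int], List[int]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads
```

For example, with the illustrative parameters `c=0.75` and `s2=1.0`, `align([10, 10, 30], [15, 22], c=0.75, s2=1.0)` merges the first two Korean sentences into a 2-1 bead and pairs the third with the second Chinese sentence. Working in negative log space turns the product of bead probabilities into a sum, which is what lets the dynamic program find the maximum-likelihood path.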
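The alignment judgment step can be sketched in a few lines. The Hanjaja tool's interface is not described here, so `HANJA_MAP` is a hypothetical four-entry dictionary standing in for it. Comparing sets of non-whitespace characters and the 0.2 threshold are likewise assumptions, not the paper's settings; `to_ck_sentence`, `char_set`, `jaccard`, and `is_aligned` are illustrative names.

```python
from typing import Dict, Set

# HYPOTHETICAL stand-in for the Hanjaja tool: a tiny Sino-Korean word ->
# Chinese word dictionary. The real tool's interface is not given here.
HANJA_MAP: Dict[str, str] = {
    "정부": "政府",
    "인구": "人口",
    "정책": "政策",
    "문화": "文化",
}


def to_ck_sentence(korean: str, hanja_map: Dict[str, str] = HANJA_MAP) -> str:
    """Build a C-K sentence by replacing known Sino-Korean words (longest first)."""
    for word in sorted(hanja_map, key=len, reverse=True):
        korean = korean.replace(word, hanja_map[word])
    return korean


def char_set(sentence: str) -> Set[str]:
    # ASSUMPTION: the coefficient is computed over distinct characters,
    # ignoring whitespace.
    return {ch for ch in sentence if not ch.isspace()}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A intersect B| / |A union B|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_aligned(korean: str, chinese: str, threshold: float = 0.2) -> bool:
    """Judge a pair as aligned if J(C-K sentence, Chinese sentence) >= threshold.
    The 0.2 default is a placeholder, not a tuned value."""
    ck = to_ck_sentence(korean)
    return jaccard(char_set(ck), char_set(chinese)) >= threshold
```

For the pair `정부는 인구 정책과 문화 정책을 발표했다` / `政府发布了人口政策和文化政策`, the C-K sentence is `政府는 人口 政策과 文化 政策을 발표했다`; the two sentences share 7 of 18 distinct characters, giving a coefficient of about 0.389. The example words were chosen because their Hanja forms coincide with simplified Chinese; a real pipeline would also have to reconcile traditional and simplified characters.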


Metadata
DOI
https://doi.org/10.1007/978-981-33-4572-0_43
