Published in: International Journal of Machine Learning and Cybernetics 4/2015

01-08-2015 | Original Article

A bilingual word alignment algorithm of Vietnamese-Chinese based on feature constraint

Authors: Yuanyuan Mo, Jianyi Guo, Zhengtao Yu, Lin Luo, Shengxiang Gao


Abstract

It is difficult to align Vietnamese and Chinese words automatically because the two languages differ considerably in syntax and structure. We present a novel method for Vietnamese-Chinese word alignment that merges a variety of feature constraint models. The article describes an improved model based on the Vietnamese-Chinese progressive structure and the offset features of word sequences. The model is built in a log-linear framework, with parameters trained by the minimum error rate algorithm, and is used to obtain the Vietnamese-Chinese auto-alignment. The baseline model in the experiments is IBM Model 3. The experimental results suggest that this bilingual word alignment method for Vietnamese and Chinese performs well: precision and recall increase by 28.57 % and 25.02 %, respectively, and the alignment error rate (AER) is reduced by 14.25 %.
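
The abstract names two standard building blocks: a log-linear combination of feature functions and evaluation by precision, recall, and AER. The following is a minimal Python sketch of both, not the authors' implementation. The feature names (`ibm3_logprob`, `offset`), the weights, and the candidate structure are hypothetical placeholders; the metric definitions follow the common sure/possible-link formulation of AER, which is assumed here rather than confirmed by the abstract.

```python
def loglinear_score(features, weights):
    """Unnormalised log-linear score: sum over k of weight_k * h_k."""
    return sum(weights[name] * value for name, value in features.items())


def best_alignment(candidates, weights):
    """Return the candidate alignment with the highest log-linear score."""
    return max(candidates, key=lambda c: loglinear_score(c["features"], weights))


def alignment_metrics(predicted, sure, possible):
    """Precision, recall and AER over sets of (source_idx, target_idx) links.

    Assumes `predicted` and `sure` are non-empty sets.
    """
    possible = possible | sure  # every sure link also counts as possible
    hits_possible = len(predicted & possible)
    hits_sure = len(predicted & sure)
    precision = hits_possible / len(predicted)
    recall = hits_sure / len(sure)
    aer = 1.0 - (hits_sure + hits_possible) / (len(predicted) + len(sure))
    return precision, recall, aer


# Hypothetical feature weights; in the article these would be tuned by
# minimum error rate training, which is not reproduced here.
weights = {"ibm3_logprob": 1.0, "offset": 0.5}

# Two hypothetical candidate alignments for one sentence pair.
candidates = [
    {"links": {(0, 0), (1, 1)}, "features": {"ibm3_logprob": -2.0, "offset": -1.0}},
    {"links": {(0, 0), (1, 2), (2, 1)}, "features": {"ibm3_logprob": -2.2, "offset": -0.2}},
]

chosen = best_alignment(candidates, weights)

# Hypothetical gold annotation: sure links and additional possible links.
sure_links = {(0, 0), (1, 1)}
possible_links = {(2, 1)}

precision, recall, aer = alignment_metrics(chosen["links"], sure_links, possible_links)
```

Lower AER is better, so the reported reduction in AER points in the same direction as the reported gains in precision and recall.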

Metadata
Title
A bilingual word alignment algorithm of Vietnamese-Chinese based on feature constraint
Authors
Yuanyuan Mo
Jianyi Guo
Zhengtao Yu
Lin Luo
Shengxiang Gao
Publication date
01-08-2015
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 4/2015
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-014-0293-6
