2018 | OriginalPaper | Chapter

A Classifier-Based Preordering Approach for English-Vietnamese Statistical Machine Translation

Authors : Viet Hong Tran, Huyen Thuong Vu, Vinh Van Nguyen, Minh Le Nguyen

Published in: Computational Linguistics and Intelligent Text Processing

Publisher: Springer International Publishing

Abstract

Reordering is an essential problem in phrase-based statistical machine translation (SMT). In this paper, we propose an approach for English-to-Vietnamese phrase-based SMT that automatically learns reordering rules from a dependency parser and applies them as a preprocessing step. We use dependency parses, together with rules extracted by training feature-rich discriminative classifiers, to reorder source-side sentences. We evaluated our approach on English-Vietnamese machine translation tasks and showed that it outperforms the baseline phrase-based SMT system.
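The abstract only summarizes the method, so the following is a minimal Python sketch of classifier-based preordering over a dependency tree in the general spirit described, not the authors' implementation. It assumes that each head-child pair is labeled "left" or "right" according to where the child's aligned target word falls relative to the head's, and that a majority-vote lookup table over a small hypothetical feature tuple stands in for the feature-rich discriminative classifier. The names `Node`, `features`, `train`, and `reorder` are illustrative only. Predicting each child's side independently is a simplification of predicting a full permutation of a head's family; children placed on the same side keep their source order.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple

Features = Tuple[str, str, str, str]


@dataclass
class Node:
    index: int   # position of the word in the source sentence
    word: str
    pos: str     # part-of-speech tag, e.g. "NN", "JJ"
    label: str   # dependency label to the head, e.g. "amod", "det"
    children: List["Node"] = field(default_factory=list)


def features(head: Node, child: Node) -> Features:
    # Hypothetical, deliberately small feature template for one head-child pair.
    side = "left" if child.index < head.index else "right"
    return (head.pos, child.pos, child.label, side)


def pairs(node: Node) -> Iterator[Tuple[Node, Node]]:
    # Yield every (head, child) pair in the subtree rooted at node.
    for child in node.children:
        yield node, child
        yield from pairs(child)


def train(examples: List[Tuple[Node, Dict[int, int]]]) -> Dict[Features, str]:
    """Learn child placement from parsed, word-aligned sentence pairs.

    Each alignment maps a source index to the index of its aligned target word.
    """
    counts: Dict[Features, Counter] = defaultdict(Counter)
    for root, alignment in examples:
        for head, child in pairs(root):
            if head.index in alignment and child.index in alignment:
                target_side = ("left" if alignment[child.index] < alignment[head.index]
                               else "right")
                counts[features(head, child)][target_side] += 1
    # The majority label per feature tuple stands in for a trained classifier.
    return {f: c.most_common(1)[0][0] for f, c in counts.items()}


def reorder(node: Node, model: Dict[Features, str]) -> List[str]:
    """Recursively place each child before or after its head."""
    left: List[Node] = []
    right: List[Node] = []
    for child in sorted(node.children, key=lambda c: c.index):
        f = features(node, child)
        side = model.get(f, f[3])  # unseen features: keep the source-side order
        (left if side == "left" else right).append(child)
    words: List[str] = []
    for child in left:
        words.extend(reorder(child, model))
    words.append(node.word)
    for child in right:
        words.extend(reorder(child, model))
    return words


# Toy training pair: "a red car" <-> "một chiếc xe đỏ" ("chiếc" is left unaligned).
car = Node(2, "car", "NN", "root", [Node(0, "a", "DT", "det"), Node(1, "red", "JJ", "amod")])
model = train([(car, {0: 0, 1: 3, 2: 2})])

# New sentence: "I bought a big house"; the learned amod rule moves "big" after its noun.
house = Node(4, "house", "NN", "dobj", [Node(2, "a", "DT", "det"), Node(3, "big", "JJ", "amod")])
sent = Node(1, "bought", "VBD", "root", [Node(0, "I", "PRP", "nsubj"), house])
print(" ".join(reorder(sent, model)))  # I bought a house big
```

The reordered source ("I bought a house big") follows the Vietnamese noun-adjective order ("nhà lớn"), which is the kind of monotone source-target correspondence a phrase-based decoder benefits from. A real system would replace the lookup table with a trained classifier over a much richer feature set.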

Metadata
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-75487-1_7