2018 | OriginalPaper | Chapter

Forest to String Based Statistical Machine Translation with Hybrid Word Alignments

Authors: Santanu Pal, Sudip Kumar Naskar, Josef van Genabith

Published in: Computational Linguistics and Intelligent Text Processing

Publisher: Springer International Publishing

Abstract

Forest to String Based Statistical Machine Translation (FSBSMT) is a forest-based tree-sequence-to-string translation model for syntax-based statistical machine translation. The model automatically learns tree-sequence-to-string translation rules from a given word alignment estimated on a source-side-parsed bilingual parallel corpus. This paper presents a hybrid method that combines different word alignment methods and integrates them into an FSBSMT system. The hybrid word alignment provides the most informative alignment links to the FSBSMT system. We show that hybrid word alignment, integrated into various experimental settings of FSBSMT, provides considerable improvement over state-of-the-art Hierarchical Phrase-Based SMT (HPBSMT). We also demonstrate that additionally integrating Named Entities (NEs), their translations, and Example-Based Machine Translation (EBMT) phrases (all extracted from the bilingual parallel training data) brings further considerable performance improvements over the hybrid FSBSMT system. We apply our hybrid model to a distant language pair, English–Bengali. The proposed system achieves a 78.5% relative (9.84 BLEU points absolute) improvement over the baseline HPBSMT system.
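
The two improvement figures at the end of the abstract can be related by simple arithmetic. The sketch below assumes that "relative improvement" means the absolute gain divided by the baseline BLEU score; the abstract does not state the baseline or final scores, so the values computed here (roughly 12.5 and 22.4 BLEU) are implied estimates, limited by the rounding of the published percentages. The function name `implied_scores` is illustrative only.

```python
# Figures quoted in the abstract.
RELATIVE_GAIN = 0.785  # 78.5% relative improvement over baseline HPBSMT
ABSOLUTE_GAIN = 9.84   # absolute improvement in BLEU points


def implied_scores(relative_gain: float, absolute_gain: float) -> tuple[float, float]:
    """Return (baseline, system) BLEU implied by the two gains.

    Assumes relative_gain = absolute_gain / baseline, which is the usual
    definition but is not spelled out in the abstract.
    """
    baseline = absolute_gain / relative_gain
    system = baseline + absolute_gain
    return baseline, system


baseline_bleu, system_bleu = implied_scores(RELATIVE_GAIN, ABSOLUTE_GAIN)
print(f"implied baseline BLEU: {baseline_bleu:.2f}")
print(f"implied system BLEU:   {system_bleu:.2f}")
```

Because the baseline implied by these figures is low, a gain of under ten BLEU points corresponds to a large relative improvement.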

Metadata
Title
Forest to String Based Statistical Machine Translation with Hybrid Word Alignments
Authors
Santanu Pal
Sudip Kumar Naskar
Josef van Genabith
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-75487-1_4
