Published in: Journal of Intelligent Information Systems 2/2011

01-10-2011

A vector-space dynamic feature for phrase-based statistical machine translation

Authors: Marta R. Costa-jussà, Rafael E. Banchs



Abstract

In this paper, we propose and evaluate a novel dynamic feature function for log-linear model combinations in phrase-based statistical machine translation. The feature function is inspired by the well-known vector-space model, which is typically used in information retrieval and text-mining applications, and it aims to improve translation-unit selection at decoding time by incorporating context information from the source language. Significant improvements on an English-Spanish experimental corpus are presented and discussed.
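The abstract does not give the feature's exact formula. As a minimal sketch, assume the feature is the cosine similarity, as used in the vector-space model, between a term-frequency vector of the input source sentence and a vector that aggregates the source-side training sentences from which each phrase pair was extracted. A phrase pair whose training contexts resemble the current sentence then receives a higher score, which the log-linear model can reward. The function names, the raw term-frequency weighting (no tf-idf), and the toy English-Spanish data are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical sketch: a vector-space (cosine-similarity) feature for phrase pairs.


def bow(tokens):
    """Raw term-frequency (bag-of-words) vector for a token list."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_context_vectors(training):
    """Aggregate, per phrase pair, the source sentences it was extracted from.

    training: list of (source_tokens, phrase_pairs); each phrase pair is a
    (source_phrase, target_phrase) tuple. This input format is an assumption.
    """
    contexts = {}
    for source_tokens, phrase_pairs in training:
        for pair in phrase_pairs:
            contexts.setdefault(pair, Counter()).update(source_tokens)
    return contexts


def vsm_feature(input_tokens, pair, contexts):
    """Similarity between the input sentence and a phrase pair's training contexts."""
    context = contexts.get(pair)
    if context is None:
        return 0.0  # unseen pair: no context evidence
    return cosine(bow(input_tokens), context)


def loglinear_score(features, weights):
    """Weighted sum of feature values, as in a log-linear model combination."""
    return sum(weights[name] * value for name, value in features.items())


if __name__ == "__main__":
    # Toy data: English "bank" translated as Spanish "banco" (finance) or "orilla" (river).
    training = [
        ("i deposited money at the bank".split(), [("bank", "banco")]),
        ("we sat on the river bank".split(), [("bank", "orilla")]),
    ]
    contexts = build_context_vectors(training)
    sentence = "she withdrew money from the bank".split()
    for target in ("banco", "orilla"):
        print(target, round(vsm_feature(sentence, ("bank", target), contexts), 3))
    # banco 0.5
    # orilla 0.333
```

Under this assumption, the feature value for a given phrase pair varies with the input sentence, unlike static phrase-table scores; this is one plausible reading of "dynamic" here. In a full system, the feature's weight would be tuned together with the other log-linear weights (for example, by minimum error rate training) rather than set by hand.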


Footnotes
1
The corpus size nevertheless falls within the range of acceptability, since other corpora of similar size are in use; see, for instance, the IWSLT International Evaluation Campaign (http://mastarpj.nict.go.jp/IWSLT2009/).
Metadata
Title
A vector-space dynamic feature for phrase-based statistical machine translation
Authors
Marta R. Costa-jussà
Rafael E. Banchs
Publication date
01-10-2011
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 2/2011
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-010-0130-7
