Published in: Pattern Analysis and Applications 3/2015

01.08.2015 | Theoretical Advances

Minimum Bayes’ risk subsequence combination for machine translation

Authors: Jesús González-Rubio, Francisco Casacuberta


Abstract

System combination has proved to be a successful technique in the pattern recognition field. However, several difficulties arise when combining the outputs of tasks that generate structured patterns, such as machine translation. So far, machine translation system combination approaches either implement sophisticated classifiers to select one of the provided translations, or generate new sentences by combining the “best” subsequences of the provided translations. We present minimum Bayes’ risk system combination (MBRSC), a system combination method for machine translation that brings together the advantages of sentence-selection and subsequence-combination methods. MBRSC is able to detect and utilize the “best” subsequences of the provided translations to generate the optimal consensus translation with respect to a particular performance metric. Experiments show that MBRSC obtains significant improvements in translation quality, and that it is particularly competitive when applied to languages with scarce resources.


Footnotes
1
We use the term \(n\)-gram to refer to a sequence of \(n\) consecutive words in a sentence.
 
2
\(Pr(\cdot )\) denotes general probability distributions, \(P(\cdot )\) denotes model-based distributions, and \(\mathbb {E}_{Pr(X)}[X]\) denotes the expected value of a random variable \(X\) under distribution \(Pr(X)\).
 
3
The brevity penalty is also a function of \(n\)-gram counts: \(|{{\mathrm{\mathbf {y}}}}'|=\sum _{{{\mathrm{\mathbf {w}}}}\in {{\mathrm{\mathcal {W}}}}_1({{\mathrm{\mathbf {y}}}}')}\#_{{{\mathrm{\mathbf {w}}}}}({{\mathrm{\mathbf {y}}}}')\).
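
As a minimal sketch of the identity in this footnote, the snippet below counts \(n\)-grams in a tokenized sentence and checks that the sentence length equals the sum of its unigram counts. The function name `ngram_counts`, the whitespace tokenization and the example sentence are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

def ngram_counts(words, n):
    """Count every n-gram (tuple of n consecutive words) in a tokenized sentence."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

# Hypothetical example sentence, tokenized on whitespace.
sentence = "the cat sat on the mat".split()

# W_1(y') corresponds to the keys of the unigram counter; #_w(y') to its values.
unigrams = ngram_counts(sentence, 1)
length_from_counts = sum(unigrams.values())
```

Here `length_from_counts` equals `len(sentence)`, so any quantity that depends only on the sentence length, such as a brevity penalty, can be written in terms of unigram counts.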
 
4
This can be done straightforwardly if the domain of translations is represented as a list. For more complex graph-based representations, we can use the algorithms proposed in [10, 11, 26].
 
5
Following the definition of the BLEU score (see previous section), we take into consideration \(n\)-grams up to size four.
 
6
This number is given by the multiset coefficient [42], and it is exponential in the size of the target vocabulary.
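
For reference, the multiset coefficient counts the multisets of size \(k\) that can be drawn from \(n\) distinct items, and equals the binomial coefficient \(\binom{n+k-1}{k}\). The sketch below computes it; the helper name `multiset_coefficient` is an assumption, and which quantities play the roles of \(n\) and \(k\) in the paper is not stated in this footnote.

```python
import math

def multiset_coefficient(n, k):
    """Number of multisets of size k over n distinct items: C(n + k - 1, k)."""
    return math.comb(n + k - 1, k)

# Illustrative values only: multisets of size 2 and 4 over small item sets.
small = multiset_coefficient(3, 2)
larger = multiset_coefficient(10, 4)
```

With three items and multisets of size two, the six multisets are {a,a}, {a,b}, {a,c}, {b,b}, {b,c} and {c,c}.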
 
7
The BLEU-based score cannot be computed incrementally due to the \(\min(\cdot)\) functions in its formulation.
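
One way to see the effect of a \(\min(\cdot)\) is through clipped \(n\)-gram matches, as used in BLEU [36]: the contribution of a word depends on how often it already occurs in the hypothesis. The sketch below is an illustration under that reading, not the paper's own argument; `clipped_matches` and the toy sentences are hypothetical.

```python
from collections import Counter

def clipped_matches(hyp, ref):
    """Sum over hypothesis words of min(count in hypothesis, count in reference)."""
    hyp_counts, ref_counts = Counter(hyp), Counter(ref)
    return sum(min(c, ref_counts[w]) for w, c in hyp_counts.items())

ref = "the cat".split()
prefix = ["the"]            # partial hypothesis
extended = ["the", "the"]   # same hypothesis extended by one more "the"

gain_first = clipped_matches(prefix, ref)
gain_second = clipped_matches(extended, ref) - gain_first
```

The first "the" adds one match and the second adds none, so the gain from appending a word cannot be read off from the word alone.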
 
9
As in [2], we give \(p\) values on a logarithmic scale. Note that \(10^{-4}\) is the smallest possible \(p\) value that can be computed with \(9,999\) shuffles in the randomized test.
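
The lower bound on the \(p\) value follows from the usual estimate for randomization tests, \((c + 1)/(N + 1)\), where \(c\) is the number of shuffles at least as extreme as the observed difference and \(N\) is the number of shuffles; with \(N = 9,999\) and \(c = 0\) this gives \(10^{-4}\). The sketch below is a simplified paired test under stated assumptions: it uses per-sentence scores that are summed, whereas the test in the paper may operate on corpus-level metrics, and the function name, default seed and inputs are hypothetical.

```python
import random

def approximate_randomization(scores_a, scores_b, shuffles=9999, seed=0):
    """Paired approximate randomization test on per-sentence scores.

    Returns the p value (c + 1) / (shuffles + 1), where c counts shuffles whose
    absolute score difference is at least the observed one.
    """
    rng = random.Random(seed)
    observed = abs(sum(scores_a) - sum(scores_b))
    at_least_as_extreme = 0
    for _ in range(shuffles):
        diff = 0.0
        for a, b in zip(scores_a, scores_b):
            # Swap the two systems' outputs for this sentence with probability 0.5.
            if rng.random() < 0.5:
                a, b = b, a
            diff += a - b
        if abs(diff) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (shuffles + 1)

# Smallest p value reachable with 9,999 shuffles (no shuffle as extreme as observed).
min_p = (0 + 1) / (9999 + 1)
```

Calling `approximate_randomization` with two identical score lists returns 1.0, since every shuffle is then at least as extreme as the observed difference of zero.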
 
References
1.
Bangalore S (2001) Computing consensus translation from multiple machine translation systems. In: IEEE automatic speech recognition and understanding workshop, pp 351–354
2.
Becker MA (2008) Active learning - an explicit treatment of unreliable parameters. Ph.D. thesis, University of Edinburgh
3.
Bellman R (1957) Dynamic programming. Princeton University Press, Princeton
4.
Bickel PJ, Doksum KA (1977) Mathematical statistics: basic ideas and selected topics. Holden-Day, San Francisco
5.
Callison-Burch C, Flournoy RS (2001) A program for automatically selecting the best output from multiple machine translation engines. In: Proceedings of the VIII machine translation summit, pp 63–66
6.
Callison-Burch C, Fordyce C, Koehn P, Monz C, Schroeder J (2008) Further meta-evaluation of machine translation. In: Proceedings of the 3rd workshop on statistical machine translation, Association for Computational Linguistics, pp 70–106
7.
Callison-Burch C, Koehn P, Monz C, Schroeder J (2009) Findings of the 2009 workshop on statistical machine translation. In: Proceedings of the 4th workshop on statistical machine translation, Association for Computational Linguistics, Athens, pp 1–28
8.
Callison-Burch C, Koehn P, Monz C, Zaidan OF (eds) (2011) Proceedings of the 6th workshop on statistical machine translation. Association for Computational Linguistics, Edinburgh
9.
Chinchor N (1992) The statistical significance of the MUC-4 results. In: Proceedings of the conference on message understanding, pp 30–50
10.
DeNero J, Chiang D, Knight K (2009) Fast consensus decoding over translation forests. In: Proceedings of the 47th annual meeting of the Association for Computational Linguistics, Association for Computational Linguistics, pp 567–575
11.
DeNero J, Kumar S, Chelba C, Och F (2010) Model combination for machine translation. In: Proceedings of the 11th conference of the North American chapter of the Association for Computational Linguistics, Association for Computational Linguistics, pp 975–983
12.
Dietterich TG (2000) Ensemble methods in machine learning. In: Proceedings of the 1st international workshop on multiple classifier systems, MCS ’00, Springer, pp 1–15
13.
Duan N, Li M, Zhang D, Zhou M (2010) Mixture model-based minimum Bayes risk decoding using multiple machine translation systems. In: Proceedings of the 23rd conference on Computational Linguistics, pp 313–321
14.
Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley, New York
15.
Ehling N, Zens R, Ney H (2007) Minimum Bayes risk decoding for BLEU. In: Proceedings of the 45th annual meeting of the Association for Computational Linguistics, Association for Computational Linguistics, pp 101–104
16.
Fiscus JG (1997) A post-processing system to yield reduced word error rates: recogniser output voting error reduction (ROVER). In: Proceedings IEEE workshop on automatic speech recognition and understanding, pp 347–352
17.
González-Rubio J, Juan A, Casacuberta F (2011) Minimum Bayes-risk system combination. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics, pp 1268–1277
18.
González-Rubio J, Casacuberta F (2011) The UPV-PRHLT combination system for WMT 2011. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics, pp 1268–1277
19.
He X, Toutanova K (2009) Joint optimization for machine translation system combination. In: Proceedings of the 2009 conference on empirical methods in natural language processing, Association for Computational Linguistics, pp 1202–1211
20.
He X, Yang M, Gao J, Nguyen P, Moore R (2008) Indirect-HMM-based hypothesis alignment for combining outputs from machine translation systems. In: Proceedings of the 2008 conference on empirical methods in natural language processing, Association for Computational Linguistics, pp 98–107
21.
Heafield K, Lavie A (2011) CMU system combination in WMT 2011. In: Proceedings of the 6th workshop on statistical machine translation, Association for Computational Linguistics, Edinburgh, pp 145–151
22.
Jayaraman S, Lavie A (2005) Multi-engine machine translation guided by explicit word matching. In: Proceedings of the 10th conference of the European Association for Machine Translation, pp 143–152
23.
Jelinek F (1997) Statistical methods for speech recognition. MIT Press, Cambridge
26.
Kumar S, Macherey W, Dyer C, Och F (2009) Efficient minimum error rate training and minimum Bayes-risk decoding for translation hypergraphs and lattices. In: Proceedings of the 47th annual meeting of the Association for Computational Linguistics, Association for Computational Linguistics, pp 163–171
27.
28.
Larkey LS, Croft BW (1996) Combining classifiers in text categorization. In: Frei HP, Harman D, Schäuble P, Wilkinson R (eds) Proceedings of the 19th ACM international conference on research and development in information retrieval. ACM Press, New York, pp 289–297
29.
Leusch G, Freitag M, Ney H (2011) The RWTH system combination system for WMT 2011. In: Proceedings of the 6th workshop on statistical machine translation, Association for Computational Linguistics, Edinburgh, pp 152–158
30.
Matusov E, Leusch G, Banchs RE, Bertoldi N, Dechelotte D, Federico M, Kolss M, Lee YS, Mariño JB, Paulik M, Roukos S, Schwenk H, Ney H (2008) System combination for machine translation of spoken and written language. IEEE Trans Audio Speech Lang Process 16:1222–1237
31.
Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7(4):308–313
33.
Nomoto T (2004) Multi-engine machine translation with voted language model. In: Proceedings of the 42nd annual meeting on Association for Computational Linguistics, Association for Computational Linguistics, pp 494–501
34.
Noreen E (1989) Computer-intensive methods for testing hypotheses: an introduction. A Wiley-Interscience publication. Wiley, New York
35.
Och FJ (2003) Minimum error rate training in statistical machine translation. In: Proceedings of the 41st annual meeting on Association for Computational Linguistics, Association for Computational Linguistics, pp 160–167
36.
Papineni K, Roukos S, Ward T, Zhu WJ (2002) BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting on Association for Computational Linguistics, Association for Computational Linguistics, pp 311–318
37.
Paul M, Doi T, Hwang Y, Imamura K, Okuma H, Sumita E (2005) Nobody is perfect: ATR’s hybrid approach to spoken language translation. In: Proceedings of the 2005 international workshop on spoken language translation, pp 55–62
38.
Rosti A, Ayan NF, Xiang B, Matsoukas S, Schwartz R, Dorr B (2007) Combining outputs from multiple machine translation systems. In: Proceedings of the 6th conference of the North American chapter of the Association for Computational Linguistics, Association for Computational Linguistics, pp 228–235
39.
Rosti A, Zhang B, Matsoukas S, Schwartz R (2011) Expected BLEU training for graphs: BBN system description for WMT11 system combination task. In: Proceedings of the 6th workshop on statistical machine translation, Association for Computational Linguistics, pp 159–165
40.
Roth D, Zelenko D (1998) Part of speech tagging using a network of linear separators. In: Proceedings of the 17th international conference on Computational Linguistics - Volume 2, COLING ’98, Association for Computational Linguistics, pp 1136–1142
41.
Snover M, Dorr B, Schwartz R, Micciulla L, Weischedel R (2006) A study of translation error rate with targeted human annotation. In: Proceedings of the 7th conference of the Association for Machine Translation in the Americas, pp 223–231
42.
Stanley R (2002) Enumerative combinatorics. Cambridge studies in advanced mathematics. Cambridge University Press, Cambridge
43.
Udupa R, Maji HK (2006) Computational complexity of statistical machine translation. In: McCarthy D, Wintner S (eds) Proceedings of the European Chapter of the Association for Computational Linguistics. The Association for Computer Linguistics. http://acl.ldc.upenn.edu/E/E06/E06-1004
44.
Xu D, Cao Y, Karakos D (2011) Description of the JHU system combination scheme for WMT 2011. In: Proceedings of the 6th workshop on statistical machine translation, Association for Computational Linguistics, pp 171–176
Metadata
Title
Minimum Bayes’ risk subsequence combination for machine translation
Authors
Jesús González-Rubio
Francisco Casacuberta
Publication date
01.08.2015
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 3/2015
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-014-0387-5
