
2018 | OriginalPaper | Chapter

Leveraging the Advantages of Associative Alignment Methods for PB-SMT Systems


Abstract

Training statistical machine translation systems used to require long computation times. It has been shown that approximations in the probabilistic approach can lead to impressive reductions in training time (fast_align). We show that, by leveraging the advantages of the associative approach, we achieve similar or even faster training times while keeping comparable BLEU scores. Our contributions are of two types: engineering contributions, by introducing multi-processing in both sampling-based alignment and hierarchical sub-sentential alignment; and modeling contributions, by introducing approximations in hierarchical sub-sentential alignment that lead to substantial reductions in time without affecting the alignments produced. We test and compare our improvements on six typical language pairs of the Europarl corpus.
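The sampling-based approach mentioned above (Anymalign; Lardilleux, Yvon and Lepage, 2013) rests on the observation that, in a small random subcorpus, words that occur in exactly the same lines are likely to be translations of one another. The following is a minimal sketch of that idea, and of one plausible way to spread it over several processes; it is not the authors' implementation. Simplifying assumptions: occurrence profiles are sets of line indices (word counts within a line are ignored), the subcorpus size is a fixed parameter, complementary alignments are not extracted, and a fixed number of samples stands in for the timeout an anytime process would be given. Because samples are drawn independently, per-worker counts can simply be summed. The names align_subcorpus, sample_counts and parallel_counts are hypothetical.

```python
import random
from collections import Counter, defaultdict
from concurrent.futures import ProcessPoolExecutor
from typing import List, Tuple

Corpus = List[Tuple[List[str], List[str]]]  # (source tokens, target tokens) per line
Alignment = Tuple[Tuple[str, ...], Tuple[str, ...]]


def align_subcorpus(subcorpus: Corpus) -> List[Alignment]:
    """Pair source and target words that occur in exactly the same lines."""
    occ_src = defaultdict(set)  # source word -> indices of lines containing it
    occ_tgt = defaultdict(set)  # target word -> indices of lines containing it
    for line_no, (src, tgt) in enumerate(subcorpus):
        for word in src:
            occ_src[word].add(line_no)
        for word in tgt:
            occ_tgt[word].add(line_no)
    # Group words of both languages by their occurrence profile.
    by_profile = defaultdict(lambda: (set(), set()))
    for word, lines in occ_src.items():
        by_profile[frozenset(lines)][0].add(word)
    for word, lines in occ_tgt.items():
        by_profile[frozenset(lines)][1].add(word)
    # A profile shared by words of both languages yields one alignment.
    return [(tuple(sorted(s)), tuple(sorted(t)))
            for s, t in by_profile.values() if s and t]


def sample_counts(corpus: Corpus, n_samples: int, size: int, seed: int) -> Counter:
    """Count alignments over n_samples random subcorpora of the given size."""
    rng = random.Random(seed)
    size = min(size, len(corpus))
    counts: Counter = Counter()
    for _ in range(n_samples):
        counts.update(align_subcorpus(rng.sample(corpus, size)))
    return counts


def parallel_counts(corpus: Corpus, n_workers: int, samples_per_worker: int,
                    size: int) -> Counter:
    """Run independent samplers in separate processes and sum their counts."""
    total: Counter = Counter()
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        # One distinct seed per worker so that the workers draw different samples.
        parts = pool.map(sample_counts,
                         [corpus] * n_workers,
                         [samples_per_worker] * n_workers,
                         [size] * n_workers,
                         range(n_workers))
        for part in parts:
            total.update(part)  # counts from independent samples are additive
    return total
```

On platforms where worker processes are spawned rather than forked, parallel_counts should be called from under an `if __name__ == "__main__":` guard. Relative frequencies of the summed counts could then serve as translation-table estimates.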


Footnotes
4. Thanks to the authors for providing the source code.
5. train-model.perl --first-step 4
6. Anymalign is an anytime process and should therefore be given a timeout.
7. Note that, by definition, \( \mathrm{Ncut} (X, Y) = \mathrm{Ncut} (\bar{X}, \bar{Y}) \) and \( \mathrm{Ncut} (X, \bar{Y}) = \mathrm{Ncut} (\bar{X}, Y) \). The same holds for \( \mathrm{cut} \).
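The identities in footnote 7 can be checked numerically. The sketch below assumes the formulation of hierarchical sub-sentential alignment described by Lardilleux, Yvon and Lepage (2012), building on Zha et al. (2001): for a sentence pair with word association matrix w, \( W(A, B) \) sums w over source words A and target words B, \( \mathrm{cut}(X, Y) = W(X, \bar{Y}) + W(\bar{X}, Y) \), and \( \mathrm{Ncut}(X, Y) = \frac{\mathrm{cut}(X, Y)}{\mathrm{cut}(X, Y) + 2W(X, Y)} + \frac{\mathrm{cut}(X, Y)}{\mathrm{cut}(X, Y) + 2W(\bar{X}, \bar{Y})} \). The function names are hypothetical, and best_split is a plain exhaustive search over contiguous splits, not the accelerated approximation proposed in this chapter.

```python
from typing import Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # w[source_index][target_index]


def block_sum(w: Matrix, rows: Sequence[int], cols: Sequence[int]) -> float:
    """W(A, B): total association score between source words A and target words B."""
    return sum(w[a][b] for a in rows for b in cols)


def cut(w: Matrix, x, x_bar, y, y_bar) -> float:
    """cut(X, Y) = W(X, Y_bar) + W(X_bar, Y): mass ignored by pairing X-Y, X_bar-Y_bar."""
    return block_sum(w, x, y_bar) + block_sum(w, x_bar, y)


def _ratio(num: float, den: float) -> float:
    # Guard for the degenerate 0/0 case (no association mass at all).
    return num / den if den > 0 else 0.0


def ncut(w: Matrix, x, x_bar, y, y_bar) -> float:
    """Normalized cut of the pairing X-Y, X_bar-Y_bar (assumed definition)."""
    c = cut(w, x, x_bar, y, y_bar)
    return (_ratio(c, c + 2 * block_sum(w, x, y))
            + _ratio(c, c + 2 * block_sum(w, x_bar, y_bar)))


def best_split(w: Matrix) -> Tuple[float, int, int, bool]:
    """Try every contiguous split of both sides, in straight and inverted order.

    Returns (score, i, j, inverted): the source is split after word i and the
    target after word j; inverted means X is paired with Y_bar.
    """
    m, n = len(w), len(w[0])
    if m < 2 or n < 2:
        raise ValueError("need at least two words on each side")
    best: Optional[Tuple[float, int, int, bool]] = None
    for i in range(1, m):
        x, x_bar = range(i), range(i, m)
        for j in range(1, n):
            y, y_bar = range(j), range(j, n)
            for inverted in (False, True):
                # Inverted order: swap the roles of Y and Y_bar.
                score = (ncut(w, x, x_bar, y_bar, y) if inverted
                         else ncut(w, x, x_bar, y, y_bar))
                if best is None or score < best[0]:
                    best = (score, i, j, inverted)
    return best
```

Because \( \mathrm{Ncut}(X, Y) = \mathrm{Ncut}(\bar{X}, \bar{Y}) \), a given split in a given order has a single score whichever block is called X, so best_split only needs to treat X as the prefix. Applying best_split recursively to the two resulting blocks would give the hierarchical procedure.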
 
Literature
1. Ayan, N.F., Dorr, B.J.: Going beyond AER: an extensive analysis of word alignments and their impact on MT. In: Proceedings of COLING/ACL, pp. 9–16 (2006)
2. Brown, P.F., Pietra, V.J.D., Pietra, S.A.D., Mercer, R.L.: The mathematics of statistical machine translation: parameter estimation. Comput. Linguist. 19(2), 263–311 (1993)
3. Dunning, T.: Accurate methods for the statistics of surprise and coincidence. Comput. Linguist. 19(1), 61–74 (1993)
4. Dyer, C., Chahuneau, V., Smith, N.A.: A simple, fast, and effective reparameterization of IBM model 2. In: Proceedings of HLT-NAACL, pp. 644–648 (2013)
5. Gale, W.A., Church, K.W.: Identifying word correspondences in parallel texts. In: Proceedings of the Workshop on Speech and Natural Language, vol. 91, pp. 152–157 (1991)
6. Gao, Q., Vogel, S.: Parallel implementations of word alignment tool. In: Software Engineering, Testing, and Quality Assurance for Natural Language Processing, pp. 49–57 (2008)
7. Gong, L., Max, A., Yvon, F.: Improving bilingual sub-sentential alignment by sampling-based transpotting. In: Proceedings of IWSLT, pp. 243–250 (2013)
8. Heafield, K.: KenLM: faster and smaller language model queries. In: Proceedings of the 6th Workshop on Statistical Machine Translation, pp. 187–197 (2011)
9. Koehn, P.: Europarl: a parallel corpus for statistical machine translation. In: Proceedings of Machine Translation Summit, vol. 5, pp. 79–86 (2005)
10. Koehn, P., Axelrod, A., Birch, A., Callison-Burch, C., Osborne, M., Talbot, D., White, M.: Edinburgh system description for the 2005 IWSLT speech translation evaluation. In: Proceedings of IWSLT, pp. 68–75 (2005)
11. Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R.: Moses: open source toolkit for statistical machine translation. In: Proceedings of ACL (Poster sessions), pp. 177–180 (2007)
12. Lardilleux, A., Yvon, F., Lepage, Y.: Hierarchical sub-sentential alignment with Anymalign. In: Proceedings of EAMT 2012, pp. 279–286 (2012)
13. Lardilleux, A., Yvon, F., Lepage, Y.: Generalizing sampling-based multilingual alignment. Mach. Transl. 27(1), 1–23 (2013)
14. Levenberg, A., Callison-Burch, C., Osborne, M.: Stream-based translation models for statistical machine translation. In: Proceedings of HLT-NAACL, pp. 394–402 (2010)
15. Och, F.J., Ney, H.: A systematic comparison of various statistical alignment models. Comput. Linguist. 29(1), 19–51 (2003)
16. Smaïli, K., Jamoussi, S., Langlois, D., Haton, J.P.: Statistical feature language model. In: Proceedings of ICSLP, pp. 1357–1360 (2004)
17. Zha, H., He, X., Ding, C., Simon, H., Gu, M.: Bipartite graph partitioning and data clustering. In: Proceedings of International Conference on Information and Knowledge Management, pp. 25–32 (2001)
Metadata
Authors: Baosong Yang, Yves Lepage
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-93782-3_16
