Published in: Neural Computing and Applications 23/2020

10 May 2020 | Original Article

Similarity-aware neural machine translation: reducing human translator efforts by leveraging high-potential sentences with translation memory

Authors: Tianfu Zhang, Heyan Huang, Chong Feng, Xiaochi Wei


Abstract

In computer-aided translation, reducing the time human translators spend reviewing and post-editing translations is valuable. However, existing studies mainly aim to improve overall translation quality, which reduces only post-editing time. In this work, we first identify test sentences that are highly similar to the training set (high-potential sentences) to reduce reviewing time, and then focus on substantially improving the translation quality of those sentences to reduce post-editing time. To this end, we first propose two novel translation memory methods that characterize the similarity between sentences along the syntactic and template dimensions, respectively. Building on these, we propose a similarity-aware neural machine translation model (similarity-NMT) consisting of two independent modules: (1) an Identification Module, which identifies high-potential sentences in the test set according to multi-dimensional similarity information; and (2) a Translation Module, which integrates multi-dimensional similarity information from parallel training sentence pairs into an attention-based NMT model by leveraging posterior regularization. Experiments on two Chinese \(\Rightarrow\) English domains validate the effectiveness and generality of the proposed method in reducing human translator effort.
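To make the idea of flagging high-potential sentences concrete, the following is a minimal sketch that scores each test sentence against a training set and keeps those above a threshold. It is not the authors' method: the abstract describes syntactic and template similarity, whereas this sketch substitutes a plain token-level edit-distance fuzzy match, a measure commonly used with translation memories. The function names, the whitespace tokenization (Chinese input would need to be segmented first), and the threshold of 0.7 are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Token-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def fuzzy_match(a: Sequence[str], b: Sequence[str]) -> float:
    """Similarity in [0, 1]: 1 minus the length-normalized edit distance."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def high_potential(test_sents: List[str],
                   train_sents: List[str],
                   threshold: float = 0.7) -> List[Tuple[str, float]]:
    """Return (sentence, best score) for test sentences whose best match
    in the training set reaches the (hypothetical) threshold."""
    flagged = []
    for sent in test_sents:
        toks = sent.split()  # assumes pre-tokenized, space-separated input
        best = max((fuzzy_match(toks, t.split()) for t in train_sents),
                   default=0.0)
        if best >= threshold:
            flagged.append((sent, best))
    return flagged


if __name__ == "__main__":
    train = ["the cat sat on the mat", "press the red button to start"]
    test = ["the cat sat on the rug", "unrelated sentence entirely"]
    for sentence, score in high_potential(test, train):
        print(f"{score:.2f}\t{sentence}")
```

The brute-force comparison against every training sentence is quadratic and only practical for small sets; a retrieval step such as a search engine index would normally narrow the candidates first.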


Footnotes
1
Reducing \(T_{\mathrm{Rh}}\) and \(T_{\mathrm{Ph}}\) to 0 is highly valuable for human translators. Even if it seems impossible theoretically, it could be approximated in industrial applications.
 
2
We will publish the data and code in the future.
 
3
We use the same Apache Lucene search engine as [7].
 
4
LDC2004T08 is a subset of LDC corpus combination, which was used by [27].
 
References
1. Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. CoRR arXiv:1409.0473
6. Ganchev K, Graça J, Gillenwater J, Taskar B (2010) Posterior regularization for structured latent variable models. J Mach Learn Res 11:2001–2049
11. Kohavi R (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the fourteenth international joint conference on artificial intelligence, IJCAI 95, Montréal Québec, Canada, 20–25 Aug 1995, 2 volumes, pp 1137–1145. http://ijcai.org/Proceedings/95-2/Papers/016.pdf
12. Koponen M, Aziz W, Ramos L, Specia L (2012) Post-editing time as a measure of cognitive effort. In: Proceedings of WPTP, pp 11–20
13. Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions and reversals. Sov Phys Dokl 10:707–710
18. Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J (2006) A study of translation edit rate with targeted human annotation. In: Proceedings of association for machine translation in the Americas, vol 200
20. Temnikova IP (2010) Cognitive evaluation approach for a controlled language post-editing experiment. In: LREC, Citeseer
24. Wu Y et al (2016) Google’s neural machine translation system: bridging the gap between human and machine translation. CoRR arXiv:1609.08144
26. Zhang J, Ding Y, Shen S, Cheng Y, Sun M, Luan H, Liu Y (2017) THUMT: an open source toolkit for neural machine translation. CoRR arXiv:1706.06415
Metadata
Title
Similarity-aware neural machine translation: reducing human translator efforts by leveraging high-potential sentences with translation memory
Authors
Tianfu Zhang
Heyan Huang
Chong Feng
Xiaochi Wei
Publication date
10 May 2020
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 23/2020
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-020-04939-y
