2019 | Original Paper | Book Chapter

Target Oriented Data Generation for Quality Estimation of Machine Translation

Authors: Huanqin Wu, Muyun Yang, Jiaqi Wang, Junguo Zhu, Tiejun Zhao

Published in: Natural Language Processing and Chinese Computing

Publisher: Springer International Publishing

Abstract

Quality estimation (QE) is a non-trivial problem in machine translation (MT), and neural approaches appear to be a promising solution. Supervised QE systems need annotated training corpora, but annotation is costly. To provide informative, large-scale training data for MT quality estimation models, this paper proposes an approach for generating pseudo QE training data. Leveraging the labeled corpus provided for the task, the method generates pseudo training samples whose translation-error distribution is intended to be similar to that of the labeled corpus. The paper also describes a sentence-specific data expansion strategy that incrementally boosts model performance. Experiments on different open datasets and models confirm the effectiveness of the method and indicate that it can significantly improve QE performance.

Metadata
Title: Target Oriented Data Generation for Quality Estimation of Machine Translation
Authors: Huanqin Wu, Muyun Yang, Jiaqi Wang, Junguo Zhu, Tiejun Zhao
Copyright year: 2019
DOI: https://doi.org/10.1007/978-3-030-32233-5_31