
2017 | OriginalPaper | Book chapter

Neural Machine Translation for Morphologically Rich Languages with Improved Sub-word Units and Synthetic Data

Authors: Mārcis Pinnis, Rihards Krišlauks, Daiga Deksne, Toms Miks

Published in: Text, Speech, and Dialogue

Publisher: Springer International Publishing


Abstract

This paper analyses the problems that arise when byte pair encoding (BPE) splits rare and unknown words for neural machine translation (NMT), and proposes two methods that improve the quality of word splitting. The first method guides BPE with linguistic information; the second limits the splitting of unknown words. We also evaluate corpus re-translation for a new language pair, English-Latvian. In all reported experiments, we show a significant improvement in translation quality over the baseline systems. We expect the proposed methods to improve the translation of named entities and technical texts in production systems, which often receive data that is not represented in the training corpus.
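The abstract assumes familiarity with BPE word splitting. As background, the following is a minimal Python sketch of the standard BPE merge learning introduced for NMT by Sennrich, Haddow and Birch (ACL 2016). It is not the authors' linguistically guided variant or their restriction on splitting unknown words, whose details the abstract does not give. The function names `learn_bpe` and `apply_bpe` and the toy corpus are illustrative assumptions, not part of any particular toolkit.

```python
from collections import Counter

END = "</w>"  # end-of-word marker, as in Sennrich et al. (2016)


def _merge(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn an ordered list of merge operations from a word-frequency table."""
    # Start from characters: each word becomes a tuple of characters plus END.
    vocab = {tuple(word) + (END,): freq for word, freq in word_freqs.items()}
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[a, b] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties go to the pair counted first
        merges.append(best)
        vocab = {_merge(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def apply_bpe(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Segment a (possibly unseen) word by replaying the merges in learned order."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = _merge(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Toy word-frequency table from the running example of the BPE paper.
    corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges = learn_bpe(corpus, num_merges=10)
    print(merges[:3])                   # [('e', 's'), ('es', 't'), ('est', '</w>')]
    print(apply_bpe("lowest", merges))  # ['low', 'est</w>']
```

Because *lowest* never occurs in the toy corpus, it is segmented into the known units `low` and `est</w>` instead of being mapped to an unknown-word token. How sensibly such splits fall for rare words, named entities and morphologically rich languages such as Latvian is the quality issue that the paper's two methods target.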


Metadata
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-64206-2_27
