
2021 | Original Paper | Book Chapter

Improving Non-autoregressive Machine Translation with Soft-Masking

Authors: Shuheng Wang, Shumin Shi, Heyan Huang

Published in: Natural Language Processing and Chinese Computing

Publisher: Springer International Publishing

Abstract

In recent years, non-autoregressive machine translation has achieved great success thanks to its promising inference speedup. Non-autoregressive models reduce decoding latency by generating all target words in a single pass. However, a considerable accuracy gap remains between non-autoregressive and autoregressive machine translation. Because it removes the dependencies between target words, non-autoregressive machine translation tends to generate repeated or incorrect words, and these errors lead to low translation quality. In this paper, we introduce a soft-masking method to alleviate this issue. Specifically, we add an autoregressive discriminator that outputs probabilities indicating which embeddings are correct. Guided by these probabilities, we apply a mask to the copied representations, which enables the model to take into account which words are easy to predict. We evaluated our method on three benchmarks: WMT14 EN→DE, WMT16 EN→RO, and IWSLT14 DE→EN. The experimental results demonstrate that our method outperforms the baseline by a large margin at only a small cost in speed.
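The full paper is not reproduced on this page, but one plausible reading of the abstract's soft-masking step can be sketched as follows: the discriminator assigns each position a probability that its copied embedding is correct, and each position is then interpolated between its copied representation and a shared mask embedding according to that probability. The function name `soft_mask`, the interpolation formula, and the shared `mask_embedding` vector are assumptions for illustration; the paper's exact formulation may differ.

```python
import numpy as np

def soft_mask(copied: np.ndarray, probs: np.ndarray,
              mask_embedding: np.ndarray) -> np.ndarray:
    """Blend copied decoder inputs with a shared mask embedding.

    copied:         (seq_len, hidden) representations copied from the encoder
    probs:          (seq_len,) discriminator confidence that each position
                    is already correct
    mask_embedding: (hidden,) shared embedding standing in for "unknown"
    """
    probs = probs[:, None]  # broadcast over the hidden dimension
    # High-confidence positions keep their copied representation;
    # low-confidence positions are pushed toward the mask embedding.
    return probs * copied + (1.0 - probs) * mask_embedding

# Toy example: four positions, hidden size 8.
copied = np.ones((4, 8))
mask_emb = np.zeros(8)
probs = np.array([1.0, 0.5, 0.0, 0.9])
out = soft_mask(copied, probs, mask_emb)
# Position 0 keeps its embedding unchanged; position 2 is fully masked.
```

Under this reading, "soft" masking replaces the hard keep-or-mask decision of mask-predict-style models with a continuous interpolation, so gradients flow through the discriminator's confidence scores.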


Footnotes
1
The discriminator consists of a single Transformer decoder block, and its hidden size matches that of the NAT decoder.
 
Metadata
Title
Improving Non-autoregressive Machine Translation with Soft-Masking
Authors
Shuheng Wang
Shumin Shi
Heyan Huang
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-88480-2_12