
10.11.2023 | Original Article

Fine-tuning pretrained transformer encoders for sequence-to-sequence learning

Authors: Hangbo Bao, Li Dong, Wenhui Wang, Nan Yang, Songhao Piao, Furu Wei

Published in: International Journal of Machine Learning and Cybernetics | Issue 5/2024

Abstract

In this paper, we introduce s2s-ft, a method for adapting pretrained bidirectional Transformer encoders, such as BERT and RoBERTa, to sequence-to-sequence tasks like abstractive summarization and question generation. By employing a unified modeling approach and carefully designed self-attention masks, s2s-ft leverages the generative capabilities of pretrained Transformer encoders without requiring an additional decoder. We conduct extensive experiments comparing three fine-tuning algorithms (causal fine-tuning, masked fine-tuning, and pseudo-masked fine-tuning) and various pretrained models for initialization. The results demonstrate that s2s-ft achieves strong performance across different tasks and languages. Additionally, the method extends successfully to multilingual pretrained models, such as XLM-RoBERTa, and is evaluated on multilingual generation tasks. Our work highlights the importance of reducing the discrepancy between masked language model pretraining and sequence-to-sequence fine-tuning, and showcases the effectiveness and extensibility of the s2s-ft method.
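
The "well-designed self-attention masks" mentioned in the abstract are what let a single bidirectional encoder play both encoder and decoder roles: source tokens attend to each other bidirectionally, while target tokens attend to the full source segment and only to preceding (and current) target positions, so generation remains left-to-right. The following is a minimal, illustrative PyTorch sketch of such a mask; the function name and tensor layout are assumptions for illustration, not the paper's actual implementation.

import torch

def seq2seq_attention_mask(src_len: int, tgt_len: int) -> torch.Tensor:
    """Build a (src_len + tgt_len) x (src_len + tgt_len) self-attention mask.

    Source tokens attend bidirectionally among themselves; target tokens
    attend to the whole source segment and causally to earlier target
    tokens. A value of 1 means "may attend", 0 means "masked out".
    """
    total = src_len + tgt_len
    mask = torch.zeros(total, total, dtype=torch.long)

    # Source block: full bidirectional attention among source tokens.
    mask[:src_len, :src_len] = 1

    # Target rows: every target token sees the entire source segment...
    mask[src_len:, :src_len] = 1

    # ...and a causal (lower-triangular) view of the target segment.
    mask[src_len:, src_len:] = torch.tril(
        torch.ones(tgt_len, tgt_len, dtype=torch.long)
    )
    return mask

if __name__ == "__main__":
    # Toy example: 4 source tokens, 3 target tokens.
    print(seq2seq_attention_mask(4, 3))

In a real fine-tuning setup, such a mask would typically be converted to additive form (0 for allowed positions, a large negative value for masked ones) and applied to the attention scores in every encoder layer.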

Metadata
Title
Fine-tuning pretrained transformer encoders for sequence-to-sequence learning
Authors
Hangbo Bao
Li Dong
Wenhui Wang
Nan Yang
Songhao Piao
Furu Wei
Publication date
10.11.2023
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 5/2024
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-023-01992-6
