
27.05.2023 | Original Article

Improving cross-lingual language understanding with consistency regularization-based fine-tuning

Authors: Bo Zheng, Wanxiang Che

Published in: International Journal of Machine Learning and Cybernetics | Issue 10/2023

Abstract

Fine-tuning pre-trained cross-lingual language models alleviates the need for annotated data in different languages, as it allows the models to transfer task-specific supervision between languages, especially from high- to low-resource languages. In this work, we propose to improve cross-lingual language understanding with consistency regularization-based fine-tuning. Specifically, we use example consistency regularization to penalize the prediction sensitivity to four types of data augmentation, i.e., subword sampling, Gaussian noise, code-switch substitution, and machine translation. In addition, we employ model consistency to regularize two models trained on two augmented versions of the same training set. Experimental results on the XTREME benchmark show that our method (the code is available at https://github.com/bozheng-hit/xTune) achieves significant improvements across various cross-lingual language understanding tasks, including text classification, question answering, and sequence labeling. Furthermore, we extend our method to the few-shot cross-lingual transfer setting, in particular the more realistic setting where machine translation systems are available, so that machine translation as data augmentation can be combined with our consistency regularization. Experimental results demonstrate that our method also benefits the few-shot scenario.
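To make the example consistency idea concrete, the following is a minimal PyTorch sketch for a sentence-classification task: the task loss on the original input is combined with a symmetric KL penalty between the predictions on the original input and on one augmented view (e.g., a subword-resampled or code-switched copy). The function names, the toy classifier, and the weight `lam` are illustrative assumptions, not the paper's exact formulation or hyper-parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def symmetric_kl(p_logits, q_logits):
    """Symmetric KL divergence between the distributions implied by two logit tensors."""
    p_log, q_log = F.log_softmax(p_logits, dim=-1), F.log_softmax(q_logits, dim=-1)
    return 0.5 * (F.kl_div(q_log, p_log.exp(), reduction="batchmean")
                  + F.kl_div(p_log, q_log.exp(), reduction="batchmean"))

def consistency_step(model, ids_orig, ids_aug, labels, lam=1.0):
    """Task loss on the original example plus a penalty on the divergence between
    predictions for the original input and one augmented view of it."""
    logits_orig = model(ids_orig)   # assumed: model maps padded token ids to class logits
    logits_aug = model(ids_aug)
    return F.cross_entropy(logits_orig, labels) + lam * symmetric_kl(logits_orig, logits_aug)

if __name__ == "__main__":
    vocab, n_classes = 1000, 3
    # Toy stand-in for a pre-trained cross-lingual encoder with a classification head.
    model = nn.Sequential(nn.EmbeddingBag(vocab, 64), nn.Linear(64, n_classes))
    ids_orig = torch.randint(0, vocab, (8, 16))   # original token ids
    ids_aug = torch.randint(0, vocab, (8, 16))    # augmented view (e.g. code-switched copy)
    labels = torch.randint(0, n_classes, (8,))
    loss = consistency_step(model, ids_orig, ids_aug, labels)
    loss.backward()
```

Model consistency would additionally keep two such models, each fine-tuned on a differently augmented copy of the training set, close to each other with the same kind of divergence term.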

Footnotes
1
We define conventional cross-lingual fine-tuning as fine-tuning the pre-trained cross-lingual model with the labeled training set in the source language only (typically English) or with labeled training sets in all languages.
 
2
Implemented by .detach() in PyTorch.
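As a small, hedged illustration of this stop-gradient (the random tensors below merely stand in for model predictions):

```python
import torch
import torch.nn.functional as F

logits_orig = torch.randn(4, 3, requires_grad=True)   # predictions on original inputs
logits_aug = torch.randn(4, 3, requires_grad=True)    # predictions on an augmented view

# .detach() cuts the gradient path through the target branch: only the augmented-view
# predictions are pulled toward the (fixed) original-view distribution.
target_probs = F.softmax(logits_orig, dim=-1).detach()
loss = F.kl_div(F.log_softmax(logits_aug, dim=-1), target_probs, reduction="batchmean")
loss.backward()

print(logits_aug.grad is not None)   # True: gradients reach the augmented branch
print(logits_orig.grad is None)      # True: .detach() blocked gradients to the target
```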
 
5
X-STILTs [39] uses additional SQuAD v1.1 English training data for the TyDiQA-GoldP dataset, while we prefer a cleaner setting here.
 
6
FILTER directly selects the best model on the test set of XQuAD and TyDiQA-GoldP. Under this setting, we can obtain 83.1/69.7 for XQuAD, 75.5/61.1 for TyDiQA-GoldP.
 
7
For span extraction datasets, the answers are enclosed in quotes before translation so that the labels remain aligned, which makes it easy to extract the answers from the translated context [30]. This method can also be applied to NER tasks; however, aligning the label information requires complex post-processing, and there can be alignment errors.
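A rough sketch of this marking trick is shown below; the helper names and the plain double-quote convention are illustrative assumptions rather than the exact pipeline of [30], and the translation call itself is left to an external MT system.

```python
import re

def mark_answer(context: str, start: int, end: int, quote: str = '"') -> str:
    """Enclose the gold answer span in quotes before the context is machine-translated."""
    return context[:start] + quote + context[start:end] + quote + context[end:]

def recover_answer(translated_context: str, quote: str = '"'):
    """After translation, take the first quoted span as the projected answer
    (returns None if the MT system dropped or garbled the quotes)."""
    match = re.search(re.escape(quote) + r"(.+?)" + re.escape(quote), translated_context)
    return match.group(1) if match else None

ctx = "The Eiffel Tower is located in Paris."
marked = mark_answer(ctx, 31, 36)   # wraps "Paris"; this string would then be translated
print(marked)                        # The Eiffel Tower is located in "Paris".
print(recover_answer(marked))        # Paris
```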
 
8
Paragraphs in XQuAD contain more question-answer pairs than those in MLQA.
 
References
1.
Aghajanyan A, Shrivastava A, Gupta A, et al (2020) Better fine-tuning by reducing representational collapse. CoRR. arXiv:2008.03156
2.
Artetxe M, Ruder S, Yogatama D (2020) On the cross-lingual transferability of monolingual representations. In: Jurafsky D, Chai J, Schluter N, et al (eds) Proceedings of the 58th annual meeting of the association for computational linguistics, ACL 2020, Online, July 5–10, 2020. Association for Computational Linguistics, pp 4623–4637. https://www.aclweb.org/anthology/2020.acl-main.421/
3.
Athiwaratkun B, Finzi M, Izmailov P, et al (2019) There are many consistent explanations of unlabeled data: why you should average. In: 7th international conference on learning representations, ICLR 2019, New Orleans, LA, USA, May 6–9. OpenReview.net. https://openreview.net/forum?id=rkgKBhA5Y7
5.
Chi Z, Dong L, Wei F, et al (2020) InfoXLM: an information-theoretic framework for cross-lingual language model pre-training. CoRR. arXiv:2007.07834
6.
Chi Z, Dong L, Zheng B, et al (2021) Improving pretrained cross-lingual language models via self-labeled word alignment. In: Zong C, Xia F, Li W, et al (eds) Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing, ACL/IJCNLP 2021, (vol 1: Long Papers), Virtual Event, August 1–6, 2021. Association for Computational Linguistics, pp 3418–3430. https://doi.org/10.18653/v1/2021.acl-long.265
7.
Chi Z, Huang S, Dong L, et al (2022) XLM-E: cross-lingual language model pre-training via ELECTRA. In: Muresan S, Nakov P, Villavicencio A (eds) Proceedings of the 60th annual meeting of the association for computational linguistics (vol 1: Long Papers), ACL 2022, Dublin, Ireland, May 22–27, 2022. Association for Computational Linguistics, pp 6170–6182. https://doi.org/10.18653/v1/2022.acl-long.427
8.
Chung HW, Garrette D, Tan KC, et al (2020) Improving multilingual models with language-clustered vocabularies. In: Webber B, Cohn T, He Y, et al (eds) Proceedings of the 2020 conference on empirical methods in natural language processing, EMNLP 2020, Online, November 16–20, 2020. Association for Computational Linguistics, pp 4536–4546. https://doi.org/10.18653/v1/2020.emnlp-main.367
11.
Conneau A, Rinott R, Lample G, et al (2018) XNLI: evaluating cross-lingual sentence representations. In: Riloff E, Chiang D, Hockenmaier J, et al (eds) Proceedings of the 2018 conference on empirical methods in natural language processing, Brussels, Belgium, October 31–November 4, 2018. Association for Computational Linguistics, pp 2475–2485. https://doi.org/10.18653/v1/d18-1269
12.
Conneau A, Khandelwal K, Goyal N, et al (2020a) Unsupervised cross-lingual representation learning at scale. In: Jurafsky D, Chai J, Schluter N, et al (eds) Proceedings of the 58th annual meeting of the association for computational linguistics, ACL 2020, Online, July 5–10, 2020. Association for Computational Linguistics, pp 8440–8451. http://www.aclweb.org/anthology/2020.acl-main.747/
13.
Conneau A, Wu S, Li H, et al (2020b) Emerging cross-lingual structure in pretrained language models. In: Jurafsky D, Chai J, Schluter N, et al (eds) Proceedings of the 58th annual meeting of the association for computational linguistics, ACL 2020, Online, July 5–10, 2020. Association for Computational Linguistics, pp 6022–6034. https://www.aclweb.org/anthology/2020.acl-main.536/
14.
Devlin J, Chang M, Lee K, et al (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein J, Doran C, Solorio T (eds) Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2–7, 2019, vol 1 (Long and Short Papers). Association for Computational Linguistics, pp 4171–4186. https://doi.org/10.18653/v1/n19-1423
15.
Fang Y, Wang S, Gan Z, et al (2020) FILTER: an enhanced fusion method for cross-lingual language understanding. CoRR. arXiv:2009.05166
16.
Faruqui M, Dyer C (2014) Improving vector space word representations using multilingual correlation. In: Bouma G, Parmentier Y (eds) Proceedings of the 14th conference of the European chapter of the association for computational linguistics, EACL 2014, April 26–30, 2014, Gothenburg, Sweden. The Association for Computer Linguistics, pp 462–471. https://doi.org/10.3115/v1/e14-1049
17.
Fei H, Zhang M, Ji D (2020) Cross-lingual semantic role labeling with high-quality translated training corpus. In: Jurafsky D, Chai J, Schluter N, et al (eds) Proceedings of the 58th annual meeting of the association for computational linguistics, ACL 2020, Online, July 5–10, 2020. Association for Computational Linguistics, pp 7014–7026. http://www.aclweb.org/anthology/2020.acl-main.627/
18.
Gao T, Han X, Xie R, et al (2020) Neural snowball for few-shot relation learning. In: The thirty-fourth AAAI conference on artificial intelligence, AAAI 2020, the thirty-second innovative applications of artificial intelligence conference, IAAI 2020, the tenth AAAI symposium on educational advances in artificial intelligence, EAAI 2020, New York, NY, USA, February 7–12, 2020. AAAI Press, pp 7772–7779. http://ojs.aaai.org/index.php/AAAI/article/view/6281
19.
Guo J, Che W, Yarowsky D, et al (2015) Cross-lingual dependency parsing based on distributed representations. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing of the Asian federation of natural language processing, ACL 2015, July 26–31, 2015, Beijing, China, vol 1: Long Papers. The Association for Computer Linguistics, pp 1234–1244. https://doi.org/10.3115/v1/p15-1119
20.
Hou Y, Che W, Lai Y, et al (2020) Few-shot slot tagging with collapsed dependency transfer and label-enhanced task-adaptive projection network. In: Jurafsky D, Chai J, Schluter N, et al (eds) Proceedings of the 58th annual meeting of the association for computational linguistics, ACL 2020, Online, July 5–10, 2020. Association for Computational Linguistics, pp 1381–1393. https://doi.org/10.18653/v1/2020.acl-main.128
21.
Hou Y, Mao J, Lai Y, et al (2020) Fewjoint: a few-shot learning benchmark for joint language understanding. CoRR. arXiv:2009.08138
22.
Hu J, Ruder S, Siddhant A, et al (2020) XTREME: a massively multilingual multi-task benchmark for evaluating cross-lingual generalisation. In: Proceedings of the 37th international conference on machine learning, ICML 2020, 13–18 July 2020, virtual event, proceedings of machine learning research, vol 119. PMLR, pp 4411–4421. http://proceedings.mlr.press/v119/hu20b.html
23.
Hu J, Johnson M, Firat O, et al (2021) Explicit alignment objectives for multilingual bidirectional encoders. In: Toutanova K, Rumshisky A, Zettlemoyer L, et al (eds) Proceedings of the 2021 conference of the North American chapter of the association for computational linguistics: human language technologies, NAACL-HLT 2021, Online, June 6–11, 2021. Association for Computational Linguistics, pp 3633–3643. https://doi.org/10.18653/v1/2021.naacl-main.284
24.
Hu W, Miyato T, Tokui S, et al (2017) Learning discrete representations via information maximizing self-augmented training. In: Precup D, Teh YW (eds) Proceedings of the 34th international conference on machine learning, ICML 2017, Sydney, NSW, Australia, 6–11 August 2017, proceedings of machine learning research, vol 70. PMLR, pp 1558–1567. http://proceedings.mlr.press/v70/hu17b.html
25.
Jiang H, He P, Chen W, et al (2020) SMART: robust and efficient fine-tuning for pre-trained natural language models through principled regularized optimization. In: Jurafsky D, Chai J, Schluter N, et al (eds) Proceedings of the 58th annual meeting of the association for computational linguistics, ACL 2020, Online, July 5–10, 2020. Association for Computational Linguistics, pp 2177–2190. https://www.aclweb.org/anthology/2020.acl-main.197/
27.
Kudo T, Richardson J (2018) SentencePiece: a simple and language independent subword tokenizer and detokenizer for neural text processing. In: Blanco E, Lu W (eds) Proceedings of the 2018 conference on empirical methods in natural language processing, EMNLP 2018: system demonstrations, Brussels, Belgium, October 31–November 4, 2018. Association for Computational Linguistics, pp 66–71. https://doi.org/10.18653/v1/d18-2012
28.
Lample G, Conneau A, Denoyer L, et al (2018) Unsupervised machine translation using monolingual corpora only. In: 6th international conference on learning representations, ICLR 2018, Vancouver, BC, Canada, April 30–May 3, 2018, conference track proceedings. OpenReview.net. http://openreview.net/forum?id=rkYTTf-AZ
29.
Lauscher A, Ravishankar V, Vulic I, et al (2020) From zero to hero: on the limitations of zero-shot language transfer with multilingual transformers. In: Webber B, Cohn T, He Y, et al (eds) Proceedings of the 2020 conference on empirical methods in natural language processing, EMNLP 2020, Online, November 16–20, 2020. Association for Computational Linguistics, pp 4483–4499. https://doi.org/10.18653/v1/2020.emnlp-main.363
30.
Lewis PSH, Oguz B, Rinott R, et al (2020) MLQA: evaluating cross-lingual extractive question answering. In: Jurafsky D, Chai J, Schluter N, et al (eds) Proceedings of the 58th annual meeting of the association for computational linguistics, ACL 2020, Online, July 5–10, 2020. Association for Computational Linguistics, pp 7315–7330. http://www.aclweb.org/anthology/2020.acl-main.653/
33.
Luo F, Wang W, Liu J, et al (2020) VECO: variable encoder-decoder pre-training for cross-lingual understanding and generation. arXiv:2010.16046
34.
Lv X, Gu Y, Han X, et al (2019) Adapting meta knowledge graph information for multi-hop reasoning over few-shot relations. In: Inui K, Jiang J, Ng V, et al (eds) Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3–7, 2019. Association for Computational Linguistics, pp 3374–3379. https://doi.org/10.18653/v1/D19-1334
35.
37.
Nivre J, Blokland R, Partanen N, et al (2018) Universal dependencies 2.2
38.
Pan X, Zhang B, May J, et al (2017) Cross-lingual name tagging and linking for 282 languages. In: Barzilay R, Kan M (eds) Proceedings of the 55th annual meeting of the association for computational linguistics, ACL 2017, Vancouver, Canada, July 30–August 4, vol 1: long papers. Association for Computational Linguistics, pp 1946–1958. https://doi.org/10.18653/v1/P17-1178
39.
Phang J, Htut PM, Pruksachatkun Y, et al (2020) English intermediate-task training improves zero-shot cross-lingual transfer too. CoRR. arXiv:2005.13013
40.
Provilkov I, Emelianenko D, Voita E (2020) BPE-dropout: simple and effective subword regularization. In: Jurafsky D, Chai J, Schluter N, et al (eds) Proceedings of the 58th annual meeting of the association for computational linguistics, ACL 2020, Online, July 5–10, 2020. Association for Computational Linguistics, pp 1882–1892. https://www.aclweb.org/anthology/2020.acl-main.170/
41.
Qin L, Ni M, Zhang Y, et al (2020) CoSDA-ML: multi-lingual code-switching data augmentation for zero-shot cross-lingual NLP. In: Bessiere C (eds) Proceedings of the twenty-ninth international joint conference on artificial intelligence, IJCAI 2020. ijcai.org, pp 3853–3860. https://doi.org/10.24963/ijcai.2020/533
42.
Shah DJ, Gupta R, Fayazi AA, et al (2019) Robust zero-shot cross-domain slot filling with example values. In: Korhonen A, Traum DR, Màrquez L (eds) Proceedings of the 57th conference of the association for computational linguistics, ACL 2019, Florence, Italy, July 28–August 2, 2019, vol 1: long papers. Association for Computational Linguistics, pp 5484–5490. https://doi.org/10.18653/v1/p19-1547
43.
Singh J, McCann B, Keskar NS, et al (2019) XLDA: cross-lingual data augmentation for natural language inference and question answering. CoRR. arXiv:1905.11471
44.
Sun S, Sun Q, Zhou K, et al (2019) Hierarchical attention prototypical networks for few-shot text classification. In: Inui K, Jiang J, Ng V, et al (eds) Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3–7, 2019. Association for Computational Linguistics, pp 476–485. https://doi.org/10.18653/v1/D19-1045
45.
Tarvainen A, Valpola H (2017) Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. In: 5th international conference on learning representations, ICLR 2017, Toulon, France, April 24–26, 2017, workshop track proceedings. OpenReview.net. http://openreview.net/forum?id=ry8u21rtl
46.
Wang Y, Che W, Guo J, et al (2019) Cross-lingual BERT transformation for zero-shot dependency parsing. In: Inui K, Jiang J, Ng V, et al (eds) Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3–7, 2019. Association for Computational Linguistics, pp 5720–5726. https://doi.org/10.18653/v1/D19-1575
48.
Xu H, Murray K (2022) Por qué não utiliser alla språk? Mixed training with gradient optimization in few-shot cross-lingual transfer. CoRR. arXiv:2204.13869
49.
Xu R, Yang Y, Otani N, et al (2018) Unsupervised cross-lingual transfer of word embedding spaces. In: Riloff E, Chiang D, Hockenmaier J, et al (eds) Proceedings of the 2018 conference on empirical methods in natural language processing, Brussels, Belgium, October 31–November 4, 2018. Association for Computational Linguistics, pp 2465–2474. https://doi.org/10.18653/v1/d18-1268
51.
Yan H, Gui L, Li W, et al (2022b) Addressing token uniformity in transformers via singular value transformation. In: Cussens J, Zhang K (eds) Uncertainty in artificial intelligence, proceedings of the thirty-eighth conference on uncertainty in artificial intelligence, UAI 2022, 1–5 August 2022, Eindhoven, The Netherlands, proceedings of machine learning research, vol 180. PMLR, pp 2181–2191. http://proceedings.mlr.press/v180/yan22b.html
53.
Yang H, Chen H, Zhou H, et al (2022) Enhancing cross-lingual transfer by manifold mixup. In: The 10th international conference on learning representations, ICLR 2022, virtual event, April 25–29, 2022
54.
Yang Y, Zhang Y, Tar C, et al (2019) PAWS-X: a cross-lingual adversarial dataset for paraphrase identification. In: Inui K, Jiang J, Ng V, et al (eds) Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3–7, 2019. Association for Computational Linguistics, pp 3685–3690. https://doi.org/10.18653/v1/D19-1382
56.
Yu M, Guo X, Yi J, et al (2018) Diverse few-shot text classification with multiple metrics. In: Walker MA, Ji H, Stent A (eds) Proceedings of the 2018 conference of the North American chapter of the association for computational linguistics: human language technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1–6, 2018, vol 1 (long papers). Association for Computational Linguistics, pp 1206–1215. https://doi.org/10.18653/v1/n18-1109
57.
Zhang M, Zhang Y, Fu G (2019) Cross-lingual dependency parsing using code-mixed treebank. In: Inui K, Jiang J, Ng V, et al (eds) Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3–7, 2019. Association for Computational Linguistics, pp 997–1006. https://doi.org/10.18653/v1/D19-1092
58.
Zhao M, Zhu Y, Shareghi E, et al (2021) A closer look at few-shot crosslingual transfer: the choice of shots matters. In: Zong C, Xia F, Li W, et al (eds) Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing, ACL/IJCNLP 2021, (vol 1: long papers), virtual event, August 1–6, 2021. Association for Computational Linguistics, pp 5751–5767. https://doi.org/10.18653/v1/2021.acl-long.447
59.
Zhao W, Eger S, Bjerva J, et al (2021) Inducing language-agnostic multilingual representations. In: Nastase V, Vulic I (eds) Proceedings of *SEM 2021: the tenth joint conference on lexical and computational semantics, *SEM 2021, Online, August 5–6, 2021. Association for Computational Linguistics, pp 229–240. https://doi.org/10.18653/v1/2021.starsem-1.22
60.
Zheng B, Dong L, Huang S, et al (2021) Allocating large vocabulary capacity for cross-lingual language model pre-training. In: Moens M, Huang X, Specia L, et al (eds) Proceedings of the 2021 conference on empirical methods in natural language processing, EMNLP 2021, virtual event/Punta Cana, Dominican Republic, 7–11 November, 2021. Association for Computational Linguistics, pp 3203–3215. https://doi.org/10.18653/v1/2021.emnlp-main.257
61.
Zheng S, Song Y, Leung T, et al (2016) Improving the robustness of deep neural networks via stability training. In: 2016 IEEE conference on computer vision and pattern recognition, CVPR 2016, Las Vegas, NV, USA, June 27–30, 2016. IEEE Computer Society, pp 4480–4488. https://doi.org/10.1109/CVPR.2016.485
62.
Zhu C, Cheng Y, Gan Z, et al (2020) FreeLB: enhanced adversarial training for natural language understanding. In: 8th international conference on learning representations, ICLR 2020, Addis Ababa, Ethiopia, April 26–30, 2020. OpenReview.net. https://openreview.net/forum?id=BygzbyHFvB
Metadata
Title
Improving cross-lingual language understanding with consistency regularization-based fine-tuning
Authors
Bo Zheng
Wanxiang Che
Publication date
27.05.2023
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 10/2023
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-023-01854-1
