Published in: Social Network Analysis and Mining 1/2024

01.12.2024

Multilingual, monolingual and mono-dialectal transfer learning for Moroccan Arabic sentiment classification

Authors: Naaima Boudad, Rdouan Faizi, Rachid Oulad Haj Thami


Abstract

Transfer learning has recently proven to be very powerful in diverse natural language processing (NLP) tasks such as machine translation, sentiment analysis, and question answering. In this work, we investigate the use of transfer learning (TL) for Dialectal Arabic sentiment classification. Our main objective is to enhance sentiment classification performance and to overcome the low-resource problem of Arabic dialects. To this end, we use Bidirectional Encoder Representations from Transformers (BERT) to transfer the contextual knowledge learned during language-model pre-training to sentiment classification. In particular, we use the multilingual models mBERT and XLM-RoBERTa; the Arabic-specific models AraBERT, MARBERT, QARIB, and CAMeLBERT; and the Moroccan-dialect-specific model DarijaBERT. After carrying out downstream fine-tuning experiments on several Moroccan sentiment analysis (SA) datasets, we found that TL significantly increases the performance of sentiment classification in Moroccan Arabic. Nevertheless, although the Arabic-specific models proved to perform much better than the multilingual and dialectal models, our experiments demonstrate that multilingual models can be more effective on texts characterized by extensive code-switching.
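To make the described setup concrete, below is a minimal sketch of downstream fine-tuning of a BERT-style encoder for sentiment classification using the Hugging Face transformers library. It is an illustration under stated assumptions, not the paper's exact pipeline: the checkpoint identifiers are the public hub IDs commonly associated with the models the abstract lists, and the two-example dataset is a hypothetical stand-in for a labeled Moroccan Arabic corpus.

```python
# Minimal fine-tuning sketch: pretrained BERT-style encoder + fresh
# classification head for binary sentiment classification.
# Checkpoint IDs and the toy dataset are illustrative assumptions only.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Any model compared in the study could be swapped in here, e.g.
# "bert-base-multilingual-cased" (mBERT), "xlm-roberta-base" (XLM-RoBERTa),
# or "SI2M-Lab/DarijaBERT" (commonly cited hub IDs, not confirmed by the paper).
MODEL_ID = "UBC-NLP/MARBERT"

# Hypothetical stand-in for a labeled Moroccan Arabic SA dataset (1 = positive).
train_ds = Dataset.from_dict({
    "text": ["placeholder positive Darija comment",
             "placeholder negative Darija comment"],
    "label": [1, 0],
})

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

def tokenize(batch):
    # Truncate/pad to a fixed length so examples batch cleanly.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

train_ds = train_ds.map(tokenize, batched=True)

# A randomly initialized classification head is placed on top of the
# pretrained encoder; all weights are updated during fine-tuning.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID,
                                                           num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sa-finetune",
                           num_train_epochs=3,
                           per_device_train_batch_size=16,
                           learning_rate=2e-5),
    train_dataset=train_ds,
)
trainer.train()
```

The same head-plus-encoder recipe applies to every checkpoint the study compares; only the pretrained weights (and possibly hyperparameters) change between runs, which is what makes a controlled comparison of multilingual, Arabic-specific, and dialect-specific models possible.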


References
Abdaoui A, Berrimi M, Oussalah M, Moussaoui A (2021) DziriBERT: a pre-trained language model for the Algerian dialect. arXiv preprint arXiv:2109.12346
Abdelali A, Hassan S, Mubarak H, Darwish K, Samih Y (2021) Pre-training BERT on Arabic tweets: practical considerations. arXiv preprint arXiv:2102.10684
Abdelfattah MF, Fakhr MW, Rizka MA (2023) ArSentBERT: fine-tuned bidirectional encoder representations from transformers model for Arabic sentiment classification. Bull Electr Eng Inform 12:1196–1202
Abdul-Mageed M, Elmadany A, Nagoudi EMB (2020) ARBERT & MARBERT: deep bidirectional transformers for Arabic. arXiv preprint arXiv:2101.01785
Alduailej A, Alothaim A (2022) AraXLNet: pre-trained language model for sentiment analysis of Arabic. J Big Data 9:1–21
Almaliki M, Almars AM, Gad I, Atlam E-S (2023) ABMM: Arabic BERT-mini model for hate-speech detection on social media. Electronics 12:1048
Ameri K, Hempel M, Sharif H, Lopez J Jr, Perumalla K (2021) CyBERT: cybersecurity claim classification by fine-tuning the BERT language model. J Cybersecurity Priv 1:615–637
Antit C, Mechti S, Faiz R (2022) TunRoBERTa: a Tunisian robustly optimized BERT approach model for sentiment analysis. Atlantis Press, Netherlands, pp 227–231
Boudad N, Faizi R, Thami ROH, Chiheb R (2017) Sentiment classification of Arabic tweets: a supervised approach. J Mob Multimed 13:233–243
Boudad N, Ezzahid S, Faizi R, Thami ROH (2019) Exploring the use of word embedding and deep learning in Arabic sentiment analysis. In: International conference on advanced intelligent systems for sustainable development, Springer, pp 243–253
Boujou E, Chataoui H, Mekki AE, Benjelloun S, Chairi I, Berrada I (2021) An open access NLP dataset for Arabic dialects: data collection, labeling, and model construction. arXiv preprint arXiv:2102.11000
Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A (2020) Language models are few-shot learners. Adv Neural Inf Process Syst 33:1877–1901
Clark K, Luong M-T, Le QV, Manning CD (2020) ELECTRA: pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555
Conneau A, Khandelwal K, Goyal N, Chaudhary V, Wenzek G, Guzmán F, Grave E, Ott M, Zettlemoyer L, Stoyanov V (2019) Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116
de Vries W, van Cranenburgh A, Bisazza A, Caselli T, van Noord G, Nissim M (2019) BERTje: a Dutch BERT model. arXiv preprint arXiv:1912.09582
Devlin J, Chang M-W, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
Dodge J, Ilharco G, Schwartz R, Farhadi A, Hajishirzi H, Smith N (2020) Fine-tuning pretrained language models: weight initializations, data orders, and early stopping. arXiv preprint arXiv:2002.06305
Elouardighi A, Maghfour M, Hammia H (2017) Collecting and processing Arabic Facebook comments for sentiment analysis. Springer, Berlin, pp 262–274
Garouani M, Kharroubi J (2021) MAC: an open and free Moroccan Arabic corpus for sentiment analysis. In: Proceedings of the international conference on smart city applications, Springer, pp 849–858
Garouani M, Chrita H, Kharroubi J (2021) Sentiment analysis of Moroccan tweets using text mining
Ghaddar A, Wu Y, Rashid A, Bibi K, Rezagholizadeh M, Xing C, Wang Y, Xinyu D, Wang Z, Huai B (2021) JABER: junior Arabic BERT. arXiv preprint arXiv:2112.04329
Inoue G, Alhafni B, Baimukan N, Bouamor H, Habash N (2021) The interplay of variant, size, and task type in Arabic pre-trained language models. arXiv preprint arXiv:2103.06678
Lan W, Chen Y, Xu W, Ritter A (2020) An empirical study of pre-trained transformers for Arabic information extraction. arXiv preprint arXiv:2004.14519
Lewis M, Liu Y, Goyal N, Ghazvininejad M, Mohamed A, Levy O, Stoyanov V, Zettlemoyer L (2019) BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461
Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692
Martin L, Muller B, Suárez PJO, Dupont Y, Romary L, de La Clergerie ÉV, Seddah D, Sagot B (2019) CamemBERT: a tasty French language model. arXiv preprint arXiv:1911.03894
Messaoudi A, Cheikhrouhou A, Haddad H, Ferchichi N, BenHajhmida M, Korched A, Naski M, Ghriss F, Kerkeni A (2021) TunBERT: pretrained contextualized text representation for Tunisian dialect. arXiv preprint arXiv:2111.13138
Mohamed O, Kassem AM, Ashraf A, Jamal S, Mohamed EH (2022) An ensemble transformer-based model for Arabic sentiment analysis. Soc Netw Anal Min 13:11
Oussous A, Benjelloun F-Z, Lahcen AA, Belfkih S (2020) ASA: a framework for Arabic sentiment analysis. J Inf Sci 46:544–559
Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I (2019) Language models are unsupervised multitask learners. OpenAI Blog 1:9
Safaya A, Abdullatif M, Yuret D (2020) KUISAIL at SemEval-2020 task 12: BERT-CNN for offensive speech identification in social media, pp 2054–2059
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30
Metadata
Title
Multilingual, monolingual and mono-dialectal transfer learning for Moroccan Arabic sentiment classification
Authors
Naaima Boudad
Rdouan Faizi
Rachid Oulad Haj Thami
Publication date
01.12.2024
Publisher
Springer Vienna
Published in
Social Network Analysis and Mining / Issue 1/2024
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-023-01159-9
