
2024 | Original Paper | Book Chapter

Offensive Language Detection in Under-Resourced Algerian Dialectal Arabic Language

Authors: Oussama Boucherit, Kheireddine Abainia

Published in: Big Data, Machine Learning, and Applications

Publisher: Springer Nature Singapore


Abstract

This paper addresses the detection of offensive and abusive content in Facebook comments, focusing on Algerian dialectal Arabic, an under-resourced language. Algerian Arabic comprises a variety of dialects mixed with other languages (namely Berber, French, and English). In addition, we handle texts written in both Arabic script and Roman script (Arabizi). Because little prior work exists on this language, we built a new corpus of more than 8.7k texts manually annotated as normal, abusive, or offensive. We conducted a series of experiments using state-of-the-art text classifiers, namely BiLSTM, CNN, FastText, SVM, and NB. The results showed acceptable performance, but further investigation of linguistic features is needed to increase identification accuracy.
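As a rough illustration of the simplest classifier family named in the abstract (NB), the sketch below trains a multinomial Naive Bayes model over the three labels. It is not the authors' implementation: the character-trigram features, the add-one smoothing, the names `char_ngrams` and `NaiveBayes`, and the English placeholder texts are all assumptions made for this example. Character n-grams are chosen here only because they need no script-specific tokenizer, which may suit text that mixes Arabic and Roman scripts.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(n_docs / total_docs)
            for gram in char_ngrams(text):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical placeholder data, NOT taken from the paper's corpus.
texts = [
    "thanks for sharing",
    "good morning friend",
    "placeholder abusive remark",
    "another abusive remark",
    "placeholder offensive remark",
    "another offensive remark",
]
labels = ["normal", "normal", "abusive", "abusive", "offensive", "offensive"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("abusive remark"))
```

On a real corpus the data would be split into training and test sets and scored with per-class metrics; this toy example only shows the mechanics of training and prediction.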


Metadata
Title: Offensive Language Detection in Under-Resourced Algerian Dialectal Arabic Language
Authors: Oussama Boucherit, Kheireddine Abainia
Copyright Year: 2024
Publisher: Springer Nature Singapore
DOI: https://doi.org/10.1007/978-981-99-3481-2_49
