2024 | OriginalPaper | Book chapter

Offensive Language Detection in Under-Resourced Algerian Dialectal Arabic Language

Authors: Oussama Boucherit, Kheireddine Abainia

Published in: Big Data, Machine Learning, and Applications

Publisher: Springer Nature Singapore

Abstract

This paper addresses the detection of offensive and abusive content in Facebook comments, focusing on Algerian dialectal Arabic, an under-resourced language. Algerian Arabic comprises a variety of dialects mixed with other languages (i.e., Berber, French, and English). In addition, we deal with texts written in both Arabic and Roman scripts (the latter known as Arabizi). Because few works address this language, we have built a new corpus of more than 8.7k texts manually annotated as normal, abusive, or offensive. We have conducted a series of experiments using state-of-the-art text categorization classifiers, namely BiLSTM, CNN, FastText, SVM, and NB. The results showed acceptable performance, but the problem requires further investigation of linguistic features to increase identification accuracy.
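To make the three-class setup concrete, the following is a minimal sketch of a Naive Bayes (NB) text classifier, one of the classifier families named in the abstract. It is not the authors' implementation. The character n-gram features, the smoothing value, the names `char_ngrams` and `NaiveBayes`, and the toy training strings are all assumptions chosen for illustration; the abstract does not say which features were used. Character n-grams are used here only because they apply unchanged to Arabic-script and Roman-script (Arabizi) text, since Python strings are Unicode.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=2, n_max=4):
    """Return character n-grams of a lowercased, space-padded text."""
    padded = f" {text.lower().strip()} "
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


class NaiveBayes:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumed value)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        self.totals = {
            c: sum(self.feature_counts[c].values()) for c in self.class_counts
        }
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.class_counts:
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.class_counts[c] / n_docs)
            denom = self.totals[c] + self.alpha * vocab_size
            for gram in char_ngrams(text):
                if gram in self.vocab:  # n-grams unseen in training are skipped
                    count = self.feature_counts[c][gram]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Toy placeholder data (NOT taken from the paper's corpus).
train_texts = [
    "thank you for sharing", "great photo my friend",
    "insultword you fool", "you fool insultword",
    "slurword go away", "go away slurword",
]
train_labels = ["normal", "normal", "abusive", "abusive",
                "offensive", "offensive"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("thank you my friend"))
```

A real experiment would replace the toy strings with the annotated corpus and evaluate on a held-out split.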

https://github.com/xprogramer/DziriOFN.

https://github.com/xprogramer/fb-cmt-crawl.

Title: Offensive Language Detection in Under-Resourced Algerian Dialectal Arabic Language
Authors: Oussama Boucherit, Kheireddine Abainia
Publisher: Springer Nature Singapore
Book: Big Data, Machine Learning, and Applications
Print ISBN: 978-981-9934-80-5
Electronic ISBN: 978-981-9934-81-2
Copyright year: 2024
DOI: https://doi.org/10.1007/978-981-99-3481-2_49
