25-04-2024 | Research Article-Computer Engineering and Computer Science

Arabic Fake News Detection in Social Media Context Using Word Embeddings and Pre-trained Transformers

Authors: Mohammad Azzeh, Abdallah Qusef, Omar Alabboushi

Published in: Arabian Journal for Science and Engineering

Abstract

The rapid spread of fake news in different languages on social platforms has become a global scourge that threatens societal security and governments. Fake news is usually written to deceive readers and convince them that false information is correct; stopping its spread has therefore become a priority for governments and societies. Building fake news detection models for the Arabic language comes with its own set of challenges and limitations. The main limitations include 1) a lack of annotated data, 2) dialectal variation, where dialects can differ significantly in vocabulary, grammar, and syntax, 3) morphological complexity, with complex word formations and root-and-pattern morphology, 4) semantic ambiguity, which makes models fail to accurately discern the intent and context of a given piece of information, 5) cultural context, and 6) diacrasy. The objective of this paper is twofold. First, we design a large corpus of annotated fake news data for the Arabic language, collected from multiple sources so that it covers different dialects and cultures. Second, we build fake news detectors by training machine learning models as model heads on top of fine-tuned large language models. The text representations come from language models pre-trained on Arabic (ARBERT, AraBERT, and CAMeLBERT) and from the popular word embedding technique AraVec. The results showed that the text representations produced by the CAMeLBERT transformer are the most accurate, because all models built on them achieved outstanding evaluation results. We found that using deep learning classifiers with the transformer is generally better than using classical machine learning classifiers. Finally, we could not reach a stable conclusion about which model works best with each text representation method, because each evaluation measure favors a different model.
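The "model head over a text representation" setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the token vectors are synthetic stand-ins for transformer or AraVec outputs, the head is a plain logistic regression, and every name here (`make_toy_dataset`, `mean_pool`, `train_head`, and so on) is hypothetical. The sketch also computes precision, recall, and F1 separately, since the abstract notes that different evaluation measures can favor different models.

```python
import math
import random


def mean_pool(token_vectors):
    """Average per-token vectors into one fixed-size text vector."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_head(features, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression head with batch gradient descent."""
    dim = len(features[0])
    weights, bias = [0.0] * dim, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (fake) or 0 (real) for one pooled text vector."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive (fake) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def make_toy_dataset(rng, n_texts, n_tokens=5, dim=4):
    """Synthetic stand-in for encoder output: NOT real Arabic embeddings.

    Tokens of 'fake' texts are drawn around +1 and 'real' texts around -1.
    """
    features, labels = [], []
    for k in range(n_texts):
        label = k % 2
        center = 1.0 if label == 1 else -1.0
        tokens = [[rng.gauss(center, 0.5) for _ in range(dim)]
                  for _ in range(n_tokens)]
        features.append(mean_pool(tokens))
        labels.append(label)
    return features, labels


if __name__ == "__main__":
    rng = random.Random(0)
    train_x, train_y = make_toy_dataset(rng, 40)
    test_x, test_y = make_toy_dataset(rng, 20)
    w, b = train_head(train_x, train_y)
    preds = [predict(w, b, x) for x in test_x]
    print(precision_recall_f1(test_y, preds))
```

In a real pipeline the synthetic vectors would be replaced by representations from a pre-trained Arabic encoder, and the head could be any classical or deep classifier; the pooling strategy and the choice of head are design decisions this sketch does not settle.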

Metadata
Title
Arabic Fake News Detection in Social Media Context Using Word Embeddings and Pre-trained Transformers
Authors
Mohammad Azzeh
Abdallah Qusef
Omar Alabboushi
Publication date
25-04-2024
Publisher
Springer Berlin Heidelberg
Published in
Arabian Journal for Science and Engineering
Print ISSN: 2193-567X
Electronic ISSN: 2191-4281
DOI
https://doi.org/10.1007/s13369-024-08959-x