Published in: Neural Computing and Applications 13/2024

23-02-2024 | Original Article

A robust classification approach to enhance clinic identification from Arabic health text

Authors: Shrouq Al-Fuqaha’a, Nailah Al-Madi, Bassam Hammo

Abstract

Text classification has critical applications, including in healthcare, where it can help patients locate specialized clinics based on their symptom descriptions. This can improve healthcare services by reducing misdiagnoses and guiding patients efficiently to appropriate specialists. This research focuses on building a model for multi-class classification of Arabic healthcare text. Two evaluation schemes were applied to health data from the Altibbi dataset. Feature extraction techniques included term frequency-inverse document frequency (TF-IDF) and word-to-vector (Word2Vec). Classical machine learning models such as Logistic Regression (LR), Random Forest (RF), Multinomial Naïve Bayes (MNB), Stochastic Gradient Descent (SGD), and Support Vector Machine (SVM) were implemented, along with ensemble classifiers. Model performance was compared using precision, recall, and F1-score. Results indicated that TF-IDF generally outperformed Word2Vec in F1-score across most classes. LR and RF consistently demonstrated strong performance, while MNB exhibited lower precision. Ensemble methods such as stacking and bagging showed promise, with LR and RF performing well across multiple classes. The SVM model displayed lower precision in certain classes. Preliminary experiments with the Arabic transformer-based models AraBERT and MarBERT, used for both feature extraction and classification, achieved strong precision, recall, and F1 scores across various classes. The findings show the effectiveness of ensemble models using TF-IDF and of Arabic transformer models in accurately classifying Arabic health texts. However, challenges related to standardization, semantic variations, data availability, and resources should be considered.
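The abstract reports precision, recall, and F1-score per class. As a minimal sketch of how these per-class measures are conventionally computed for a multi-class task, the following uses only the Python standard library. The function name `per_class_scores` and the clinic labels are hypothetical and chosen for illustration. They are not taken from the paper or the Altibbi dataset, and this is not the authors' evaluation code.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for a multi-class task."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # the true class was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]  # times the class was predicted
        actual = tp[label] + fn[label]     # times the class truly occurred
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical clinic labels, for illustration only (not from the dataset).
y_true = ["dermatology", "dermatology", "cardiology", "cardiology", "dentistry"]
y_pred = ["dermatology", "cardiology", "cardiology", "cardiology", "dentistry"]

scores = per_class_scores(y_true, y_pred)
for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this toy example one dermatology question is routed to cardiology, which lowers recall for dermatology and precision for cardiology. This is the kind of per-class difference the abstract describes when it notes lower precision for some models in certain classes.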

Metadata
Title: A robust classification approach to enhance clinic identification from Arabic health text
Authors: Shrouq Al-Fuqaha’a, Nailah Al-Madi, Bassam Hammo
Publication date: 23-02-2024
Publisher: Springer London
Published in: Neural Computing and Applications, Issue 13/2024
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-024-09453-z
