Published in: Neural Computing and Applications 13/2024

23 February 2024 | Original Article

A robust classification approach to enhance clinic identification from Arabic health text

Authors: Shrouq Al-Fuqaha’a, Nailah Al-Madi, Bassam Hammo

Abstract

Text classification has critical applications, including healthcare, where it can help patients locate specialized clinics based on their symptom descriptions. This can enhance healthcare services by reducing misdiagnoses and efficiently guiding patients to appropriate specialists. This research focuses on building a model for multi-class text classification in Arabic healthcare. Two evaluation schemes were applied to health data from the Altibbi dataset. Several feature extraction techniques were used, including term frequency-inverse document frequency (TF-IDF) and word-to-vector (Word2Vec). Classical machine learning models, namely Logistic Regression (LR), Random Forest (RF), Multinomial Naïve Bayes (MNB), Stochastic Gradient Descent (SGD), and Support Vector Machine (SVM), were implemented along with ensemble classifiers. Model performance was compared using precision, recall, and F1-score. Results indicated that TF-IDF generally outperformed Word2Vec in F1-score across most classes. LR and RF consistently demonstrated strong performance, while MNB exhibited lower precision. Ensemble methods such as stacking and bagging showed promise, with LR and RF performing well across multiple classes. The SVM model displayed lower precision in certain classes. Preliminary experiments with the Arabic transformer-based models AraBERT and MarBERT, used for both feature extraction and classification, achieved high precision, recall, and F1 scores across various classes. The findings show the effectiveness of ensemble models using TF-IDF, and of Arabic transformer models, in accurately classifying Arabic health texts. However, challenges related to standardization, semantic variations, data availability, and resources should be considered.
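
Because the abstract reports results per class, it may help to see how per-class precision, recall, and F1-score can be computed for a multi-class task. The following is a minimal sketch using the standard definitions, not the authors' evaluation code; the function name `per_class_scores` and the clinic labels are hypothetical and were invented for illustration.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for a multi-class task."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted class gains a false positive
            fn[truth] += 1  # the true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical clinic labels (assumption: not taken from the Altibbi dataset).
y_true = ["dermatology", "dermatology", "cardiology", "cardiology", "dentistry"]
y_pred = ["dermatology", "cardiology", "cardiology", "cardiology", "dentistry"]

scores = per_class_scores(y_true, y_pred)
for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this toy example one dermatology question is routed to cardiology, which lowers recall for dermatology and precision for cardiology while leaving dentistry unaffected. This is the kind of per-class difference the abstract describes when it notes lower precision for some models in certain classes.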

Metadata
Title
A robust classification approach to enhance clinic identification from Arabic health text
Authors
Shrouq Al-Fuqaha’a
Nailah Al-Madi
Bassam Hammo
Publication date
23 February 2024
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 13/2024
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-024-09453-z
