research-article

Empirical Evaluation of Shallow and Deep Learning Classifiers for Arabic Sentiment Analysis

Published: 29 November 2021

Abstract

This work presents a detailed comparison of deep learning models (convolutional neural networks, long short-term memory, gated recurrent units, and their hybrids) against a selection of shallow learning classifiers for sentiment analysis of Arabic reviews. The comparison also includes state-of-the-art models such as the transformer architecture and the araBERT pre-trained model. The datasets used in this study are multi-dialect Arabic hotel and book review datasets, which are among the largest publicly available datasets of Arabic reviews. The results showed deep learning outperforming shallow learning for both binary and multi-label classification, in contrast with the results of similar work reported in the literature. We attribute this discrepancy to dataset size, which we found to be proportional to the performance of the deep learning models. The performance of the deep and shallow learning techniques was analyzed in terms of accuracy and F1 score. The best-performing shallow learning technique was Random Forest, followed by Decision Tree and AdaBoost. The deep learning models performed similarly to one another when using a default embedding layer, while the transformer model performed best when augmented with araBERT.
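The two evaluation metrics named in the abstract, accuracy and F1 score, can be illustrated with a short Python sketch. The function names, the toy labels, and the use of macro averaging across classes are assumptions made for illustration; the abstract does not state which averaging scheme the study used.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                 positive: Hashable) -> float:
    """F1 score (harmonic mean of precision and recall) for one class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        # No true positives: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 (an assumed averaging choice)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Hypothetical toy labels: 1 = positive review, 0 = negative review.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"F1 (positive class) = {f1_for_class(y_true, y_pred, 1):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

Accuracy alone can look strong on imbalanced review data if a classifier mostly predicts the majority class, which is one common reason to report F1 alongside it.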


Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 1
January 2022
442 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3494068

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 29 November 2021
• Accepted: 1 May 2021
• Revised: 1 March 2021
• Received: 1 January 2020

Qualifiers

• research-article
• Refereed
