research-article

Empirical Evaluation of Shallow and Deep Learning Classifiers for Arabic Sentiment Analysis

Authors:
Ali Bou Nassif, Computer Engineering Department, University of Sharjah, Sharjah, UAE (ORCID: 0000-0003-1570-0897)
Abdollah Masoud Darya, Electrical Engineering Department, University of Sharjah, Sharjah, UAE (ORCID: 0000-0003-4263-2041)
Ashraf Elnagar, Computer Science Department, University of Sharjah, Sharjah, UAE

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 1, Article No. 14, pp. 1–25. https://doi.org/10.1145/3466171

Published: 29 November 2021

Abstract

This work presents a detailed comparison of the performance of deep learning models, such as convolutional neural networks, long short-term memory networks, gated recurrent units, and their hybrids, against a selection of shallow learning classifiers for sentiment analysis of Arabic reviews. The comparison also includes state-of-the-art models such as the transformer architecture and the pre-trained AraBERT model. The datasets used in this study are multi-dialect Arabic hotel and book review datasets, which are among the largest publicly available datasets of Arabic reviews. The results show deep learning outperforming shallow learning for both binary and multi-label classification, in contrast with the results of similar work reported in the literature. We attribute this discrepancy to dataset size: the performance of the deep learning models increased with the size of the dataset. The performance of the deep and shallow learning techniques was analyzed in terms of accuracy and F1 score. The best-performing shallow learning technique was Random Forest, followed by Decision Tree and AdaBoost. With a default embedding layer, the deep learning models performed similarly, whereas the transformer model performed best when augmented with AraBERT.
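Because the comparison is reported in terms of accuracy and F1 score, a minimal sketch of both metrics may help when reading the results. It assumes each review carries exactly one gold sentiment label (for example "pos"/"neg", or a rating class), so it covers the binary case and single-label multi-class settings; evaluating genuinely multi-label outputs would need a different comparison. The abstract does not say which F1 averaging the authors used, so the macro averaging below is an assumption, and the function names `accuracy`, `f1_for_label`, and `macro_f1` are hypothetical helpers, not the authors' code.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of reviews whose predicted label matches the gold label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_label(y_true, y_pred, label) -> float:
    """One-vs-rest F1 for `label`: 2*TP / (2*TP + FP + FN), or 0.0 when TP == 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-label F1 over every label seen in y_true or y_pred.

    Macro averaging is an assumption here; the paper's averaging is not stated.
    """
    labels = set(y_true) | set(y_pred)
    return sum(f1_for_label(y_true, y_pred, c) for c in labels) / len(labels)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the paper.
    gold = ["pos", "pos", "neg", "neg", "neg"]
    pred = ["pos", "neg", "neg", "neg", "pos"]
    print(accuracy(gold, pred))             # 0.6
    print(f1_for_label(gold, pred, "pos"))  # 0.5 (binary F1 with "pos" as the positive class)
    print(round(macro_f1(gold, pred), 4))   # 0.5833
```

For binary sentiment, `f1_for_label` with the positive class gives the conventional F1 score. For rating-style labels, macro averaging weights every class equally, so rare classes influence the score as much as frequent ones; a weighted average would instead follow class frequencies.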

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        1. Supervised learning by classification
2. Information systems
  1. Data management systems
    1. Information integration
      1. Data cleaning
  2. Information retrieval
    1. Retrieval tasks and goals
      1. Sentiment analysis

Published in
ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 1
January 2022
442 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3494068
Editor: Imed Zitouni, Google, USA
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 29 November 2021
- Accepted: 1 May 2021
- Revised: 1 March 2021
- Received: 1 January 2020

Author Tags
Deep learning
shallow learning
learning curve
embedding
misclassification
Qualifiers
- research-article
- Refereed
Article Metrics
- Total Citations: 3
- Total Downloads: 385
- Downloads (last 12 months): 90
- Downloads (last 6 weeks): 9