Abstract
This work presents a detailed comparison of the performance of deep learning models, such as convolutional neural networks (CNNs), long short-term memory (LSTM) networks, gated recurrent units (GRUs), and their hybrids, against a selection of shallow learning classifiers for sentiment analysis of Arabic reviews. The comparison also includes state-of-the-art models, namely the transformer architecture and the pre-trained AraBERT model. The datasets used in this study are multi-dialect Arabic hotel and book review datasets, which are among the largest publicly available datasets of Arabic reviews. The performance of the deep and shallow learning techniques was analyzed in terms of accuracy and F1 score. Deep learning outperformed shallow learning for both binary and multi-label classification, in contrast with the results of similar work reported in the literature. We attribute this discrepancy to dataset size, as we found the performance of deep learning models to increase with the size of the dataset. The best-performing shallow learning technique was Random Forest, followed by Decision Tree and AdaBoost. The deep learning models performed similarly when using a default embedding layer, while the transformer model performed best when augmented with AraBERT.
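Because the classifiers are compared by accuracy and F1 score, the sketch below shows one way to compute both metrics from gold and predicted labels. It is a minimal, dependency-free illustration, not the authors' evaluation code: the function names are hypothetical, and averaging the per-class F1 scores with an unweighted (macro) mean when there are more than two classes is an assumption, since the abstract does not state which averaging was used.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_per_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], label: Hashable) -> float:
    """One-vs-rest F1 for a single label: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0  # convention: F1 is 0 when the class has no true positives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Assumed averaging: unweighted mean of per-class F1 over all labels seen."""
    labels = set(y_true) | set(y_pred)
    if not labels:
        raise ValueError("no labels to score")
    return sum(f1_per_class(y_true, y_pred, c) for c in labels) / len(labels)


if __name__ == "__main__":
    # Toy binary example: one of the two positive reviews is predicted as negative.
    gold = ["pos", "pos", "neg", "neg"]
    pred = ["pos", "neg", "neg", "neg"]
    print(accuracy(gold, pred))            # 0.75
    print(round(macro_f1(gold, pred), 3))  # 0.733
```

In the toy example, the positive class has F1 = 2/3 and the negative class has F1 = 0.8, so the macro F1 is 11/15. Macro averaging weights every class equally, which makes it more sensitive than accuracy to poor performance on a minority class.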
Index Terms
- Empirical Evaluation of Shallow and Deep Learning Classifiers for Arabic Sentiment Analysis