1 Introduction
- We propose a meta-ensemble deep learning approach that combines three levels of meta-learners to improve sentiment classification performance.
- We extend the Arabic-Egyptian corpus of Mohammed and Kora (2019) to 50k annotated tweets.
- We train several baseline deep models on six public benchmark sentiment analysis datasets covering different languages and dialects.
- We conduct a wide range of experiments to study the effect of the meta-ensemble deep learning approach against single deep learning models.
- We compare the predictions generated by the meta-learners involved in the proposed approach and their effect on performance.
2 Related work
Ensemble method | Advantages | Disadvantages |
---|---|---|
Bagging | Easy to implement and adapt; reduces variance (avoids overfitting); performs well on high-dimensional data; allows weak learners to outperform a strong learner; robust to noisy or outlier data | High bias; computationally expensive; loss of model interpretability |
Boosting | Reduces variance; reduces bias; handles missing data; the model is easy to interpret; each classifier corrects the errors made by its predecessors | Slower to train; computationally expensive; more prone to overfitting; sequential training is difficult to scale |
Stacking | Gives a deeper understanding of the data; more accurate; less variance; less bias; can ensemble a variety of strong learners | More prone to overfitting; high time complexity; the final model is difficult to interpret |
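The three families above can be contrasted in a few lines of scikit-learn. This is a minimal sketch on synthetic data; the estimators, sizes, and dataset are illustrative only, not the configurations used in this paper:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, AdaBoostClassifier,
                              StackingClassifier)
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a binary sentiment dataset.
X, y = make_classification(n_samples=400, random_state=0)

ensembles = {
    # Bagging: parallel resampled copies of one weak learner, majority vote.
    "bagging": BaggingClassifier(DecisionTreeClassifier(),
                                 n_estimators=25, random_state=0),
    # Boosting: sequential learners, each reweighting its predecessor's errors.
    "boosting": AdaBoostClassifier(n_estimators=25, random_state=0),
    # Stacking: heterogeneous strong learners combined by a meta-classifier.
    "stacking": StackingClassifier(
        estimators=[("dt", DecisionTreeClassifier(random_state=0)),
                    ("lr", LogisticRegression(max_iter=1000))],
        final_estimator=LogisticRegression(max_iter=1000)),
}

for name, clf in ensembles.items():
    score = cross_val_score(clf, X, y, cv=3).mean()
    print(f"{name}: {score:.3f}")
```

Stacking is the family closest to the meta-ensemble approach proposed here, since it learns how to combine base predictions rather than fixing the combination rule in advance.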
Approach | Papers | Baseline classifiers | Ensemble method | Languages | Dataset |
---|---|---|---|---|---|
TEL | Wilson et al. (2006) | DT | Boosting | English | MPQA Corpus (Wiebe et al. 2005) |
 | Tsutsumi et al. (2007) | SVM, ME | Stacking | English | Movie Review (Chaovalit and Zhou 2005) |
 | Li et al. (2010) | SVM, LR | Voting | English | Amazon.com (Rushdi-Saleh et al. 2011) |
 | Lu and Tsou (2010) | NB, ME, SVM | Stacking | Chinese | Reviews (Seki et al. 2008) |
 | Xia et al. (2011) | NB, ME, SVM | Stacking | English | Movie Review (Chen et al. 2012) |
 | Li et al. (2012) | SVM, KNN | Stacking | Chinese | Reviews (Seki et al. 2008) |
 | Su et al. (2012) | ME, SVM | Voting, Stacking | Chinese | Reviews (Seki et al. 2008) |
 | Rodriguez-Penagos et al. (2013) | SVM | Voting | English | SemEval (Dzikovska et al. 2013) |
 | Clark and Wicentwoski (2013) | NB | Voting | English | SemEval (Nakov et al. 2016) |
 | Fersini et al. (2014) | ME, SVM, NB | Voting, Bagging | English | Product Reviews (Pang and Lee 2005) |
 | Da Silva et al. (2014) | SVM, RF, LR | Voting | English | Tweets (Saif et al. 2013) |
 | Wang et al. (2014) | SVM, KNN, DT, ME, NB | Bagging, Boosting | English | Movie Reviews (Chaovalit and Zhou 2005) |
 | Kanakaraj and Guddeti (2015) | NB, SVM | Bagging, Boosting | English | Movie Review (Chen et al. 2012) |
 | Prusa et al. (2015) | KNN, SVM, LR | Bagging, Boosting | English | Sentiment140 Corpus (Go et al. 2009) |
 | Xia et al. (2016) | SVM, LR | Voting | English | Amazon.com (Rushdi-Saleh et al. 2011) |
 | Onan et al. (2016) | BLR, NB, LDA, LR, SVM | Stacking, AdaBoost, Bagging | English | Tweets (Whitehead and Yaeger 2009) |
 | Fersini et al. (2016) | NB, DT, SVM | Voting | English | Movie Reviews (Chen et al. 2012) |
 | Perikos and Hatzilygeroudis (2016) | NB, ME | Bagging | English | Posts (Cambria et al. 2013) |
 | Araque et al. (2017) | NB, ME, SVM | Voting | English | Movie Reviews (Chen et al. 2012) |
 | Oussous et al. (2018) | MNB, SVM, ME | Voting, Stacking | Moroccan | Tweets (Tratz et al. 2013) |
 | Saleena (2018) | SVM, RF, NB, LR | Voting | English | |
 | Sharma et al. (2018) | SVM | Bagging | English | Movie Reviews (Chen et al. 2012) |
 | Pasupulety et al. (2019) | SVM, RF | Stacking | Indian | NSE (Kumar and Misra 2018) |
 | Saeed et al. (2022) | SVM, NB, LR, DT, KNN | Voting, Stacking | Arabic | Corpus (Li et al. 2011) |
EDL | Deriu et al. (2016) | CNN | Stacking | English | SemEval (Bethard et al. 2016) |
 | Xu et al. (2016) | CNN, LSTM | Voting | English | SemEval (Dzikovska et al. 2013) |
 | Akhtyamova et al. (2017) | CNNs | Voting | English | Reviews (Karimi et al. 2015) |
 | Araque et al. (2017) | CNN, LSTM, GRU | Voting, Stacking | English | Movie Reviews (Chen et al. 2012) |
 | Heikal et al. (2018) | CNN, LSTM | Voting | Arabic | ASTD (Nabil et al. 2015) |
 | Haralabopoulos et al. (2020) | LSTM, GRU, CNN, RCNN, DNN | Voting, Stacking | English | |
 | Mohammadi and Shaverizade (2021) | CNN, LSTM, GRU, Bi-LSTM | Stacking | English | SemEval (Bethard et al. 2016) |
3 Proposed meta-ensemble deep learning approach
3.1 Description of the proposed algorithm
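The core of the proposed approach is stacked generalization: base deep models produce predictions, and meta-learners are trained on those predictions. The sketch below illustrates one level of this pipeline on synthetic data, with small scikit-learn MLPs standing in for the GRU/LSTM/CNN base models and an SVM as the meta-learner; the exact architectures, levels, and hyperparameters of the paper's algorithm are not reproduced here:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a vectorized sentiment dataset.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Three differently-seeded MLPs stand in for the deep base models.
base_models = [MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                             random_state=s) for s in range(3)]

# Level 1: out-of-fold soft predictions on the training set, so the
# meta-learner never sees predictions made on data a base model trained on.
meta_tr = np.hstack([cross_val_predict(m, X_tr, y_tr, cv=5,
                                       method="predict_proba")
                     for m in base_models])

# Refit each base model on the full training set, then build the
# test-set meta-features from its probability outputs.
for m in base_models:
    m.fit(X_tr, y_tr)
meta_te = np.hstack([m.predict_proba(X_te) for m in base_models])

# Level 2: the SVM meta-learner combines the soft predictions.
meta = SVC(probability=True, random_state=0).fit(meta_tr, y_tr)
acc = accuracy_score(y_te, meta.predict(meta_te))
print(f"stacked accuracy: {acc:.3f}")
```

The out-of-fold step is the design choice that matters: training the meta-learner on in-sample base predictions would leak label information and overstate the ensemble's accuracy.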
4 Experiment results
4.1 Description of benchmark datasets
Dataset | Data types | Sentiment classes | Positive count | Negative count | Total count |
---|---|---|---|---|---|
1-Mohammed and Kora (2019) | Egyptian dialect, MSA tweets | 2 | 25k | 25k | 50k |
2-Saudi Arabia Tweets (Aljabri et al. 2021) | Dialect tweets | 2 | 1002 | 673 | 1675 |
3-ASTD (Nabil et al. 2015) | Dialect tweets | 4 | 797 | 1682 | 10,006 |
4-ArSenTD-LEV (Al-Laith and Shahbaz 2021) | Dialect tweets | 5 | 835 | 1253 | 4,000 |
5-Movie Reviews (Koh et al. 2010) | English reviews | 2 | 5331 | 5331 | 10,662 |
6-Twitter US Airline Sentiment (Rane and Kumar 2018) | English tweets | 3 | 2310 | 8797 | 14,601 |
4.2 Baseline deep learning models
Models | Configuration values |
---|---|
GRU | GRU layers = 1 or 2; GRU size = 256 |
LSTM | LSTM layers = 1 or 2; LSTM size = 256 |
CNN | No. of filters = 32; filter size = 16; vocab size = 10,000 |
Split dataset | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Dataset | Baseline models | 1 (%) | 2 (%) | 3 (%) | 4 (%) | 5 (%) | 6 (%) | 7 (%) | 8 (%) | AVG models (%) |
1-Mohammed and Kora (2019) | GRU | 89.9 | 89.8 | 89.2 | 89.5 | 89.3 | 88.8 | 88.2 | 89 | 89.21 | 
LSTM | 89.7 | 89.8 | 89.8 | 89.1 | 89.2 | 89 | 89.1 | 89.4 | 89.38 | |
CNN | 87.64 | 85.04 | 84.78 | 85.65 | 87.40 | 86 | 85 | 86.2 | 85.96 | |
2-Aljabri et al. (2021) | GRU1 | 63.1 | 67.3 | 66.2 | 64.2 | 62.3 | 64.2 | 65.4 | 67.7 | 65.05 |
LSTM1 | 61.9 | 61.9 | 60 | 65 | 64.6 | 68.5 | 65 | 70.8 | 64.71 | |
GRU2 | 60.8 | 69.6 | 60.4 | 61.5 | 63.1 | 66.9 | 64.6 | 61.5 | 63.55 | |
LSTM2 | 60.4 | 65.4 | 65 | 66.2 | 66.5 | 70 | 62.3 | 67.3 | 65.38 | |
CNN | – | – | – | – | – | – | – | – | – | |
3-Nabil et al. (2015) | GRU1 | 73.1 | 66.2 | 68.5 | 72.8 | 74.1 | 72.3 | 72.8 | 67.9 | 70.86 |
LSTM1 | 72.1 | 75.9 | 69.5 | 74.1 | 71.5 | 69.2 | 69 | 71.5 | 71.6 | |
GRU2 | – | – | – | – | – | – | – | – | – | |
LSTM2 | – | – | – | – | – | – | – | – | – | |
CNN | 68.2 | 70.4 | 67 | 68 | 68.9 | 71 | 70.6 | 68.2 | 69.03 | |
4-Al-Laith and Shahbaz (2021) | GRU1 | 74.5 | 76.4 | 73.6 | 73.9 | 76.4 | 77.3 | 78.5 | 76.4 | 75.87 |
LSTM1 | 73.3 | 75.8 | 75.2 | 78.2 | 75.2 | 78.2 | 75.8 | 77.9 | 76.2 | |
GRU2 | – | – | – | – | – | – | – | – | – | |
LSTM2 | – | – | – | – | – | – | – | – | – | |
CNN | 70.5 | 76 | 65.5 | 75 | 71.3 | 77.3 | 75.5 | 66 | 72.13 | |
5-Koh et al. (2010) | GRU1 | 68.9 | 77.5 | 76.6 | 75.4 | 71.2 | 75.3 | 74.8 | 76.6 | 74.53 |
LSTM1 | 82.6 | 74.8 | 79.4 | 81.9 | 81.7 | 76.4 | 82.7 | 64.8 | 78.03 | |
GRU2 | 62.4 | 57.4 | 55.4 | 64.1 | 66.1 | 58.2 | 69 | 69.2 | 62.72 | |
LSTM2 | 71.9 | 67.9 | 62.4 | 68.4 | 54.8 | 66.8 | 65.9 | 74.9 | 66.62 | |
CNN | – | – | – | – | – | – | – | – | – | |
6-Rane and Kumar (2018) | GRU1 | 71.6 | 78.6 | 79.2 | 78.9 | 68.5 | 70.4 | 65.9 | 73.3 | 73.18 |
LSTM1 | 80.6 | 78.4 | 81.1 | 79.7 | 80.3 | 81.8 | 78.1 | 81.2 | 80.05 | |
GRU2 | 70.6 | 66.4 | 70.8 | 68.2 | 63.3 | 67.7 | 64.4 | 63.9 | 66.82 | |
LSTM2 | 73.2 | 66.3 | 72.4 | 69.7 | 71.3 | 70.8 | 70.9 | 73.1 | 70.96 | |
CNN | – | – | – | – | – | – | – | – | – |
4.3 Meta-ensemble classifiers
Dataset | Predictions | GB (%) | SVM (%) | NB (%) | LG (%) | RF (%) |
---|---|---|---|---|---|---|
1-Mohammed and Kora (2019) | Hard | 92 | 92.6 | 91.6 | 91.9 | 91.9 | 
Soft | 91.8 | 93.2 | 92.2 | 92.3 | 90 | |
2-Aljabri et al. (2021) | Hard | 69.3 | 69.9 | 67.4 | 69.2 | 68.4 |
Soft | 71.2 | 72.3 | 69.8 | 72.3 | 71.8 | |
3-Nabil et al. (2015) | Hard | 74.1 | 75.9 | 72.3 | 75.9 | 74.1 |
Soft | 76.2 | 77.1 | 73.6 | 77.6 | 75.8 | |
4-Al-Laith and Shahbaz (2021) | Hard | 79.5 | 80.4 | 76.2 | 80.3 | 79.6 |
Soft | 81.4 | 82.3 | 79.1 | 83.2 | 81.4 | |
5-Koh et al. (2010) | Hard | 80.5 | 80.9 | 79.3 | 80.5 | 80.5 |
Soft | 82.4 | 83.9 | 80.5 | 83.8 | 82.1 | |
6-Rane and Kumar (2018) | Hard | 82.1 | 82.9 | 80.3 | 81.8 | 82.2 |
Soft | 85.3 | 85.1 | 81.9 | 85.1 | 84.9 |
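The hard-versus-soft distinction in the table above can be seen in a toy example: hard predictions count each base model's argmax class, while soft predictions average the class probabilities first, so a single confident model can flip the outcome. The probabilities below are hypothetical, not outputs of the paper's models:

```python
import numpy as np

# Hypothetical class-probability outputs of three base models for one tweet,
# columns = [negative, positive].
probs = np.array([[0.60, 0.40],   # model 1: weakly negative
                  [0.55, 0.45],   # model 2: weakly negative
                  [0.20, 0.80]])  # model 3: strongly positive

# Hard: each model votes with its argmax class, majority wins.
hard_votes = probs.argmax(axis=1)             # [0, 0, 1]
hard_pred = np.bincount(hard_votes).argmax()  # majority -> class 0

# Soft: average the probabilities across models, then take argmax.
soft_pred = probs.mean(axis=0).argmax()       # mean [0.45, 0.55] -> class 1

print(hard_pred, soft_pred)  # 0 1
```

This retained confidence information is one plausible reason the soft variant outperforms the hard one across all six benchmarks in the table.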
Benchmarks | AVG baseline models | Best baseline model | Meta-ensemble |
---|---|---|---|
1-Mohammed and Kora (2019) | GRU= 89.52% | LSTM= 89.54% | SVM=93.2% (Soft) |
 | LSTM= 89.54% | | |
 | CNN= 86.10% | | |
2-Aljabri et al. (2021) | GRU1= 65.05% | LSTM2= 65.38% | SVM=72.3% (Soft) |
LSTM1= 64.71% | |||
GRU2= 63.55% | |||
LSTM2= 65.38% | |||
3-Nabil et al. (2015) | GRU= 70.86% | LSTM= 71.6% | LG=77.6% (Soft) |
LSTM= 71.6% | |||
CNN= 69.03% | |||
4-Al-Laith and Shahbaz (2021) | GRU= 75.87% | LSTM= 76.2% | LG=83.2% (Soft) |
LSTM= 76.2% | |||
CNN= 72.13% | |||
5-Koh et al. (2010) | GRU1= 74.53% | LSTM1= 78.03% | SVM=83.9% (Soft) |
LSTM1= 78.03% | |||
GRU2= 62.72% | |||
LSTM2= 66.62% | |||
6-Rane and Kumar (2018) | GRU1= 73.18% | LSTM1=80.05% | GB=85.3% (Soft) |
LSTM1=80.05% | |||
GRU2= 66.82% | |||
LSTM2=70.96% |