Introduction
Related work
Methods
Machine learning algorithms
Feature selection
Information gain
Chi square (CHI2)
Document frequency difference
Optimal orthogonal centroid (OCFS)
Query expansion ranking
Experiments and results
Datasets
Performance evaluation
Experimental settings
Dataset | Turkish: Features | Turkish: NBM | Turkish: SVM | Turkish: J48 | Turkish: LR | English: Features | English: NBM | English: SVM | English: J48 | English: LR |
---|---|---|---|---|---|---|---|---|---|---|
Movie | 18,578 | 0.8248 | 0.8161 | 0.6954 | – | 38,869 | 0.8129 | 0.8480 | 0.6769 | – |
DVDs | 11,343 | 0.7957 | 0.7320 | 0.6886 | – | 17,674 | 0.7836 | 0.7649 | 0.6789 | – |
Electronics | 10,911 | 0.8155 | 0.7707 | 0.7371 | – | 9,010 | 0.7629 | 0.7856 | 0.6750 | – |
Book | 10,511 | 0.8317 | 0.7955 | 0.7019 | – | 18,306 | 0.7619 | 0.7485 | 0.6407 | – |
Kitchen | 9,447 | 0.7762 | 0.7407 | 0.6647 | – | 8,076 | 0.8099 | 0.8136 | 0.7093 | – |
Performance of feature selection methods for Turkish reviews
Dataset | QER size | QER F measure | DFD size | DFD F measure | OCFS size | OCFS F measure | CHI2 size | CHI2 F measure | IG size | IG F measure |
---|---|---|---|---|---|---|---|---|---|---|
Movie | 3000 | NBM:0.9112 | 3000 | NBM:0.8864 | 3000 | NBM:0.8447 | 1500 | NBM:0.8883 | 1500 | NBM:0.8883 |
DVDs | 1500 | NBM:0.9136 | 3000 | NBM:0.8650 | 3000 | NBM:0.8129 | 500 | NBM:0.8671 | 500 | NBM:0.8671 |
Electronics | 1500 | NBM:0.8996 | 1500 | NBM:0.8567 | 2000 | NBM:0.8337 | 1000 | NBM:0.8564 | 1500 | NBM:0.8551 |
Book | 1500 | NBM:0.9150 | 1500 | NBM:0.8771 | 3000 | NBM:0.8506 | 1000 | NBM:0.8864 | 1000 | NBM:0.8864 |
Kitchen | 1000 | NBM:0.8790 | 3000 | NBM:0.8314 | 3000 | NBM:0.8017 | 500 | SVM:0.8378 | 500 | SVM:0.8378 |
Method | NBM size | NBM F measure | SVM size | SVM F measure | LR size | LR F measure | J48 size | J48 F measure |
---|---|---|---|---|---|---|---|---|
QER | 1500 | 0.8996 | 2000 | 0.8715 | 1000 | 0.7927 | 2000 | 0.6734 |
CHI2 | 1000 | 0.8564 | 1000 | 0.8505 | 500 | 0.7969 | 1000 | 0.7435 |
IG | 1500 | 0.8551 | 1000 | 0.8505 | 500 | 0.8156 | 1500 | 0.7428 |
DFD | 1500 | 0.8567 | 1500 | 0.8128 | 2500 | 0.7829 | 500 | 0.7399 |
OCFS | 2000 | 0.8337 | 1000 | 0.7729 | 3000 | 0.7643 | 1500 | 0.7371 |
Performance of feature selection methods for English reviews
Dataset | QER size | QER F measure | DFD size | DFD F measure | OCFS size | OCFS F measure | CHI2 size | CHI2 F measure | IG size | IG F measure |
---|---|---|---|---|---|---|---|---|---|---|
Movie | 3000 | LR:0.9550 | 2500 | SVM:0.8640 | 3000 | SVM:0.8285 | 2500 | NBM:0.9150 | 2500 | NBM:0.9150 |
DVDs | 2500 | NBM:0.9169 | 3000 | NBM:0.8502 | 1000 | NBM:0.7996 | 1000 | NBM:0.8964 | 1000 | NBM:0.8964 |
Electronics | 2000 | NBM:0.8878 | 1500 | NBM:0.8221 | 2000 | SVM:0.7821 | 1000 | NBM:0.8621 | 1000 | NBM:0.8621 |
Book | 3000 | NBM:0.9162 | 3000 | NBM:0.8628 | 3000 | NBM:0.7899 | 1000 | NBM:0.8879 | 1000 | NBM:0.8879 |
Kitchen | 2000 | NBM:0.9106 | 3000 | LR:0.8893 | 1500 | SVM:0.8157 | 500 | NBM:0.8964 | 500 | NBM:0.8964 |
Method | NBM size | NBM F measure | SVM size | SVM F measure | LR size | LR F measure | J48 size | J48 F measure |
---|---|---|---|---|---|---|---|---|
QER | 2500 | 0.9169 | 3000 | 0.8724 | 2000 | 0.8977 | 2000 | 0.5481 |
CHI2 | 1000 | 0.8964 | 500 | 0.8650 | 3000 | 0.6976 | 3000 | 0.6799 |
IG | 1000 | 0.8964 | 1000 | 0.8614 | 2000 | 0.6970 | 500 | 0.6769 |
DFD | 3000 | 0.8502 | 1000 | 0.8293 | 3000 | 0.7600 | 500 | 0.6771 |
OCFS | 1000 | 0.7996 | 1000 | 0.7714 | 500 | 0.6800 | 2000 | 0.6829 |
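The per-dataset tables above can be condensed into a single average per feature selection method. The sketch below is only an illustrative summary, not part of the original evaluation: it copies the best F measure reported for each dataset (in the order Movie, DVDs, Electronics, Book, Kitchen) and takes an unweighted mean. The names `best_f` and `mean_f_by_method` are hypothetical, and averaging across datasets of different sizes is an assumption made here for illustration.

```python
from statistics import mean

# Best F measure per dataset (Movie, DVDs, Electronics, Book, Kitchen),
# copied from the per-dataset tables for Turkish and English reviews.
best_f = {
    "Turkish": {
        "QER":  [0.9112, 0.9136, 0.8996, 0.9150, 0.8790],
        "DFD":  [0.8864, 0.8650, 0.8567, 0.8771, 0.8314],
        "OCFS": [0.8447, 0.8129, 0.8337, 0.8506, 0.8017],
        "CHI2": [0.8883, 0.8671, 0.8564, 0.8864, 0.8378],
        "IG":   [0.8883, 0.8671, 0.8551, 0.8864, 0.8378],
    },
    "English": {
        "QER":  [0.9550, 0.9169, 0.8878, 0.9162, 0.9106],
        "DFD":  [0.8640, 0.8502, 0.8221, 0.8628, 0.8893],
        "OCFS": [0.8285, 0.7996, 0.7821, 0.7899, 0.8157],
        "CHI2": [0.9150, 0.8964, 0.8621, 0.8879, 0.8964],
        "IG":   [0.9150, 0.8964, 0.8621, 0.8879, 0.8964],
    },
}


def mean_f_by_method(scores):
    """Unweighted mean of the per-dataset best F measures for each method."""
    return {method: round(mean(values), 4) for method, values in scores.items()}


summary = {lang: mean_f_by_method(scores) for lang, scores in best_f.items()}

for lang, means in summary.items():
    # Rank methods from highest to lowest mean F measure.
    ranked = sorted(means.items(), key=lambda kv: kv[1], reverse=True)
    print(lang, ranked)
```

Under this simple averaging, QER has the highest mean in both languages, which agrees with the per-dataset rows where QER gives the top score on every dataset.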
Comparison of our proposal with the previous studies
Paper | Dataset | Baseline accuracy (%) | Best accuracies observed (%) | Classifier |
---|---|---|---|---|
[4] | Movie | 78.7 | NB, SVM | |
[7] | Movie | | 87.1 minimum cut | SVM |
[8] | Movie, Product | Movie: 79.9; Product: 74.3 | Movie: 85.7 CHI2, 86.9 DFD, 80.9 OCFS; Product: 73.7 CHI2, 75 DFD, 73.8 OCFS | MEM |
[9] | Movie, Product | Movie: 84.2; Product: 80.9 Book, 78.9 DVD, 80.8 Electronics | Movie: 91.8; Product: 92.5 Book, 91.5 DVD, 91.8 Electronics (mRMR with composite features) | BNBM, SVM |
[23] | Product | 70.1 | 84.2 Kitchen (semantic orientation) | SVM |
[24] | Movie, Product | Movie: 84.8; Product: 74.7 Book, 77.2 DVD, 80.8 Electronics, 83.3 Kitchen | Movie: 87.7; Product: 81.8 Book, 83.8 DVD, 85.9 Electronics, 88.7 Kitchen (word relation based method) | NB, SVM, MEM |
[25] | Movie | 84.1 | 92.7 (Tabu search-enhanced Markov blanket model) | NB, SVM, MEM |
Our study | Movie, Product | Movie: 84.8; Product: 76.2 Book, 78.4 DVD, 78.6 Electronics, 81.4 Kitchen | Movie: 91.5 CHI2-IG, 87.1 DFD, 82.9 OCFS, 95.5 proposed QER; Product: 91.6 Book, 91.7 DVD, 88.8 Electronics, 91.1 Kitchen (proposed QER) | NBM, SVM, MEM, DT |
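The "Our study" row can be read as a gain in accuracy over the baseline for each dataset. The following sketch computes that gain in percentage points. It assumes that the 95.5 figure is the QER result on the Movie dataset and that the four product figures are the QER results, which matches the English F measure table but is an interpretation of the flattened row. The names `our_study` and `gains` are hypothetical.

```python
# Accuracies (%) from the "Our study" row: (baseline, best with proposed QER).
# Assumption: 95.5 is read as the QER result for the Movie dataset.
our_study = {
    "Movie": (84.8, 95.5),
    "Book": (76.2, 91.6),
    "DVD": (78.4, 91.7),
    "Electronics": (78.6, 88.8),
    "Kitchen": (81.4, 91.1),
}

# Gain over the baseline in percentage points, rounded to one decimal place.
gains = {
    name: round(best - baseline, 1)
    for name, (baseline, best) in our_study.items()
}

for name, gain in gains.items():
    print(f"{name}: +{gain} points")
```

Read this way, the gains range from about 10 to 15 percentage points, with the largest on the Book dataset.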