1 Introduction
2 Sentiment classification approaches
2.1 Machine learning approach (MLA)
2.2 Semantic orientation approach (SOA)
2.3 Pros and cons of existing sentiment analysis approaches and their performance
Approach | Pros | Cons | Note |
---|---|---|---|
SOA | • No need for training sets <br>• Applicable without domain constraints <br>• Easy to implement | • Uncertainty in long-document classification <br>• Difficulty processing words with multiple meanings | |
MLA | • Generally superior performance to SOA when training data are available | • Needs training data from the same domain as the test data | • Decision-tree-based classification is fast but may have problems with long documents <br>• SVM-based classification is strong on long documents but may take more time |
Library Name | Supported Language | Supported Algorithms | Note |
---|---|---|---|
NLTK (Natural Language Toolkit)a (Bird 2006) | Python | Naïve Bayes, Maximum Entropy | The scikit-learn libraryb can be used to apply more MLA algorithms |
CLiPSc (Smedt and Daelemans 2012) | Python | SOA | Supports part-of-speech tagging and includes SentiWordNet |
Stanford NLP libraryd (Manning et al. 2014) | Java | Most MLA and SOA, with SentiWordNet | Supports part-of-speech tagging |
Wekae (Hall et al. 2009) | Java | MLA | Weka supplies only the ML classifiers; a separate text-processing library is needed for sentiment classification |
tm libraryf (Feinerer 2015) | R | SOA | Other ML libraries are needed for MLA |
sentimentalizerg | Ruby | SOA | |
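The MLA side of these libraries can be made concrete with a small, self-contained Multinomial Naïve Bayes classifier. The sketch below is plain Python with add-one smoothing over toy tokenized documents; it is an illustration of the algorithm, not the API of any library in the table:

```python
import math
from collections import Counter, defaultdict

def train_mnb(docs, labels):
    """Multinomial Naive Bayes with add-one smoothing over tokenized docs."""
    vocab = {w for d in docs for w in d}
    word_counts = defaultdict(Counter)   # class -> word frequencies
    class_counts = Counter(labels)
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)
    model = {}
    for y, n_docs in class_counts.items():
        total = sum(word_counts[y].values())
        denom = total + len(vocab)       # add-one smoothing denominator
        model[y] = (
            math.log(n_docs / len(labels)),                                 # log prior
            {w: math.log((word_counts[y][w] + 1) / denom) for w in vocab},  # log likelihoods
            math.log(1 / denom),                                            # unseen-word fallback
        )
    return model

def predict_mnb(model, doc):
    def logprob(y):
        prior, likelihood, unseen = model[y]
        return prior + sum(likelihood.get(w, unseen) for w in doc)
    return max(model, key=logprob)

# Toy training set: two positive and two negative tokenized documents.
model = train_mnb([["good", "great"], ["good", "fine"], ["bad", "awful"], ["bad", "poor"]],
                  ["pos", "pos", "neg", "neg"])
```

Add-one smoothing keeps a single unseen word from zeroing out a class score, which matters most for the short documents examined later.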
Research | Classification Algorithms | Data Properties | Datasets used |
---|---|---|---|
Barbosa and Feng (2010) | SVM | subjectivity | Twitter Dataset |
Pak and Paroubek (2010) | Naïve Bayes and SVM | subjectivity | Twitter Dataset |
Aue and Gamon (2005) | Naïve Bayes | n/a | Car Review Dataset |
Ranade et al. (2013) | SOA | Document length | Online Debate Dataset |
Pang et al. (2002) | Naïve Bayes, SVM, and Maximum Entropy | n/a | Movie Review |
Moraes et al. (2013) | SVM and Neural Network | Document length | Movie Review Data |
This study | SOA and MLA (Multinomial Naïve Bayes, SVM, and Decision Tree) | Training size, document length, and subjectivity | Movie Review, Twitter, Hotel Review, and Amazon Product Review Datasets |
Data properties | Description | Algorithm | References |
---|---|---|---|
Document length/word count | The amount of information a document carries depends on its length (word count), which affects both the quality of training and the classification accuracy on test datasets. | MLA, SOA | |
Document subjectivity | Subjective words can be critical cues for determining sentiment polarity. | MLA, SOA | |
Training size | For ML-based sentiment classification, the training size has a significant influence on classification performance. | MLA | |
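The first two properties are straightforward to measure. As a rough sketch, word count is simply the token count, and subjectivity can be proxied by the fraction of tokens found in a subjectivity lexicon; the word list below is a made-up stand-in for a real lexicon:

```python
# Hypothetical subjective-word list; a real system would use a lexicon
# such as the one shipped with CLiPS/pattern or SentiWordNet.
SUBJECTIVE_WORDS = {"fantastic", "perfect", "awful", "great", "terrible", "boring"}

def word_count(doc):
    """Number of whitespace-separated tokens in the document."""
    return len(doc.split())

def subjectivity(doc):
    """Fraction of tokens found in the subjective-word list (0..1)."""
    words = doc.lower().split()
    return sum(w in SUBJECTIVE_WORDS for w in words) / len(words) if words else 0.0
```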
3 Method
3.1 Selection of data properties for comparison
“This movie has a fantastic scale and a perfect location for a fantasy movie. But there’s no theme so I can’t understand what the director want to say. The plot is also awful. I do not want to recommend this movie to my friends.”
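This sample review mixes strongly positive words ("fantastic", "perfect") with negative cues ("no", "awful", "not"). A lexicon-sum SOA scorer, sketched here with a tiny hand-made lexicon standing in for a resource such as SentiWordNet, nets out to exactly zero on it, which illustrates why longer documents with mixed sentiment are hard for SOA:

```python
# Toy polarity lexicon (hypothetical values, not SentiWordNet scores).
POLARITY = {"fantastic": 1.0, "perfect": 1.0, "awful": -1.0, "no": -0.5, "not": -0.5}

def soa_score(text):
    """Sum per-word polarities; positive total -> positive sentiment."""
    words = text.lower().replace(".", " ").replace(",", " ").split()
    return sum(POLARITY.get(w, 0.0) for w in words)

review = ("This movie has a fantastic scale and a perfect location for a fantasy movie. "
          "But there's no theme so I can't understand what the director want to say. "
          "The plot is also awful. I do not want to recommend this movie to my friends.")
# +1 (fantastic) +1 (perfect) -0.5 (no) -1 (awful) -0.5 (not) = 0.0: undecidable.
```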
3.2 Data
Dataset | Size (positive/negative) | Domain/Context | Language |
---|---|---|---|
IMDB Dataset | 10,000/10,000 | Movie Review | English |
Twitter Dataset | 4,000/4,000 | Social Data | English |
Hotel Review Datasets | 8,000/8,000 | E-commerce/Service | English |
Amazon Review Datasets | 6,000/6,000 | Product Review (small electronics) | English |
4 Experiment results
4.1 The sensitivity of SOA on data properties
Word-count in Document | Number of Documents | Accuracy | Accuracy Difference |
---|---|---|---|
IMDB Dataset | |||
0 ~ 100 | 2557 | 0.7560 | |
101 ~ 200 | 9474 | 0.7052 | -0.0508 |
201 ~ 300 | 3669 | 0.6721 | -0.0331 |
301 ~ 400 | 1812 | 0.6440 | -0.0281 |
401 ~ 500 | 996 | 0.6285 | -0.0155 |
500~ | 1492 | 0.6635 | 0.0350 |
Amazon Review Dataset | |||
0 ~ 100 | 7637 | 0.6816 | |
101 ~ 200 | 4577 | 0.6552 | -0.0263 |
201 ~ 300 | 2164 | 0.5924 | -0.0628 |
301 ~ 400 | 825 | 0.5200 | -0.0724 |
401 ~ 500 | 394 | 0.4797 | -0.0403 |
500~ | 163 | 0.4724 | -0.0073 |
Hotel Review Dataset | |||
0~100 | 5112 | 0.6203 | |
101~200 | 3774 | 0.5829 | -0.0374 |
201~300 | 1475 | 0.6183 | 0.0354 |
301~400 | 733 | 0.6194 | 0.0011 |
401~500 | 353 | 0.7224 | 0.1030 |
501~ | 134 | 0.7313 | 0.0090 |
Twitter Dataset | |||
0~100 | 8000 | 0.7943 | |
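The per-bin accuracies above come from grouping test documents by word count and scoring each group separately. A minimal sketch of that bookkeeping (toy inputs; the helper name is hypothetical):

```python
from collections import defaultdict

def accuracy_by_bin(word_counts, preds, golds, width=100):
    """Classification accuracy per word-count bin of the given width."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for wc, p, g in zip(word_counts, preds, golds):
        b = wc // width        # integer bin index: 0 for 0-99 words, 1 for 100-199, ...
        totals[b] += 1
        hits[b] += (p == g)
    return {b: hits[b] / totals[b] for b in totals}
```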
Subjectivity | Number of Documents | Accuracy | Accuracy Difference |
---|---|---|---|
IMDB Dataset (Average subjectivity of dataset =0.531) | |||
0 ~ 0.5 | 7243 | 0.6520 | |
0.5 ~ 0.7 | 12,026 | 0.7112 | 0.0592 |
0.7 ~ 1.0 | 731 | 0.8114 | 0.1002 |
Overall | 20,000 | 0.6931 | |
Hotel Review Dataset (Average subjectivity of dataset =0.544) | |||
0 ~ 0.5 | 4591 | 0.4907 | |
0.5 ~ 0.7 | 10,087 | 0.6851 | 0.1943 |
0.7 ~ 1.0 | 1322 | 0.8434 | 0.1582 |
Overall | 16,000 | 0.6424 | |
Twitter Dataset (Average subjectivity of dataset =0.598) | |||
0~0.5 | 2693 | 0.6082 | |
0.5~0.7 | 1628 | 0.8428 | 0.2345 |
0.7~1.0 | 3679 | 0.9092 | 0.0664 |
Overall | 8000 | 0.7943 | |
Amazon Review Dataset (Average subjectivity of dataset =0.519) | |||
0~0.5 | 4980 | 0.5149 | |
0.5~0.7 | 6116 | 0.6848 | 0.1699 |
0.7~1.0 | 904 | 0.6925 | 0.0077 |
Overall | 12,000 | 0.6148 |
4.2 The sensitivity of MLA on data properties
4.2.1 Training size and document length
IMDB, Hotel Review, and Amazon Review Datasets (each with training size 500 and test size 10,000)a

| Word-count | DT (IMDB) | M-NB (IMDB) | SVM (IMDB) | DT (Hotel) | M-NB (Hotel) | SVM (Hotel) | DT (Amazon) | M-NB (Amazon) | SVM (Amazon) |
|---|---|---|---|---|---|---|---|---|---|
| 0–50 | | | | 0.6484 | 0.7919 | 0.6363 | 0.6288 | 0.8113 | 0.7343 |
| 50–100 | 0.6611 | 0.8189 | 0.7151 | 0.6874 | 0.8459 | 0.767 | 0.6408 | 0.8206 | 0.8006 |
| 100–150 | 0.6576 | 0.8168 | 0.7332 | 0.6797 | 0.8471 | 0.7999 | 0.6381 | 0.8143 | 0.7964 |
| 150–200 | 0.6429 | 0.7959 | 0.7331 | 0.6696 | 0.8389 | 0.7941 | 0.6406 | 0.8172 | 0.777 |
| 200–250 | 0.6354 | 0.7557 | 0.7334 | 0.6668 | 0.8308 | 0.796 | 0.6202 | 0.8141 | 0.7845 |
| 250–300 | 0.6392 | 0.7347 | 0.7141 | 0.6617 | 0.839 | 0.7914 | | | |
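The training-size dimension can be probed with a simple learning-curve sweep: train on growing prefixes of the training data and score a fixed test set. `fit` and `predict` below are caller-supplied stand-ins for any of the MLA classifiers; the majority-label classifier is just a toy placeholder:

```python
from collections import Counter

def learning_curve(train, test, sizes, fit, predict):
    """Accuracy on a fixed test set as the training set grows.

    train/test are lists of (document, label) pairs; sizes is a list of
    training-prefix lengths to evaluate.
    """
    curve = []
    for n in sizes:
        model = fit(train[:n])
        acc = sum(predict(model, x) == y for x, y in test) / len(test)
        curve.append((n, acc))
    return curve

# Toy stand-in classifier: always predict the majority training label.
majority_fit = lambda pairs: Counter(y for _, y in pairs).most_common(1)[0][0]
majority_predict = lambda model, x: model
```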
4.2.2 Training size and subjectivity
4.2.3 Document length and subjectivity
4.3 General performance comparison - MLA and SOA
Word-count of documents in test dataset | Subjectivity 0–0.5 | Subjectivity 0.5–1.0 | Total |
---|---|---|---|
IMDB Dataset | |||
0–100 | 4.21% | 8.38% | 12.59% |
100–200 | 16.68% | 30.58% | 47.26% |
200–300 | 6.90% | 11.63% | 18.53% |
300–400 | 3.56% | 5.57% | 9.13% |
400–500 | 1.92% | 3.09% | 5.01% |
500- | 3.29% | 4.22% | 7.51% |
Overall | 36.55% | 63.45% | 100.00% |
Amazon Review Dataset | |||
0–100 | 16.66% | 25.60% | 42.26% |
100–200 | 13.14% | 18.51% | 31.65% |
200–300 | 5.26% | 7.07% | 12.33% |
300–400 | 2.57% | 3.58% | 6.15% |
400–500 | 1.14% | 1.83% | 2.98% |
500- | 1.93% | 2.71% | 4.64% |
Overall | 40.70% | 59.30% | 100.00% |
Hotel Review Dataset | |||
0–100 | 7.89% | 39.54% | 47.43% |
100–200 | 9.54% | 19.19% | 28.73% |
200–300 | 5.57% | 8.02% | 13.59% |
300–400 | 2.64% | 2.58% | 5.23% |
400–500 | 1.26% | 1.24% | 2.49% |
500- | 1.36% | 1.18% | 2.53% |
Overall | 28.26% | 71.74% | 100.00% |
Twitter Dataset | |||
0–100 | 31.65% | 68.35% | 100.00% |
a. IMDB Dataset

| Word-count | SOA (subjectivity 0–0.5) | SOA (subjectivity 0.5–1.0) | SOA (Overall) | Multinomial NB (subjectivity 0–0.5) | Multinomial NB (subjectivity 0.5–1.0) | Multinomial NB (Overall) |
|---|---|---|---|---|---|---|
| 0–100 | 0.7102 | 0.7875 | 0.7616 | 0.7755 | 0.8442 | 0.8212 |
| 100–200 | 0.6676 | 0.7295 | 0.7076 | 0.7851 | 0.8250 | 0.8109 |
| 200–300 | 0.6345 | 0.6896 | 0.6691 | 0.7846 | 0.8220 | 0.8081 |
| 300–400 | 0.5997 | 0.6801 | 0.6488 | 0.7851 | 0.8005 | 0.7945 |
| 400–500 | 0.5651 | 0.6629 | 0.6254 | 0.7786 | 0.8201 | 0.8042 |
| 500– | 0.6484 | 0.6730 | 0.6622 | 0.7793 | 0.8140 | 0.7988 |
| Overall | 0.6525 | 0.7185 | 0.6944 | 0.7830 | 0.8239 | 0.8090 |

| Word-count | SVM (subjectivity 0–0.5) | SVM (subjectivity 0.5–1.0) | SVM (Overall) | Decision Tree (subjectivity 0–0.5) | Decision Tree (subjectivity 0.5–1.0) | Decision Tree (Overall) |
|---|---|---|---|---|---|---|
| 0–100 | 0.7411 | 0.7916 | 0.7747 | 0.6283 | 0.6985 | 0.6750 |
| 100–200 | 0.7452 | 0.7781 | 0.7665 | 0.6250 | 0.6567 | 0.6455 |
| 200–300 | 0.7114 | 0.7623 | 0.7433 | 0.6273 | 0.6612 | 0.6486 |
| 300–400 | 0.6742 | 0.7161 | 0.6997 | 0.6559 | 0.6595 | 0.6581 |
| 400–500 | 0.6328 | 0.7099 | 0.6803 | 0.6354 | 0.6434 | 0.6404 |
| 500– | 0.6941 | 0.7275 | 0.7129 | 0.6575 | 0.6825 | 0.6716 |
| Overall | 0.7209 | 0.7649 | 0.7488 | 0.6323 | 0.6644 | 0.6527 |
b. Hotel Review Dataset

| Word-count | SOA (subjectivity 0–0.5) | SOA (subjectivity 0.5–1.0) | SOA (Overall) | Multinomial NB (subjectivity 0–0.5) | Multinomial NB (subjectivity 0.5–1.0) | Multinomial NB (Overall) |
|---|---|---|---|---|---|---|
| 0–100 | 0.7102 | 0.7875 | 0.7616 | 0.7755 | 0.8442 | 0.8212 |
| 100–200 | 0.6676 | 0.7295 | 0.7076 | 0.7851 | 0.8250 | 0.8109 |
| 200–300 | 0.6345 | 0.6896 | 0.6691 | 0.7846 | 0.8220 | 0.8081 |
| 300–400 | 0.5997 | 0.6801 | 0.6488 | 0.7851 | 0.8005 | 0.7945 |
| 400–500 | 0.5651 | 0.6629 | 0.6254 | 0.7786 | 0.8201 | 0.8042 |
| 500– | 0.6484 | 0.6730 | 0.6622 | 0.7793 | 0.8140 | 0.7988 |
| Overall | 0.6525 | 0.7185 | 0.6944 | 0.7830 | 0.8239 | 0.8090 |

| Word-count | SVM (subjectivity 0–0.5) | SVM (subjectivity 0.5–1.0) | SVM (Overall) | Decision Tree (subjectivity 0–0.5) | Decision Tree (subjectivity 0.5–1.0) | Decision Tree (Overall) |
|---|---|---|---|---|---|---|
| 0–100 | 0.7411 | 0.7916 | 0.7747 | 0.6283 | 0.6985 | 0.6750 |
| 100–200 | 0.7452 | 0.7781 | 0.7665 | 0.6250 | 0.6567 | 0.6455 |
| 200–300 | 0.7114 | 0.7623 | 0.7433 | 0.6273 | 0.6612 | 0.6486 |
| 300–400 | 0.6742 | 0.7161 | 0.6997 | 0.6559 | 0.6595 | 0.6581 |
| 400–500 | 0.6328 | 0.7099 | 0.6803 | 0.6354 | 0.6434 | 0.6404 |
| 500– | 0.6941 | 0.7275 | 0.7129 | 0.6575 | 0.6825 | 0.6716 |
| Overall | 0.7209 | 0.7649 | 0.7488 | 0.6323 | 0.6644 | 0.6527 |
c. Amazon Review Dataset

| Word-count | SOA (subjectivity 0–0.5) | SOA (subjectivity 0.5–1.0) | SOA (Overall) | Multinomial NB (subjectivity 0–0.5) | Multinomial NB (subjectivity 0.5–1.0) | Multinomial NB (Overall) |
|---|---|---|---|---|---|---|
| 0–100 | 0.5633 | 0.7171 | 0.6565 | 0.8049 | 0.8298 | 0.8200 |
| 100–200 | 0.4959 | 0.6479 | 0.5848 | 0.7907 | 0.7870 | 0.7886 |
| 200–300 | 0.5547 | 0.6639 | 0.6173 | 0.7781 | 0.7842 | 0.7816 |
| 300–400 | 0.5422 | 0.6721 | 0.6179 | 0.7532 | 0.7814 | 0.7696 |
| 400–500 | 0.6788 | 0.7455 | 0.7199 | 0.7664 | 0.8318 | 0.8067 |
| 500– | 0.6509 | 0.7292 | 0.6966 | 0.8017 | 0.7723 | 0.7846 |
| Overall | 0.5465 | 0.6879 | 0.6303 | 0.7924 | 0.8055 | 0.8002 |

| Word-count | SVM (subjectivity 0–0.5) | SVM (subjectivity 0.5–1.0) | SVM (Overall) | Decision Tree (subjectivity 0–0.5) | Decision Tree (subjectivity 0.5–1.0) | Decision Tree (Overall) |
|---|---|---|---|---|---|---|
| 0–100 | 0.7844 | 0.7969 | 0.7920 | 0.6763 | 0.6621 | 0.6677 |
| 100–200 | 0.7806 | 0.7861 | 0.7838 | 0.6088 | 0.6538 | 0.6351 |
| 200–300 | 0.7353 | 0.7559 | 0.7471 | 0.6482 | 0.6380 | 0.6423 |
| 300–400 | 0.6883 | 0.7488 | 0.7236 | 0.5649 | 0.6605 | 0.6206 |
| 400–500 | 0.7372 | 0.7864 | 0.7675 | 0.7007 | 0.6455 | 0.6667 |
| 500– | 0.7414 | 0.7415 | 0.7415 | 0.7026 | 0.7662 | 0.7397 |
| Overall | 0.7674 | 0.7829 | 0.7766 | 0.6458 | 0.6608 | 0.6547 |
d. Twitter Dataset

| Word-count | SOA (subjectivity 0–0.5) | SOA (subjectivity 0.5–1.0) | SOA (Overall) | Multinomial NB (subjectivity 0–0.5) | Multinomial NB (subjectivity 0.5–1.0) | Multinomial NB (Overall) |
|---|---|---|---|---|---|---|
| 0–100 | 0.8693 | 0.8950 | 0.8869 | 0.8938 | 0.9305 | 0.9189 |

| Word-count | SVM (subjectivity 0–0.5) | SVM (subjectivity 0.5–1.0) | SVM (Overall) | Decision Tree (subjectivity 0–0.5) | Decision Tree (subjectivity 0.5–1.0) | Decision Tree (Overall) |
|---|---|---|---|---|---|---|
| 0–100 | 0.8013 | 0.8270 | 0.8156 | 0.9273 | 0.9656 | 0.9535 |
| Dataset | Decision Tree Training | Decision Tree Test | Multinomial Naïve Bayes Training | Multinomial Naïve Bayes Test | SVM Training | SVM Test | SOA |
|---|---|---|---|---|---|---|---|
| IMDB Dataset | 1.7175 | 11.5816 | 0.5444 | 10.4597 | 2.8626 | 53.0191 | 96.1212 |
| Hotel Review Dataset | 1.1233 | 8.7047 | 0.3222 | 7.8711 | 1.5191 | 26.5369 | 54.3441 |
| Twitter Dataset | 0.1986 | 0.2937 | 0.0235 | 0.2992 | 0.046 | 0.81 | 5.3138 |
| Amazon Review Dataset | 1.1278 | 6.7353 | 0.6374 | 4.9226 | 1.3784 | 17.4996 | 52.4596 |
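Per-phase timings like those in the table above can be collected with a small wall-clock helper. This is a sketch using Python's `time.perf_counter`, not the paper's actual measurement harness:

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - t0
```

Training and test phases would each be wrapped separately, e.g. `_, train_time = timed(classifier.train, train_set)`.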
4.4 Summary of experiment results
Approach | Data properties | Findings |
---|---|---|
SOA | Word count | • Works better for shorter documents (fewer than 200 words) than for longer documents (more than 200 words) |
 | Subjectivity | • Works better for documents with high average subjectivity (above 0.7) than for documents with low average subjectivity (below 0.7) |
MLA | Training size, document length | • A training set larger than 2% of the test dataset size is required. <br>• The ideal document length for the training dataset is 50~150 words across all datasets and all MLA algorithms |
 | Training size, subjectivity | • A larger training dataset is required when the average subjectivity of the training documents is below 0.5. <br>• Documents with higher subjectivity (0.5~1.0) are more suitable as training data than documents with lower subjectivity (0~0.5) |
 | Document length, subjectivity | • Documents with higher average subjectivity (0.5~1.0) and 100~250 words are best suited for the training dataset |
 | General performance | • In general, Multinomial Naïve Bayes and SVM outperform SOA. <br>• SOA works as well as MLA for very short documents (0~100 words) with higher average subjectivity (0.5~1.0). <br>• For documents with higher average subjectivity (0.5~1.0), SOA outperforms Decision Tree. <br>• Decision Tree fails to outperform the other MLA algorithms, Multinomial Naïve Bayes and SVM |