
2015 | OriginalPaper | Book chapter

A Comparative Study of Feature Selection and Machine Learning Methods for Sentiment Classification on Movie Data Set

Authors: C. Selvi, Chakshu Ahuja, E. Sivasankar

Published in: Intelligent Computing and Applications

Publisher: Springer India


Abstract

With the advent of Web 2.0, where users express their opinions in forums, blogs, discussion boards, and review sites, sentiment analysis has become a leading research domain. This online information is a valuable source for decision making, for improving service quality, and for helping service providers enhance their competitiveness. Because processing high-dimensional text data does not scale well, feature selection methods are used to restrict the analysis to the most informative features. The selected features are then used to train a classifier, with the aim of improving sentiment classification accuracy. This paper evaluates six feature selection methods (Information Gain (IG), Gain Ratio (GR), Chi-square (CHI), OneR, Relief-F, and Significance Attribute Evaluation (SAE)) in combination with five machine learning classifiers (Support Vector Machine (SVM), Naive Bayes (NB), Decision Tree (DT), k-Nearest Neighbours (K-NN), and Maximum Entropy (ME)), and reports the accuracy of each combination on the movie review data set. The comparative results show that Naive Bayes (NB) outperforms the other classifiers and performs best with the Gain Ratio (GR) and Significance Attribute Evaluation (SAE) feature selection methods.
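The pipeline the abstract describes (score each term, keep the most informative ones, train a classifier on them) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the six-document corpus, the cut-off k = 5, and all function names are hypothetical. Terms are scored with the textbook definitions of information gain, gain ratio, and chi-square over binary term presence; ranking uses IG only; and the classifier is a multinomial Naive Bayes with add-one smoothing. OneR, Relief-F, SAE, and the other four classifiers are not shown, and the tooling used in the chapter may compute these scores differently.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for the movie review data set:
# (tokens, label) pairs, label 1 = positive review, 0 = negative review.
DOCS = [
    ("a brilliant and moving film".split(), 1),
    ("brilliant acting and a great script".split(), 1),
    ("great fun from start to finish".split(), 1),
    ("a dull and boring film".split(), 0),
    ("boring script and weak acting".split(), 0),
    ("weak plot and dull pacing".split(), 0),
]


def entropy(counts):
    """Shannon entropy in bits of a list of counts (zero counts are skipped)."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def term_scores(docs, term):
    """Return (information gain, gain ratio, chi-square) for one term,
    treating its presence or absence in a document as a binary attribute."""
    n = len(docs)
    # 2x2 contingency table: term present/absent vs. positive/negative.
    a = sum(1 for toks, y in docs if term in toks and y == 1)
    b = sum(1 for toks, y in docs if term in toks and y == 0)
    c = sum(1 for toks, y in docs if term not in toks and y == 1)
    d = n - a - b - c
    h_class = entropy([a + c, b + d])
    h_given_term = (a + b) / n * entropy([a, b]) + (c + d) / n * entropy([c, d])
    ig = h_class - h_given_term
    # Gain ratio divides IG by the entropy of the attribute's own split.
    split_info = entropy([a + b, c + d])
    gr = ig / split_info if split_info else 0.0
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return ig, gr, chi


def train_nb(docs, vocab):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing, restricted to vocab."""
    log_prior, log_lik = {}, {}
    for label in (0, 1):
        class_docs = [toks for toks, y in docs if y == label]
        log_prior[label] = math.log(len(class_docs) / len(docs))
        counts = Counter(t for toks in class_docs for t in toks if t in vocab)
        total = sum(counts.values())
        log_lik[label] = {t: math.log((counts[t] + 1) / (total + len(vocab))) for t in vocab}
    return log_prior, log_lik


def predict_nb(model, tokens):
    """Return the label with the highest log posterior; out-of-vocabulary tokens are ignored."""
    log_prior, log_lik = model
    scores = {
        label: log_prior[label] + sum(log_lik[label][t] for t in tokens if t in log_lik[label])
        for label in log_prior
    }
    return max(scores, key=scores.get)


vocab = sorted({t for toks, _ in DOCS for t in toks})
# Rank terms by information gain and keep the top k (k = 5 is arbitrary).
ranked = sorted(vocab, key=lambda t: term_scores(DOCS, t)[0], reverse=True)
selected = set(ranked[:5])
model = train_nb(DOCS, selected)

print(sorted(selected))  # → ['boring', 'brilliant', 'dull', 'great', 'weak']
print(predict_nb(model, "a brilliant script".split()))  # → 1
print(predict_nb(model, "a dull plot".split()))  # → 0
```

Changing the ranking key to index 1 or 2 of `term_scores` ranks terms by gain ratio or chi-square instead. Gain ratio penalises terms whose presence split is very uneven, which is one reason it can rank features differently from plain IG. On a real corpus, k would normally be tuned on held-out data rather than fixed.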


DOI: https://doi.org/10.1007/978-81-322-2268-2_39