
2016 | OriginalPaper | Chapter

Performance Evaluation of Sentiment Classification Using Query Strategies in a Pool Based Active Learning Scenario

Authors: K. Lakshmi Devi, P. Subathra, P. N. Kumar

Published in: Computational Intelligence, Cyber Security and Computational Models

Publisher: Springer Singapore

Abstract

Sentiment Classification over large volumes of unlabelled data (such as Tweets and other big data applications) requires human annotators to label the data, which is expensive and time consuming. Active Learning (AL) addresses this problem by greedily selecting the most informative instances from the unlabelled data and submitting only those to a human annotator, thereby reducing the time, cost and effort of labelling. Compared with passive learning, AL reduces the amount of data needed for learning, produces higher quality labelled data, reduces the running time of the classification process and improves predictive accuracy. Different query strategies have been proposed for choosing the most informative instances. In this work, we perform a comparative performance evaluation of Sentiment Classification in a pool-based Active Learning scenario using three query strategies: Entropy Sampling (an Uncertainty Sampling strategy), and Kullback-Leibler divergence and Vote Entropy (both Query By Committee strategies). The evaluation metrics are Accuracy, Weighted Precision, Weighted Recall, Weighted F-measure, Root Mean Square Error, Weighted True Positive Rate and Weighted False Positive Rate. We also measure several time costs of the Active Learning process, namely Accumulative Iteration time, Iteration time, Training time, Instance selection time and Test time. The empirical results show that the Uncertainty Sampling query strategy achieved better overall performance than Query By Committee in the Sentiment Classification of a movie reviews dataset.
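
To make the three query strategies named in the abstract concrete, the following is a minimal Python sketch of how each one could score unlabelled instances. It is not the authors' implementation: the function names (`entropy_sampling`, `vote_entropy`, `mean_kl_divergence`) and the toy inputs are hypothetical, and the formulas follow the commonly used textbook definitions, which are assumed here to match those used in the chapter.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_sampling(pool_probs):
    """Uncertainty Sampling: return the index of the pool instance whose
    predicted class distribution has the highest entropy."""
    return max(range(len(pool_probs)), key=lambda i: entropy(pool_probs[i]))


def vote_entropy(votes, n_classes):
    """Query By Committee: entropy of the hard label votes cast by the
    committee members for a single instance."""
    counts = Counter(votes)
    total = len(votes)
    return entropy([counts.get(k, 0) / total for k in range(n_classes)])


def mean_kl_divergence(member_probs):
    """Query By Committee: average Kullback-Leibler divergence between each
    member's class distribution and the committee consensus (the mean)."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    consensus = [
        sum(m[k] for m in member_probs) / n_members for k in range(n_classes)
    ]

    def kl(p, q):
        # Terms with p_k == 0 contribute nothing; q_k > 0 whenever p_k > 0
        # because q is the mean of the member distributions.
        return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)

    return sum(kl(m, consensus) for m in member_probs) / n_members


# Hypothetical toy pool: class probabilities (positive, negative) for 3 reviews.
pool = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]
query_index = entropy_sampling(pool)  # picks the most uncertain review

# Hypothetical committee of 4 members voting on one review (0 = pos, 1 = neg).
split_vote = vote_entropy([0, 0, 1, 1], n_classes=2)
unanimous_vote = vote_entropy([0, 0, 0, 0], n_classes=2)

# Two committee members that disagree strongly versus two that agree.
disagreement = mean_kl_divergence([[0.9, 0.1], [0.1, 0.9]])
agreement = mean_kl_divergence([[0.7, 0.3], [0.7, 0.3]])
```

In a pool-based loop, the instance with the highest score under the chosen strategy would be sent to the annotator, added to the labelled set, and the classifier (or committee) retrained before the next iteration.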

Metadata
Title
Performance Evaluation of Sentiment Classification Using Query Strategies in a Pool Based Active Learning Scenario
Authors
K. Lakshmi Devi
P. Subathra
P. N. Kumar
Copyright Year
2016
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-10-0251-9_8
