ABSTRACT
We demonstrate that it is possible to perform automatic sentiment classification in the very noisy domain of customer feedback data. We show that by using large feature vectors in combination with feature reduction, we can train linear support vector machines that achieve high classification accuracy on data that present classification challenges even for a human annotator. We also show that, surprisingly, the addition of deep linguistic analysis features to a set of surface-level word n-gram features contributes consistently to classification accuracy in this domain.
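The pipeline the abstract describes can be sketched in a few dozen lines of Python: large word n-gram feature vectors, a feature-reduction step, then a linear SVM. This is a minimal sketch under stated assumptions, not the paper's implementation. The abstract does not name the feature-reduction criterion, so features are ranked here by Dunning's (1993) log-likelihood ratio between feature presence and class label. The SVM is trained with a Pegasos-style stochastic subgradient method, swapped in only because it fits in a few lines without external libraries. The deep linguistic analysis features are not modeled. The helper names (`word_ngrams`, `llr`, `select_features`, `train_linear_svm`, `predict`) and the toy snippets are hypothetical.

```python
import math
import random
from collections import Counter


def word_ngrams(text, n_max=2):
    """Set of word n-grams (n = 1..n_max) from a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)}


def llr(k11, k12, k21, k22):
    """Dunning-style log-likelihood ratio (G^2) for a 2x2 contingency table."""
    def h(*counts):
        total = sum(counts)
        return sum(c * math.log(c / total) for c in counts if c > 0)
    return 2 * (h(k11, k12, k21, k22)
                - h(k11 + k12, k21 + k22)    # row totals (class sizes)
                - h(k11 + k21, k12 + k22))   # column totals (feature present / absent)


def select_features(docs, labels, k):
    """Assumed feature reduction: keep the k features most associated with the label."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    pos_df, neg_df = Counter(), Counter()
    for feats, y in zip(docs, labels):
        (pos_df if y == 1 else neg_df).update(feats)   # document frequency per class
    scores = {f: llr(pos_df[f], n_pos - pos_df[f], neg_df[f], n_neg - neg_df[f])
              for f in set(pos_df) | set(neg_df)}
    # Highest score first; ties broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style SGD on the L2-regularized hinge loss (no bias term).

    X is a list of sets of active binary features; y holds labels in {+1, -1}.
    """
    rng = random.Random(seed)
    w = {}                              # sparse weight vector
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)       # decreasing step size
            margin = y[i] * sum(w.get(f, 0.0) for f in X[i])
            for f in w:                 # regularization: shrink w by (1 - eta * lam)
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:            # hinge loss is active: step toward y_i * x_i
                for f in X[i]:
                    w[f] = w.get(f, 0.0) + eta * y[i]
    return w


def predict(w, feats):
    """Sign of the linear score; a zero score is assigned to the positive class."""
    return 1 if sum(w.get(f, 0.0) for f in feats) >= 0 else -1


# Toy, invented feedback snippets (hypothetical; not data from the paper).
texts = ["the update is great", "support was great", "great app",
         "the update is awful", "support was awful", "awful app"]
labels = [1, 1, 1, -1, -1, -1]
docs = [word_ngrams(t) for t in texts]
kept = set(select_features(docs, labels, k=2))
X = [d & kept for d in docs]            # reduced feature vectors
w = train_linear_svm(X, labels)
print(sorted(kept))                                             # ['awful', 'great']
print([predict(w, x) for x in X])                               # [1, 1, 1, -1, -1, -1]
print(predict(w, word_ngrams("the new app is great") & kept))   # 1
```

On the toy data the shared filler n-grams ("the update", "support was") occur equally often in both classes and score zero, so the reduction step keeps only the two sentiment words. With real, noisy feedback the cutoff `k` would presumably need tuning on held-out data rather than being fixed as here.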
Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis