Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis

Author:
Michael Gamon

Microsoft Research, Redmond, WA

COLING '04: Proceedings of the 20th international conference on Computational Linguistics, August 2004, Pages 841–es. https://doi.org/10.3115/1220355.1220476

Published: 23 August 2004

ABSTRACT

We demonstrate that it is possible to perform automatic sentiment classification in the very noisy domain of customer feedback data. We show that by using large feature vectors in combination with feature reduction, we can train linear support vector machines that achieve high classification accuracy on data that present classification challenges even for a human annotator. We also show that, surprisingly, the addition of deep linguistic analysis features to a set of surface level word n-gram features contributes consistently to classification accuracy in this domain.
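
The feature side of the pipeline described in the abstract (word n-gram features followed by feature reduction) can be illustrated with a minimal sketch. The abstract does not say which reduction criterion is used, so this sketch assumes a log-likelihood ratio score over a 2x2 contingency table, in the style of Dunning's 1993 statistic. The function names (`word_ngrams`, `llr`, `select_features`, `vectorize`), the `"pos"`/`"neg"` labels, and the toy feedback strings are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log


def word_ngrams(text, n_max=2):
    """Return the set of word n-grams (n = 1..n_max) present in a text."""
    tokens = text.lower().split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    }


def _xlogx(x):
    # Convention: 0 * log(0) = 0.
    return x * log(x) if x > 0 else 0.0


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table of counts."""
    total = k11 + k12 + k21 + k22
    cells = _xlogx(k11) + _xlogx(k12) + _xlogx(k21) + _xlogx(k22)
    rows = _xlogx(k11 + k12) + _xlogx(k21 + k22)
    cols = _xlogx(k11 + k21) + _xlogx(k12 + k22)
    return 2.0 * (cells + _xlogx(total) - rows - cols)


def select_features(labeled_docs, k, n_max=2):
    """Keep the k n-grams whose presence is most associated with the label."""
    pos_total = sum(1 for _, label in labeled_docs if label == "pos")
    neg_total = len(labeled_docs) - pos_total
    pos_df, neg_df = Counter(), Counter()  # document frequencies per class
    for text, label in labeled_docs:
        (pos_df if label == "pos" else neg_df).update(word_ngrams(text, n_max))
    scores = {
        f: llr(pos_df[f], neg_df[f], pos_total - pos_df[f], neg_total - neg_df[f])
        for f in set(pos_df) | set(neg_df)
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


def vectorize(text, features, n_max=2):
    """Binary presence vector over the selected features."""
    present = word_ngrams(text, n_max)
    return [1 if f in present else 0 for f in features]


# Toy data, invented for illustration only.
docs = [
    ("the install failed again", "neg"),
    ("setup failed with an error", "neg"),
    ("works great thanks", "pos"),
    ("great product works well", "pos"),
]
features = select_features(docs, k=3)
vectors = [vectorize(text, features) for text, _ in docs]
```

The reduced binary vectors would then be passed to a linear SVM trainer of the reader's choice. The sketch stops before that step and does not include the deep linguistic analysis features mentioned in the abstract.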

Published in

COLING '04: Proceedings of the 20th international conference on Computational Linguistics
August 2004
1411 pages
Publisher
Association for Computational Linguistics
United States