ABSTRACT
The explosion of user-generated content on the Web has created new opportunities and significant challenges for companies that are increasingly concerned with monitoring the discussion around their products. Tracking such discussion on weblogs provides useful insight into how to improve products or market them more effectively. An important component of such analysis is characterizing the sentiment expressed in blogs about specific brands and products. Sentiment analysis focuses on this task of automatically identifying whether a piece of text expresses a positive or negative opinion about its subject matter. Most previous work in this area uses prior lexical knowledge in the form of the sentiment polarity of words. In contrast, some recent approaches treat the task as a text classification problem and learn to classify sentiment based only on labeled training data. In this paper, we present a unified framework in which one can use background lexical information in the form of word-class associations and refine this information for specific domains using any available training examples. Empirical results on diverse domains show that our approach performs better than using background knowledge or training data in isolation, and better than alternative approaches to using lexical knowledge with text classification.
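To make the idea of combining a background lexicon with labeled examples concrete, the following is a minimal sketch and not necessarily the method used in the paper. It assumes one simple combination scheme: a linear mixture of two multinomial word-class models, one built from a lexicon and one estimated from training data. The word lists, the `ratio` and `alpha` parameters, the toy documents, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy lexicon; a real system would use a much larger word list.
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"bad", "awful", "broken"}
CLASSES = ("pos", "neg")

def lexicon_model(vocab, ratio=4.0):
    """Background model: up-weight a lexicon word in its own class and
    down-weight it in the opposite class, then normalise to P(word | class)."""
    model = {}
    for c, own, other in (("pos", POSITIVE, NEGATIVE), ("neg", NEGATIVE, POSITIVE)):
        weights = {
            w: ratio if w in own else (1.0 / ratio if w in other else 1.0)
            for w in vocab
        }
        total = sum(weights.values())
        model[c] = {w: weights[w] / total for w in vocab}
    return model

def trained_model(docs, vocab):
    """Multinomial naive Bayes estimates of P(word | class), Laplace-smoothed."""
    counts = {c: Counter() for c in CLASSES}
    for tokens, label in docs:
        counts[label].update(t for t in tokens if t in vocab)
    model = {}
    for c in CLASSES:
        total = sum(counts[c].values()) + len(vocab)
        model[c] = {w: (counts[c][w] + 1) / total for w in vocab}
    return model

def pool(lex, trained, alpha=0.5):
    """Assumed combination rule: a linear mixture of the two models."""
    return {
        c: {w: alpha * lex[c][w] + (1 - alpha) * trained[c][w] for w in lex[c]}
        for c in CLASSES
    }

def classify(tokens, model):
    """Pick the class with the highest log-likelihood (uniform class prior)."""
    scores = {
        c: sum(math.log(model[c][t]) for t in tokens if t in model[c])
        for c in CLASSES
    }
    return max(scores, key=scores.get)

# Toy labeled blog snippets (hypothetical data).
docs = [
    ("the battery is great".split(), "pos"),
    ("screen arrived broken".split(), "neg"),
    ("sick design love it".split(), "pos"),
    ("sick of this awful phone".split(), "neg"),
]
vocab = POSITIVE | NEGATIVE | {t for tokens, _ in docs for t in tokens}

lex = lexicon_model(vocab)
trained = trained_model(docs, vocab)
pooled = pool(lex, trained)

query = "excellent phone".split()
print("trained only:", classify(query, trained))
print("pooled:      ", classify(query, pooled))
```

In this toy setup the word "excellent" never appears in the training snippets, so the trained-only model has no evidence about it and is swayed by "phone", which occurred in a negative example. The pooled model can still draw on the lexicon entry for "excellent". The mixing weight `alpha` is a free parameter here; how to set it, and whether a linear mixture is the right choice, is left open by this sketch.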
Index Terms
- Sentiment analysis of blogs by combining lexical knowledge with text classification