KDD Conference Proceedings · Research article · DOI: 10.1145/1557019.1557156

Sentiment analysis of blogs by combining lexical knowledge with text classification

Published: 28 June 2009

ABSTRACT

The explosion of user-generated content on the Web has led to new opportunities and significant challenges for companies, which are increasingly concerned about monitoring the discussion around their products. Tracking such discussion on weblogs provides useful insight into how to improve products or market them more effectively. An important component of such analysis is characterizing the sentiment expressed in blogs about specific brands and products. Sentiment analysis focuses on this task of automatically identifying whether a piece of text expresses a positive or negative opinion about the subject matter. Most previous work in this area uses prior lexical knowledge in the form of the sentiment polarity of words. In contrast, some recent approaches treat the task as a text classification problem, learning to classify sentiment based only on labeled training data. In this paper, we present a unified framework in which one can use background lexical information in the form of word-class associations and refine this information for specific domains using any available training examples. Empirical results on diverse domains show that our approach performs better than using background knowledge or training data in isolation, as well as better than alternative approaches to combining lexical knowledge with text classification.
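The abstract does not spell out the model, so the following is only an illustrative sketch of one way to combine lexical word-class associations with labeled examples, not necessarily the paper's exact formulation. It assumes a multinomial naive Bayes setting: a polarity lexicon is turned into class-conditional word distributions P(w | c), a second set of distributions is estimated from labeled posts with add-one smoothing, and the two are merged by a linear pool with weight alpha. The function names, the lexicon weighting factor r, alpha, the uniform class prior, and the toy data are all hypothetical.

```python
import math
from collections import Counter

CLASSES = ("pos", "neg")

def lexicon_model(lexicon, vocab, r=10.0):
    # Hypothetical construction: a word listed with polarity c gets weight r
    # under class c and weight 1 under the other class; unlisted words get 1.
    model = {}
    for c in CLASSES:
        weights = {w: (r if lexicon.get(w) == c else 1.0) for w in vocab}
        z = sum(weights.values())
        model[c] = {w: v / z for w, v in weights.items()}
    return model

def trained_model(docs, vocab):
    # Multinomial naive Bayes estimates of P(w | c) with add-one smoothing.
    counts = {c: Counter() for c in CLASSES}
    for tokens, label in docs:
        counts[label].update(t for t in tokens if t in vocab)
    model = {}
    for c in CLASSES:
        total = sum(counts[c].values())
        model[c] = {w: (counts[c][w] + 1) / (total + len(vocab)) for w in vocab}
    return model

def pool(p_lex, p_train, alpha=0.5):
    # Linear pool: alpha * P_lex(w | c) + (1 - alpha) * P_train(w | c).
    # Each pooled distribution still sums to 1 over the vocabulary.
    return {
        c: {w: alpha * p_lex[c][w] + (1 - alpha) * p_train[c][w] for w in p_lex[c]}
        for c in CLASSES
    }

def classify(tokens, model, prior=None):
    # Assumed uniform class prior unless one is supplied.
    prior = prior or {c: 1 / len(CLASSES) for c in CLASSES}
    scores = {}
    for c in CLASSES:
        s = math.log(prior[c])
        for t in tokens:
            if t in model[c]:  # ignore out-of-vocabulary tokens
                s += math.log(model[c][t])
        scores[c] = s
    return max(scores, key=scores.get)

# Toy example (hypothetical data).
lexicon = {"great": "pos", "love": "pos", "awful": "neg", "broken": "neg"}
train = [
    (["battery", "lasts", "great"], "pos"),
    (["screen", "broken", "again"], "neg"),
    (["battery", "died", "fast"], "neg"),
]
vocab = set(lexicon) | {t for tokens, _ in train for t in tokens}
model = pool(lexicon_model(lexicon, vocab), trained_model(train, vocab), alpha=0.5)

print(classify(["battery", "died"], model))  # → neg (driven by training data)
print(classify(["love"], model))             # → pos (driven by the lexicon)
```

In this sketch, alpha controls how much the lexicon's prior associations are trusted relative to the domain data: alpha = 1 ignores the training examples and alpha = 0 ignores the lexicon, which correspond to the two "in isolation" baselines mentioned above.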
