Published in: Soft Computing 9/2016

04.08.2015 | Focus

Hierarchical classification in text mining for sentiment analysis of online news

Authors: Jinyan Li, Simon Fong, Yan Zhuang, Richard Khoury

Abstract

Sentiment analysis in text mining is a challenging task. Sentiment is subtly reflected by the tone and affective content of a writer’s words. Conventional text mining techniques, which are based on keyword frequencies, usually fall short of accurately detecting such subjective information implied in the text. In this paper, we evaluate several popular classification algorithms, along with three filtering schemes. The filtering schemes progressively shrink the original dataset with respect to the contextual polarity and frequent terms of a document. We call this approach “hierarchical classification”. The effects of the approach under different combinations of classification algorithms and filtering schemes are discussed over three sets of controversial online news articles, to which binary and multi-class classification are applied. We also test the hierarchical classification model with two methods and compare the two.
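The abstract does not spell out the filtering schemes, so the following is only a minimal sketch of the general idea: successive filters shrink each document's token set (first by polarity, then by term frequency) and a classifier is trained at each level. The lexicon `POLAR_TERMS`, the helper names, the toy headlines and the choice of a small naive Bayes classifier are all assumptions made for illustration, not the authors' actual setup.

```python
from collections import Counter
import math

# Hypothetical polarity lexicon (assumption; the paper's lexicon is not given here).
POLAR_TERMS = {"good", "great", "win", "bad", "terrible", "crisis"}

def tokenize(text):
    """Lowercase whitespace tokenizer, kept deliberately simple."""
    return text.lower().split()

def keep_polar(docs):
    """Assumed filter 1: keep only tokens found in the polarity lexicon."""
    return [[t for t in doc if t in POLAR_TERMS] for doc in docs]

def keep_frequent(docs, top_k):
    """Assumed filter 2: keep only the top_k most frequent remaining terms."""
    counts = Counter(t for doc in docs for t in doc)
    vocab = {t for t, _ in counts.most_common(top_k)}
    return [[t for t in doc if t in vocab] for doc in docs]

def train_nb(docs, labels):
    """Train a multinomial naive Bayes model (counts only; smoothing is applied at prediction)."""
    class_docs = Counter(labels)
    term_counts = {c: Counter() for c in class_docs}
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)
    vocab = {t for doc in docs for t in doc}
    return class_docs, term_counts, vocab

def predict_nb(model, doc):
    """Return the most probable label; tokens outside the model vocabulary are ignored."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for c in sorted(class_docs):
        total_terms = sum(term_counts[c].values())
        score = math.log(class_docs[c] / total_docs)
        for t in doc:
            if t in vocab:
                # Add-one (Laplace) smoothing over the level's vocabulary.
                score += math.log((term_counts[c][t] + 1) / (total_terms + len(vocab)))
        if score > best_score:
            best_label, best_score = c, score
    return best_label

# Toy training data (invented for illustration only).
train_texts = [
    "great win for the team",
    "good news and a great result",
    "terrible crisis hits the market",
    "bad news as crisis deepens",
]
train_labels = ["pos", "pos", "neg", "neg"]

# Each level works on a progressively smaller view of the same documents.
raw = [tokenize(t) for t in train_texts]
levels = {"raw": raw, "polar": keep_polar(raw)}
levels["polar+frequent"] = keep_frequent(levels["polar"], top_k=2)

models = {name: train_nb(docs, train_labels) for name, docs in levels.items()}
vocab_sizes = {name: len(model[2]) for name, model in models.items()}
prediction = predict_nb(models["polar+frequent"], tokenize("a great day"))
```

In this sketch the filters are applied to the training documents only; at prediction time, tokens outside a level's vocabulary are skipped, which has the same effect as filtering the new document. Comparing `vocab_sizes` across levels shows how much each filter shrinks the feature space before classification.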

Metadata
Title
Hierarchical classification in text mining for sentiment analysis of online news
Authors
Jinyan Li
Simon Fong
Yan Zhuang
Richard Khoury
Publication date
04.08.2015
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 9/2016
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-015-1812-4
