Skip to main content
Top
Published in: Soft Computing 18/2018

05-01-2018 | Focus

Semantic Twitter sentiment analysis based on a fuzzy thesaurus

Authors: Heba M. Ismail, Boumediene Belkhouche, Nazar Zaki

Published in: Soft Computing | Issue 18/2018

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

We define a new, fully automated and domain-independent method for building feature vectors from Twitter text corpus for machine learning sentiment analysis based on a fuzzy thesaurus and sentiment replacement. The proposed method measures the semantic similarity of Tweets with features in the feature space instead of using terms’ presence or frequency feature vectors. Thus, we account for the sentiment of the context instead of just counting sentiment words. We use sentiment replacement to reduce the dimensionality of the feature space and a fuzzy thesaurus to incorporate semantics. Experimental results show that sentiment replacement yields up to 35% reduction in the dimensionality of the feature space. Moreover, feature vectors developed based on a fuzzy thesaurus show improvement of sentiment classification performance with multinomial naïve Bayes and support vector machine classifiers with accuracies of 83 and 85%, respectively, on the Stanford testing dataset. Incorporating the fuzzy thesaurus resulted in the best accuracy compared to the baselines with an increase greater than 3%. Comparable results were obtained with a larger dataset, the STS-Gold, indicating the robustness of the proposed method. Furthermore, comparison of results with previous work shows that the proposed method outperforms other methods reported in the literature using the same benchmark data.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Footnotes
1
STS-Gold dataset can be requested from the authors at: http://​kmi.​open.​ac.​uk/​people/​member/​hassan-saif
 
3
Stanford testing and training datasets can be downloaded from: https://​docs.​google.​com/​file/​d/​0B04GJPshIjmPRnZ​ManQwWEdTZjg/​edit
 
Literature
go back to reference Abbasi A, Chen H, Salem A (2008) Sentiment analysis in multiple languages: features selection for opinion classification in web forums. ACM Trans Inf Syst (TOIS) 26(3):1–34CrossRef Abbasi A, Chen H, Salem A (2008) Sentiment analysis in multiple languages: features selection for opinion classification in web forums. ACM Trans Inf Syst (TOIS) 26(3):1–34CrossRef
go back to reference Agarwal A, Xie B, Vovsha I, Rambow O (2011) Sentiment analysis of Twitter data. In: Proceedings of the workshop on languages in social media. Association for Computational Linguistics, pp 30–38 Agarwal A, Xie B, Vovsha I, Rambow O (2011) Sentiment analysis of Twitter data. In: Proceedings of the workshop on languages in social media. Association for Computational Linguistics, pp 30–38
go back to reference Barbosa L, Feng J (2010) Robust sentiment detection on Twitter from biased and noisy data. In: 23rd International conference on computational linguistics. Association for Computational Linguistics, pp 36–44 Barbosa L, Feng J (2010) Robust sentiment detection on Twitter from biased and noisy data. In: 23rd International conference on computational linguistics. Association for Computational Linguistics, pp 36–44
go back to reference Batra S, Rao D (2010) Entity based sentiment analysis on Twitter. Science 9(4):1–12 Batra S, Rao D (2010) Entity based sentiment analysis on Twitter. Science 9(4):1–12
go back to reference Bhuta S, Doshi A, Doshi U, Narvekar M (2014) A review of techniques for sentiment analysis of Twitter data. In: International conference on issues and challenges in intelligent computing techniques (ICICT). IEEE, pp 583–591 Bhuta S, Doshi A, Doshi U, Narvekar M (2014) A review of techniques for sentiment analysis of Twitter data. In: International conference on issues and challenges in intelligent computing techniques (ICICT). IEEE, pp 583–591
go back to reference Boulianne S (2015) Social media use and participation: a meta-analysis of current research. Inf Commun Soc 18(5):524–538CrossRef Boulianne S (2015) Social media use and participation: a meta-analysis of current research. Inf Commun Soc 18(5):524–538CrossRef
go back to reference Cambria E, Schuller B, Xia Y, Havasi C (2013) New avenues in opinion mining and sentiment analysis. IEEE Intell Syst 28:15–21CrossRef Cambria E, Schuller B, Xia Y, Havasi C (2013) New avenues in opinion mining and sentiment analysis. IEEE Intell Syst 28:15–21CrossRef
go back to reference Cambria E, Speer R, Havasi C, Hussain A (2010) SenticNet: a publicly available semantic resource for opinion mining. AAAI fall symposium: commonsense knowledge 10 Cambria E, Speer R, Havasi C, Hussain A (2010) SenticNet: a publicly available semantic resource for opinion mining. AAAI fall symposium: commonsense knowledge 10
go back to reference Elfeky M, Elhawary M (2010) Mining Arabic business reviews. In: International conference in data mining. IEEE, Sydney. pp 1108–1113 Elfeky M, Elhawary M (2010) Mining Arabic business reviews. In: International conference in data mining. IEEE, Sydney. pp 1108–1113
go back to reference Esuli A (2006) SentiWordNet: a publicly available lexical resource for opinion mining. In: Proceedings of the 5th conference on language resources and evaluation, pp 417–422 (2006) Esuli A (2006) SentiWordNet: a publicly available lexical resource for opinion mining. In: Proceedings of the 5th conference on language resources and evaluation, pp 417–422 (2006)
go back to reference Garcia I, Ng YK (2006) Eliminating redundant and less-informative RSS news articles based on word similarity and a fuzzy equivalence relation. In: Tools with artificial intelligence, ICTAI’06. IEEE, pp 465–473 Garcia I, Ng YK (2006) Eliminating redundant and less-informative RSS news articles based on word similarity and a fuzzy equivalence relation. In: Tools with artificial intelligence, ICTAI’06. IEEE, pp 465–473
go back to reference Go A, Bhayani R, Huang L (2009). Twitter sentiment classification using distant supervision. Stanford digital library technologies projects Go A, Bhayani R, Huang L (2009). Twitter sentiment classification using distant supervision. Stanford digital library technologies projects
go back to reference Hotho A, Nürnberger A, Paaß G (2005) A brief survey of text mining. Ldv Forum 20(1):19–62 Hotho A, Nürnberger A, Paaß G (2005) A brief survey of text mining. Ldv Forum 20(1):19–62
go back to reference Ismail HM (2014) Using concept maps and fuzzy set information retrieval model to dynamically personalize RSS feeds. Int J Comput Sci Netw Secur 14(2):10 Ismail HM (2014) Using concept maps and fuzzy set information retrieval model to dynamically personalize RSS feeds. Int J Comput Sci Netw Secur 14(2):10
go back to reference Ismail HM, Harous S, Belkhouche B (2016) A comparative analysis of machine learning classifiers for Twitter sentiment analysis. Res Comput Sci 110:71–83 Ismail HM, Harous S, Belkhouche B (2016) A comparative analysis of machine learning classifiers for Twitter sentiment analysis. Res Comput Sci 110:71–83
go back to reference Ismail HM, Zaki N, Belkhouche B (2016) Using custom fuzzy thesaurus to incorporate semantics and reduce data sparsity for Twitter sentiment analysis. In: 3rd International conference on soft computing and machine intelligence (ISCMI). IEEE, pp 47–52 Ismail HM, Zaki N, Belkhouche B (2016) Using custom fuzzy thesaurus to incorporate semantics and reduce data sparsity for Twitter sentiment analysis. In: 3rd International conference on soft computing and machine intelligence (ISCMI). IEEE, pp 47–52
go back to reference Jiang L, Yu M, Zhou M, Liu X, Zhao T (2011) Target-dependent Twitter sentiment classification. In: Annual meeting of the association for computational linguistics. Association for Computational Linguistics, Portland, pp 151–160 Jiang L, Yu M, Zhou M, Liu X, Zhao T (2011) Target-dependent Twitter sentiment classification. In: Annual meeting of the association for computational linguistics. Association for Computational Linguistics, Portland, pp 151–160
go back to reference Kao A, Poteet SR (eds) (2007) Natural language processing and text mining. Springer, BerlinMATH Kao A, Poteet SR (eds) (2007) Natural language processing and text mining. Springer, BerlinMATH
go back to reference Kontopoulos E, Berberidis C, Dergiades T, Bassiliades N (2013) Ontology-based sentiment analysis of Twitter posts. Expert Syst Appl 40(10):4065–4074CrossRef Kontopoulos E, Berberidis C, Dergiades T, Bassiliades N (2013) Ontology-based sentiment analysis of Twitter posts. Expert Syst Appl 40(10):4065–4074CrossRef
go back to reference Kraft DH, Bordogna G, Pasi G (1999) Fuzzy set techniques in information retrieval. Fuzzy Sets Approx Reason Inf Syst 5(6):469–510MathSciNetCrossRefMATH Kraft DH, Bordogna G, Pasi G (1999) Fuzzy set techniques in information retrieval. Fuzzy Sets Approx Reason Inf Syst 5(6):469–510MathSciNetCrossRefMATH
go back to reference Lee B, Pang L (2008) Opinion mining and sentiment analysis. Found Trends Inf Retr 2(1–2):1–135 Lee B, Pang L (2008) Opinion mining and sentiment analysis. Found Trends Inf Retr 2(1–2):1–135
go back to reference Lima ACE, de Castro LN, Corchado JM (2015) A polarity analysis framework for Twitter messages. Applied Mathematics and Computation 270(1):756–767CrossRef Lima ACE, de Castro LN, Corchado JM (2015) A polarity analysis framework for Twitter messages. Applied Mathematics and Computation 270(1):756–767CrossRef
go back to reference Liu Y, Kliman-Silver C, Mislove A (2014) The Tweets They Are a-Changin: Evolution of Twitter Users and Behavior. ICWSM 30:5–314 Liu Y, Kliman-Silver C, Mislove A (2014) The Tweets They Are a-Changin: Evolution of Twitter Users and Behavior. ICWSM 30:5–314
go back to reference Manning CD, Raghavan P, Schütze H (2009) Text classification and naive bayes. In: Introduction to information retrieval. Cambridge University Press, pp 253–287 Manning CD, Raghavan P, Schütze H (2009) Text classification and naive bayes. In: Introduction to information retrieval. Cambridge University Press, pp 253–287
go back to reference Ogawa Y, Morita T, Kobayashi K (1991) A fuzzy document retrieval system using the keyword connection matrix and a learning method. Fuzzy Sets Syst 39(2):163–179MathSciNetCrossRef Ogawa Y, Morita T, Kobayashi K (1991) A fuzzy document retrieval system using the keyword connection matrix and a learning method. Fuzzy Sets Syst 39(2):163–179MathSciNetCrossRef
go back to reference Pang B, Lee L, Vaithyanathan S (2002) Thumbs up: sentiment classification using machine learning techniques. Association for Computational Linguistics, StroudsburgCrossRef Pang B, Lee L, Vaithyanathan S (2002) Thumbs up: sentiment classification using machine learning techniques. Association for Computational Linguistics, StroudsburgCrossRef
go back to reference Perez-Tellez F, Pinto D, Cardiff J, Rosso P (2010) On the difficulty of clustering company Tweets. In: 2nd International workshop on search and mining user-generated contents. ACM, New York, pp 95–102 Perez-Tellez F, Pinto D, Cardiff J, Rosso P (2010) On the difficulty of clustering company Tweets. In: 2nd International workshop on search and mining user-generated contents. ACM, New York, pp 95–102
go back to reference Porter MF (1980) An Algorithm for Suffix Stripping. Program 14(3):130–137CrossRef Porter MF (1980) An Algorithm for Suffix Stripping. Program 14(3):130–137CrossRef
go back to reference Saif H, Fernandez M, He Y, Alani H (2013) Evaluation datasets for Twitter sentiment analysis a survey and a new dataset, the STS-gold. In: Interantional workshop on emotion and sentiment in social and expressive media: approaches and perspectives from AI (ESSEM 2013). Italy Saif H, Fernandez M, He Y, Alani H (2013) Evaluation datasets for Twitter sentiment analysis a survey and a new dataset, the STS-gold. In: Interantional workshop on emotion and sentiment in social and expressive media: approaches and perspectives from AI (ESSEM 2013). Italy
go back to reference Saif H, He Y, Alani H (2012) Alleviating data sparsity for twitter sentiment analysis. Making sense of microposts. CEUR-WS. org, Lyon, France Saif H, He Y, Alani H (2012) Alleviating data sparsity for twitter sentiment analysis. Making sense of microposts. CEUR-WS. org, Lyon, France
go back to reference Saif H, He Y, Fernandez M, Alani H (2016) Contextual semantics for sentiment analysis of twitter. Inf Process Manag 52(1):5–19CrossRef Saif H, He Y, Fernandez M, Alani H (2016) Contextual semantics for sentiment analysis of twitter. Inf Process Manag 52(1):5–19CrossRef
go back to reference Sokolova M, Lapalme G (2009) A systematic analysis of performance measures for classification tasks. Inf Process Manag 45(4):427–437CrossRef Sokolova M, Lapalme G (2009) A systematic analysis of performance measures for classification tasks. Inf Process Manag 45(4):427–437CrossRef
go back to reference Speriosu M, Sudan N, Upadhyay S, Baldridge J (2011) Twitter polarity classification with label propagation over lexical links and the follower graph. In: Conference on empirical methods in natural language processing. UK, pp 53–63 Speriosu M, Sudan N, Upadhyay S, Baldridge J (2011) Twitter polarity classification with label propagation over lexical links and the follower graph. In: Conference on empirical methods in natural language processing. UK, pp 53–63
go back to reference Strapparava C, Valitutti A (2004) WordNet affect: an affective extension of WordNet. LREC 4:1083–1086 Strapparava C, Valitutti A (2004) WordNet affect: an affective extension of WordNet. LREC 4:1083–1086
go back to reference Taboada M, Brooke J, Tofiloski M, Voll K, Stede M (2011) Lexicon-based methods for sentiment analysis. Comput Linguist 37:267–307CrossRef Taboada M, Brooke J, Tofiloski M, Voll K, Stede M (2011) Lexicon-based methods for sentiment analysis. Comput Linguist 37:267–307CrossRef
go back to reference Turney PD, Littman ML (2003) Measuring praise and criticism: inference of semantic orientation from association. ACM Trans Inf Syst 21(4):315–346CrossRef Turney PD, Littman ML (2003) Measuring praise and criticism: inference of semantic orientation from association. ACM Trans Inf Syst 21(4):315–346CrossRef
go back to reference Vapnik VN, Vapnik V (1998) Statistical learning theory. Wiley, New YorkMATH Vapnik VN, Vapnik V (1998) Statistical learning theory. Wiley, New YorkMATH
go back to reference Wilson T, Wiebe J, Hoffmann P (2005) Recognizing contextual polarity in phrase-level sentiment analysis. In: International conference on human language technology and empirical methods in natural language processing. Association for Computational Linguistics, Vancouver, pp 347–354 Wilson T, Wiebe J, Hoffmann P (2005) Recognizing contextual polarity in phrase-level sentiment analysis. In: International conference on human language technology and empirical methods in natural language processing. Association for Computational Linguistics, Vancouver, pp 347–354
go back to reference Witten IH, Frank E, Hall MA, Pal CJ (2016) Data mining: practical machine learning tools and techniques. Morgan Kaufmann, Burlington Witten IH, Frank E, Hall MA, Pal CJ (2016) Data mining: practical machine learning tools and techniques. Morgan Kaufmann, Burlington
go back to reference Yerra R, Ng YK (2005) Detecting similar HTML documents using a fuzzy set information retrieval approach. In: Granular computing IEEE International Conference, IEEE. 2:693–699 Yerra R, Ng YK (2005) Detecting similar HTML documents using a fuzzy set information retrieval approach. In: Granular computing IEEE International Conference, IEEE. 2:693–699
go back to reference Zaki N, Lazarova-Molnar S, El-Hajj W, Campbell P (2009) Protein-protein interaction based on pairwise similarity. BMC Bioinf 10(1):150CrossRef Zaki N, Lazarova-Molnar S, El-Hajj W, Campbell P (2009) Protein-protein interaction based on pairwise similarity. BMC Bioinf 10(1):150CrossRef
go back to reference Zhou P, Chaovalit L (2005) Movie review mining: a comparison between supervised and unsupervised classification approaches. In: International conference on system sciences. IEEE, Hawaii, pp 112c–112c Zhou P, Chaovalit L (2005) Movie review mining: a comparison between supervised and unsupervised classification approaches. In: International conference on system sciences. IEEE, Hawaii, pp 112c–112c
Metadata
Title
Semantic Twitter sentiment analysis based on a fuzzy thesaurus
Authors
Heba M. Ismail
Boumediene Belkhouche
Nazar Zaki
Publication date
05-01-2018
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 18/2018
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-017-2994-8

Other articles of this Issue 18/2018

Soft Computing 18/2018 Go to the issue

Premium Partner