Published in: Knowledge and Information Systems 2/2019

18-05-2018 | Regular Paper

Sentiment analysis using semantic similarity and Hadoop MapReduce



Abstract

Sentiment analysis, or opinion mining, is a field that analyses people’s opinions, sentiments, evaluations, attitudes, and emotions expressed in written language. It has become a very active area of scientific research in recent years, especially with the growth of social networks such as Facebook and Twitter. In this paper we propose two new approaches for classifying tweets, that is, for identifying the feeling a tweet expresses: the first uses three classes (negative, positive, or neutral) and the second uses two classes (negative or positive). The first method computes the semantic similarity between the tweet to classify and three documents, each of which represents one class and contains the words that characterise it; the tweet is then assigned the class of the document with which it has the greatest semantic similarity. The second method computes, using a new formula that we propose, the semantic similarity between each word of the tweet and the words “positive” and “negative”. Because the analysis can take a long time when the tweet dataset is very large, we carry it out in a parallel and distributed way, using the Hadoop framework with the Hadoop Distributed File System (HDFS) and the MapReduce programming model. The aim of our work is to combine several domains: information retrieval, semantic similarity, opinion mining or sentiment analysis, and big data.
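
As a rough illustration of the first approach, the sketch below assigns a tweet to the class whose representative document is most similar to it, then counts tweets per class in a map step and a reduce step. It is only a minimal sketch under stated assumptions: the class documents, the toy similarity table, the averaging rule in `document_similarity`, and all function names are hypothetical and are not taken from the paper, which may use a different similarity measure and aggregation. The map and reduce phases are simulated in a single Python process rather than run on Hadoop.

```python
from collections import Counter

# Hypothetical class documents: a few words standing in for each class.
CLASS_DOCUMENTS = {
    "positive": ["good", "great", "love", "happy"],
    "negative": ["bad", "awful", "hate", "sad"],
    "neutral": ["today", "report", "meeting", "weather"],
}

# Toy stand-in for a word-to-word semantic similarity measure.
# Assumption: a real system would query a lexical resource instead.
TOY_SIMILARITY = {
    frozenset(("nice", "good")): 0.9,
    frozenset(("nice", "great")): 0.8,
    frozenset(("terrible", "bad")): 0.9,
    frozenset(("terrible", "awful")): 0.95,
}


def word_similarity(a, b):
    """Return 1.0 for identical words, a table value if known, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def document_similarity(tweet_words, class_words):
    """Average, over tweet words, of the best match in the class document."""
    if not tweet_words:
        return 0.0
    best = [max(word_similarity(w, c) for c in class_words) for w in tweet_words]
    return sum(best) / len(best)


def classify(tweet):
    """Assign the class whose document is most similar to the tweet.

    Ties go to the label that appears first in CLASS_DOCUMENTS.
    """
    words = tweet.lower().split()
    scores = {
        label: document_similarity(words, doc)
        for label, doc in CLASS_DOCUMENTS.items()
    }
    return max(scores, key=scores.get)


def map_phase(tweets):
    """Map step: emit one (class, 1) pair per tweet."""
    return [(classify(tweet), 1) for tweet in tweets]


def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each class key."""
    totals = Counter()
    for label, count in pairs:
        totals[label] += count
    return dict(totals)


if __name__ == "__main__":
    sample = [
        "I love this nice phone",
        "terrible awful service",
        "weather report today",
    ]
    print(reduce_phase(map_phase(sample)))
```

On a Hadoop cluster, the work done by `map_phase` and `reduce_phase` would instead be split across mapper and reducer tasks reading tweets from HDFS, so that each tweet is classified independently and in parallel.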


Footnotes
2
Twitter4J (https://twitter4j.org/) is an unofficial Java library for the Twitter API. With Twitter4J, you can easily integrate your Java application with the Twitter service.
 
Metadata
Title: Sentiment analysis using semantic similarity and Hadoop MapReduce
Publication date: 18-05-2018
Published in: Knowledge and Information Systems, Issue 2/2019
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-018-1212-z
