Knowledge and Information Systems

18-05-2018 | Regular Paper

Sentiment analysis using semantic similarity and Hadoop MapReduce

Published in: Knowledge and Information Systems | Issue 2/2019

Abstract

Sentiment analysis, or opinion mining, is the field that analyses people’s opinions, sentiments, evaluations, attitudes, and emotions from written language; it has become a very active area of scientific research in recent years, especially with the growth of social networks such as Facebook and Twitter. In this paper we propose two new approaches to classifying tweets (that is, identifying the sentiment a tweet expresses): the first assigns one of three classes (negative, positive, or neutral), and the second one of two classes (negative or positive). Our first method computes the semantic similarity between the tweet to be classified and three documents, each representing one class (it contains the words that characterize that class); the tweet is then assigned the class of the document with the highest semantic similarity to it. Our second method computes, using a new formula that we propose, the semantic similarity between each word of the tweet and the words “positive” and “negative”. To keep computation time manageable when the tweet dataset is very large, we perform the analysis in a parallel and distributed way, using the Hadoop framework with the Hadoop Distributed File System (HDFS) and the MapReduce programming model. The aim of our work is to combine several domains: information retrieval, semantic similarity, opinion mining (sentiment analysis), and big data.
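The two classification schemes and the MapReduce distribution described above can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the authors' implementation: `word_sim` and its `TOY_SIM` scores are hypothetical stand-ins for a WordNet-based measure such as Wu-Palmer; the mean-of-best-match aggregation in `doc_sim` is an assumed choice; the scoring in `classify_by_seed_words` is a placeholder, because the abstract does not state the authors' new formula; and `run_local` only simulates the map, shuffle, and reduce phases in one process rather than calling Hadoop. The class documents and tweets are invented examples.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# HYPOTHETICAL similarity scores standing in for a WordNet-based measure
# (e.g. Wu-Palmer). The values are invented for illustration only.
TOY_SIM: Dict[frozenset, float] = {
    frozenset({"good", "positive"}): 0.8,
    frozenset({"happy", "positive"}): 0.7,
    frozenset({"bad", "negative"}): 0.8,
    frozenset({"sad", "negative"}): 0.7,
    frozenset({"good", "great"}): 0.9,
    frozenset({"bad", "awful"}): 0.9,
}


def word_sim(a: str, b: str) -> float:
    """Toy word similarity in [0, 1]: 1.0 for identical words, else a table lookup."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset({a, b}), 0.0)


def doc_sim(tweet: List[str], doc: List[str],
            sim: Callable[[str, str], float] = word_sim) -> float:
    """ASSUMED aggregation: mean, over tweet words, of each word's best match in doc."""
    if not tweet or not doc:
        return 0.0
    return sum(max(sim(w, d) for d in doc) for w in tweet) / len(tweet)


def classify_by_documents(tweet: List[str], class_docs: Dict[str, List[str]]) -> str:
    """Method 1: the tweet takes the class of the most similar class document."""
    return max(class_docs, key=lambda label: doc_sim(tweet, class_docs[label]))


def classify_by_seed_words(tweet: List[str],
                           sim: Callable[[str, str], float] = word_sim) -> str:
    """Method 2 PLACEHOLDER: sums sim(w, "positive") - sim(w, "negative").

    This is not the authors' formula, which the abstract does not give.
    Ties (score 0) are labeled "positive" by this sketch's convention.
    """
    score = sum(sim(w, "positive") - sim(w, "negative") for w in tweet)
    return "positive" if score >= 0 else "negative"


def mapper(line: str, class_docs: Dict[str, List[str]]) -> Iterator[Tuple[str, int]]:
    """Map step: one 'id<TAB>text' line -> (predicted label, 1)."""
    _tweet_id, text = line.split("\t", 1)
    yield classify_by_documents(text.lower().split(), class_docs), 1


def reducer(label: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: total number of tweets per predicted label."""
    return label, sum(counts)


def run_local(lines: Iterable[str], class_docs: Dict[str, List[str]]) -> Dict[str, int]:
    """Simulate map -> shuffle (group by key) -> reduce in a single process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line, class_docs):
            groups[key].append(value)
    return dict(reducer(k, v) for k, v in sorted(groups.items()))


# HYPOTHETICAL class documents and input tweets, for illustration only.
CLASS_DOCS = {
    "positive": ["positive", "good", "happy"],
    "negative": ["negative", "bad", "sad"],
    "neutral": ["neutral", "news", "today"],
}
TWEETS = ["1\tSo good and happy", "2\tawful bad day", "3\tnews today"]

if __name__ == "__main__":
    print(run_local(TWEETS, CLASS_DOCS))  # {'negative': 1, 'neutral': 1, 'positive': 1}
```

In an actual Hadoop Streaming job, `mapper` and `reducer` would run as separate processes reading tab-separated lines from standard input, and the framework would perform the grouping by key that `run_local` does by hand. Swapping `word_sim` for a real WordNet measure changes only the similarity values, not the structure of either method.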

https://wordnet.princeton.edu/.

Twitter4J (https://twitter4j.org/) is an unofficial Java library for the Twitter API. With Twitter4J, you can easily integrate your Java application with the Twitter service.

https://apps.twitter.com/.

Title: Sentiment analysis using semantic similarity and Hadoop MapReduce
Publication date: 18-05-2018
Published in: Knowledge and Information Systems / Issue 2/2019
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-018-1212-z