Published in: Cluster Computing 3/2019

04-08-2017

Opinion mining on large scale data using sentiment analysis and k-means clustering

Authors: Sumbal Riaz, Mehvish Fatima, M. Kamran, M. Wasif Nisar

Abstract

With the rapid growth of web technology and easy access to the internet, online shopping has increased. People now express their opinions and share experiences that strongly influence new buyers' purchasing decisions, and in doing so they generate large data sets. This data is very helpful for analyzing customer preferences, needs, and behavior toward a product. Companies face the challenge of analyzing this sheer amount of data to extract customer opinion. To address this challenge, in this paper we performed phrase-level sentiment analysis on real-world customer review data to determine customer preference by analyzing subjective expressions. We then calculated the strength of each sentiment word to find the intensity of each expression, and applied clustering to place the words in clusters based on their intensity. We also compared the results of our technique with the star ranking given on the same dataset and found that our results differ drastically. Finally, we provide a visual representation of our results to give clear insight into customer preference and behavior and to help decision makers make better decisions.
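The pipeline outlined in the abstract (score the strength of sentiment expressions, then cluster them by intensity) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The lexicon values, the intensifier and negation rules, and the names `LEXICON`, `INTENSIFIERS`, `NEGATORS`, `phrase_strength` and `kmeans_1d` are all hypothetical and chosen only for illustration. The clustering step is plain Lloyd's k-means on scalar scores with a deterministic initialisation.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon (word -> strength in [-1, 1]); values are illustrative only.
LEXICON: Dict[str, float] = {
    "excellent": 0.9, "great": 0.8, "good": 0.4, "okay": 0.1,
    "poor": -0.5, "bad": -0.6, "terrible": -0.9,
}
# Hypothetical modifiers: intensifiers scale the strength, negators flip its sign.
INTENSIFIERS: Dict[str, float] = {"very": 1.5, "slightly": 0.5}
NEGATORS = {"not", "never"}


def phrase_strength(phrase: str) -> float:
    """Score a short phrase: each sentiment word's lexicon value, adjusted by
    the modifiers preceding it, summed and clamped to [-1, 1]."""
    total, scale, sign = 0.0, 1.0, 1.0
    for token in phrase.lower().split():
        if token in INTENSIFIERS:
            scale *= INTENSIFIERS[token]
        elif token in NEGATORS:
            sign = -sign
        elif token in LEXICON:
            total += sign * scale * LEXICON[token]
            scale, sign = 1.0, 1.0  # modifiers apply to the next sentiment word only
    return max(-1.0, min(1.0, total))


def kmeans_1d(values: List[float], k: int, max_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Lloyd's k-means on scalars. Returns (centroids, cluster label per value)."""
    if not 2 <= k <= len(values):
        raise ValueError("k must be between 2 and len(values)")
    ordered = sorted(values)
    # Deterministic initialisation: evenly spaced picks from the sorted values.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids, labels


# Usage: score a few made-up review phrases and group them by intensity.
phrases = ["very good", "not bad", "terrible", "slightly poor", "great", "okay", "excellent"]
scores = [phrase_strength(p) for p in phrases]
centroids, labels = kmeans_1d(scores, k=3)
for p, s, lab in zip(phrases, scores, labels):
    print(f"{p!r}: strength={s:+.2f}, cluster={lab}")
```

Because the scores are one-dimensional and the initial centroids are taken in ascending order, the cluster indices can be read as intensity bands from most negative to most positive. On real review data, a tested library implementation of k-means and an established sentiment lexicon would be the more sensible choice.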

Metadata
Title
Opinion mining on large scale data using sentiment analysis and k-means clustering
Authors
Sumbal Riaz
Mehvish Fatima
M. Kamran
M. Wasif Nisar
Publication date
04-08-2017
Publisher
Springer US
Published in
Cluster Computing / Special Issue 3/2019
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-1077-z
