International Journal of Machine Learning and Cybernetics

16-12-2017 | Original Article

Spam analysis of big reviews dataset using Fuzzy Ranking Evaluation Algorithm and Hadoop

Authors: Komal Dhingra, Sumit Kr Yadav

Published in: International Journal of Machine Learning and Cybernetics | Issue 8/2019

Abstract

Online reviews are among the most easily available free information sources, and both organizations and customers use them to make decisions. Some establishments exploit the influence of opinions to earn undue profit by hiring professionals, known as spammers, who post positive comments on their products and negative opinions on their competitors' products. This activity is known as opinion spamming, and it must be identified so that the reported sentiment towards a product is genuine. So far, opinion spam detection has been treated as a discrete classification problem, generally with the two classes spam and non-spam. However, the problem involves uncertainty, because the suspicious behavior of a user might be due to coincidence. Since fuzzy logic handles real-world uncertainty well, we propose a novel fuzzy-modeling-based solution to the problem. We propose four fuzzy input linguistic variables and take the suspicion level of a spammer group to be one of: Ultra, Mega, Immense, Highly, Moderate, Slightly and Feebly. We define a novel FSL Deduction Algorithm, which generates 81 fuzzy rules, and a Fuzzy Ranking Evaluation Algorithm (FREA), which determines the extent to which a group is suspicious. Because review datasets satisfy the three V's of big data (Volume, Velocity and Variety), we treat this as a big data problem and use Hadoop for storage and analysis. We further demonstrate the proposed algorithm on a sample reviews dataset and on an Amazon reviews dataset, achieving an accuracy of 80.77%, which, unlike that of other approaches, remains steady for a large number of groups and copes well with the uncertainty involved in opinion spam detection.
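To make the size of the rule base concrete, the following is a minimal sketch and not the authors' FSL Deduction Algorithm. The abstract names neither the four input variables nor the number of linguistic terms per variable, so the names `input_1` to `input_4`, the three terms `low`/`medium`/`high`, the membership shapes on [0, 1], and the use of `min` as the fuzzy AND are all assumptions. Three terms per variable is chosen only because it is consistent with the stated rule count, since 3^4 = 81. The mapping from rules to the seven suspicion levels is not given in the abstract and is deliberately left out.

```python
from itertools import product

# Hypothetical names: the abstract does not name the four input variables.
VARIABLES = ("input_1", "input_2", "input_3", "input_4")
# Assumption: three linguistic terms per variable, consistent with 3**4 == 81.
TERMS = ("low", "medium", "high")


def fuzzify(x):
    """Map a crisp value in [0, 1] to membership degrees for each term.

    Assumed shapes: 'low' falls from 1 at x=0 to 0 at x=0.5, 'medium' is a
    triangle peaking at x=0.5, 'high' rises from 0 at x=0.5 to 1 at x=1.
    """
    return {
        "low": max(0.0, 1.0 - 2.0 * x),
        "medium": max(0.0, 1.0 - abs(2.0 * x - 1.0)),
        "high": max(0.0, 2.0 * x - 1.0),
    }


# Every combination of one term per variable is one rule antecedent.
RULE_ANTECEDENTS = list(product(TERMS, repeat=len(VARIABLES)))


def firing_strengths(crisp_inputs):
    """Return {antecedent: strength} for every rule that fires.

    The strength of a rule is the minimum membership over its four
    conditions (min as fuzzy AND, an assumed but common choice).
    """
    memberships = [fuzzify(x) for x in crisp_inputs]
    fired = {}
    for antecedent in RULE_ANTECEDENTS:
        strength = min(m[term] for m, term in zip(memberships, antecedent))
        if strength > 0.0:
            fired[antecedent] = strength
    return fired


if __name__ == "__main__":
    print(len(RULE_ANTECEDENTS))
    for antecedent, strength in firing_strengths((0.25, 0.5, 0.5, 0.5)).items():
        print(antecedent, strength)
```

Because a crisp input usually has non-zero membership in more than one term, several rules fire at once with partial strength. This graded response is what distinguishes a fuzzy treatment from the discrete spam/non-spam classification described above.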

Title: Spam analysis of big reviews dataset using Fuzzy Ranking Evaluation Algorithm and Hadoop
Authors: Komal Dhingra, Sumit Kr Yadav
Publication date: 16-12-2017
Publisher: Springer Berlin Heidelberg
Published in: International Journal of Machine Learning and Cybernetics / Issue 8/2019
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI: https://doi.org/10.1007/s13042-017-0768-3
