
The Journal of Supercomputing

Published: 10 October 2018

Twitter spam account detection based on clustering and classification methods

Authors: Kayode Sakariyah Adewole, Tao Han, Wanqing Wu, Houbing Song, Arun Kumar Sangaiah

Published in: The Journal of Supercomputing | Issue 7/2020


Abstract

The Twitter social network has grown in popularity as the social activity of its registered users has increased. Twitter serves a dual function as an online social network (OSN): it is both a microblogging platform and a news update platform. Recently, the growth in Twitter social interactions has attracted the attention of cybercriminals. Spammers have used Twitter to spread malicious messages, post phishing links, flood the network with fake accounts, and engage in other malicious activities. Detecting the networks of spammers who engage in these activities is an important step toward identifying individual spam accounts. Researchers have proposed a number of approaches to identify groups of spammers. However, each of these approaches addresses a specific category of spammer. This paper proposes a different approach to detect spammers on Twitter based on the similarities that exist among spam accounts. A number of features were introduced to improve the performance of the three classification algorithms selected in this study. The proposed approach applies principal component analysis and a tuned K-means algorithm to cluster over 200,000 accounts, randomly selected from more than 2 million tweets, in order to detect clusters of spammers. Experimental results show that Random Forest achieved the highest accuracy, 96.30%, followed by the multilayer perceptron with 96.00% and the support vector machine with 95.60%. An evaluation of the selected classifiers under class imbalance also showed that Random Forest achieved the highest accuracy, precision, recall, and F-measure.
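The abstract reports accuracy, precision, recall, and F-measure, and notes that the classifiers were also compared under class imbalance. The following is a minimal sketch of how these four measures relate to each other and why accuracy alone can mislead when spam accounts are rare. It is not the authors' evaluation code: the `evaluate` function, the `Metrics` class, and the ten-account sample are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f_measure: float


def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with `positive` (spam) as the class of interest.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return Metrics(accuracy, precision, recall, f_measure)


# Hypothetical imbalanced sample: 2 spam accounts (label 1) among 10 accounts.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_legit = [0] * 10                      # never flags any account as spam
detector = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]    # one hit, one miss, one false alarm

baseline = evaluate(y_true, always_legit)
result = evaluate(y_true, detector)
print(baseline)
print(result)
```

In this toy sample both predictors reach the same accuracy, yet only the second one finds any spam account, which shows up in its non-zero precision, recall, and F-measure. This is one reason to report all four measures when the classes are imbalanced.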



Title: Twitter spam account detection based on clustering and classification methods
Authors: Kayode Sakariyah Adewole, Tao Han, Wanqing Wu, Houbing Song, Arun Kumar Sangaiah
Publication date: 10 October 2018
Publisher: Springer US
Published in: The Journal of Supercomputing / Issue 7/2020
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-018-2641-x
