2020 | OriginalPaper | Chapter

Clustering of Tweets: A Novel Approach to Label the Unlabelled Tweets

Author: Tabassum Gull Jan

Published in: Proceedings of ICRIC 2019

Publisher: Springer International Publishing

Abstract

Twitter is one of the fastest-growing microblogging and online social networking sites, enabling users to send and receive messages in the form of tweets. It has become a leading venue for news analysis and discussion, which has also made it a main target of attackers and cybercriminals. These attackers not only undermine the security of Twitter but also erode the trust people place in it, degrading the platform through misuse. Misuse can take the form of hurtful gossip, cyberbullying, cyber harassment, spam, pornographic content, identity theft, and common Web attacks such as phishing and malware downloads. Because Twitter is growing fast, it is prone to spam, so spam detection on Twitter is needed. Spam detection using supervised algorithms depends entirely on labelled Twitter datasets. Labelling datasets manually is costly, time-consuming and challenging. Moreover, older labelled datasets are no longer available because of Twitter's data publishing policies. There is therefore a need for an approach that labels tweets as spam or non-spam and so overcomes the effect of spam drift. In this paper, we downloaded a recent Twitter dataset and prepared an unlabelled dataset of tweets from it. We then applied the cluster-then-label approach to label the tweets as spam and non-spam. The resulting labelled dataset can be used for spam detection on Twitter and for categorizing different types of spam.
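
The abstract does not say which clustering algorithm or features the chapter uses, so the following is only a minimal sketch of the general cluster-then-label idea, not the author's method. It assumes bag-of-words vectors, a small k-means loop with cosine similarity, and a handful of hand-labelled seed tweets whose majority vote names each cluster. All function names, the toy tweets, and the seed labels are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one tweet (lowercased, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})


def kmeans(vectors, init_indices, iterations=10):
    """Assign each vector to its most similar centroid, then recompute centroids.

    init_indices picks the starting centroids explicitly so the run is
    deterministic (an assumption made for this sketch).
    """
    centroids = [vectors[i] for i in init_indices]
    assignment = []
    for _ in range(iterations):
        assignment = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == k]
            if members:
                centroids[k] = centroid(members)
    return assignment


def cluster_then_label(tweets, seed_labels, init_indices):
    """Cluster all tweets, then give every tweet its cluster's majority seed label."""
    vectors = [vectorize(t) for t in tweets]
    assignment = kmeans(vectors, init_indices)
    votes = {}
    for idx, label in seed_labels.items():
        votes.setdefault(assignment[idx], Counter())[label] += 1
    cluster_label = {k: c.most_common(1)[0][0] for k, c in votes.items()}
    # Clusters that contain no seed tweet stay "unknown".
    return [cluster_label.get(a, "unknown") for a in assignment]


# Hypothetical toy data: only tweets 0 and 3 carry a manual label.
tweets = [
    "win free iphone click here now",
    "free gift card click this link now",
    "click here to win free cash prize",
    "great talk on machine learning today",
    "enjoyed the machine learning conference talk",
    "learning about clustering at the conference today",
]
seed_labels = {0: "spam", 3: "non-spam"}
labels = cluster_then_label(tweets, seed_labels, init_indices=[0, 3])
print(labels)
```

On this toy input the first three tweets fall into the cluster seeded by tweet 0 and the last three into the cluster seeded by tweet 3, so two manual labels are enough to label all six. A real pipeline would need better features and a check that clusters are pure enough for majority voting to be trusted.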

Print ISBN: 978-3-030-29406-9

Electronic ISBN: 978-3-030-29407-6

Copyright Year: 2020
DOI: https://doi.org/10.1007/978-3-030-29407-6_48
