Published in: Social Network Analysis and Mining 1/2016

01.12.2016 | Original Article

Discover millions of fake followers in Weibo

Authors: Yi Zhang, Jianguo Lu

Abstract

Weibo, the Chinese counterpart of Twitter, has attracted hundreds of millions of users. Like other online social networks (hereafter OSNs), Weibo has a large number of fake accounts, which are created to sell their following links to customers who want to boost their follower counts. These bogus accounts are difficult to identify individually, especially when they are created by sophisticated programs or controlled directly by human beings. This paper proposes a novel fake account detection method based on the very purpose of these accounts: they are created to follow their targets en masse, which results in high overlap between the follower lists of their customers. The paper investigates the top Weibo accounts whose follower lists duplicate or nearly duplicate each other (hereafter called near-duplicates). Discovering near-duplicates is challenging: the network is large, the data are not available in their entirety, and pairwise comparison is very expensive. We developed a sampling-based approach to discover all the near-duplicates among the top accounts, those with at least 50,000 followers. In the experiment, we found 395 near-duplicates, which lead us to 11.90 million fake accounts (4.56% of total users) that send 741.10 million links (9.50% of all edges). Furthermore, we characterize four typical structures of the spammers, cluster these spammers into 34 groups, and analyze the properties of each group.
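The idea of comparing follower lists on a sample instead of in full can be illustrated with a small sketch. The abstract does not state which sampling scheme or similarity measure the authors use, so the following assumes a uniform random sample of users and Jaccard similarity. The names `jaccard`, `find_near_duplicates`, `follower_lists`, and `sampled_users` are hypothetical, and the data are synthetic.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_near_duplicates(follower_lists, sampled_users, threshold=0.9):
    """Return (account_u, account_v, estimated_similarity) for pairs at or above threshold.

    Assumption: each follower list is restricted to a uniform sample of users,
    so the pairwise comparison runs on much smaller sets.
    """
    reduced = {acct: followers & sampled_users
               for acct, followers in follower_lists.items()}
    pairs = []
    for u, v in combinations(sorted(reduced), 2):
        score = jaccard(reduced[u], reduced[v])
        if score >= threshold:
            pairs.append((u, v, score))
    return pairs


# Synthetic data (not from the paper): A and B share 5,000 bought followers,
# C has an unrelated audience.
fake_block = set(range(5000))
follower_lists = {
    "A": fake_block | set(range(5000, 5100)),
    "B": fake_block | set(range(5100, 5200)),
    "C": set(range(10000, 15000)),
}

# Uniform 10% sample of a hypothetical population of 20,000 user ids.
rng = random.Random(0)
sampled_users = set(rng.sample(range(20000), 2000))

pairs = find_near_duplicates(follower_lists, sampled_users)
for u, v, score in pairs:
    print(u, v, round(score, 3))
```

In this sketch the sample only shrinks each follower list; the comparison is still quadratic in the number of accounts, which may be acceptable when only the top accounts are compared.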

Metadata
Title
Discover millions of fake followers in Weibo
Authors
Yi Zhang
Jianguo Lu
Publication date
01.12.2016
Publisher
Springer Vienna
Published in
Social Network Analysis and Mining / Issue 1/2016
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-016-0324-2
