Abstract
The rise of bots and their influence on social networks has attracted the interest of many researchers. Despite considerable efforts to detect social bots, distinguishing them from legitimate users remains difficult. Here, we propose a simple yet effective semi-supervised method that distinguishes bots from legitimate users with high accuracy. The method learns a joint representation of the social connections and interactions between users by leveraging graph-based representation learning. Then, on the proximity graph derived from the user embeddings, a sample of bots is used as seeds for a label propagation algorithm. We demonstrate that when label propagation follows pairwise account proximity, our method achieves F1 = 0.93, whereas other state-of-the-art techniques achieve F1 ≤ 0.87. Applying our method to a large dataset of retweets, we uncover different clusters of bots in the network of Twitter interactions. Interestingly, these clusters feature different degrees of integration with legitimate users. Our analysis of the interactions produced by the different clusters of bots suggests that a significant group of users was systematically exposed to content produced by bots and to interactions with bots, indicating the presence of a selective exposure phenomenon.
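The last two steps of the pipeline described above (a proximity graph over user embeddings, then label propagation from bot seeds) can be illustrated with a minimal sketch. This is not the authors' implementation: the embeddings are assumed to be already computed, and the use of cosine similarity, a k-nearest-neighbour graph, the damping factor `alpha`, and the clamping of seed scores are all assumptions made for illustration. The function names `knn_graph` and `propagate_bot_scores` are hypothetical.

```python
import numpy as np


def knn_graph(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Build a symmetric k-nearest-neighbour adjacency matrix.

    Proximity is measured with cosine similarity (an assumption; the
    abstract only speaks of pairwise account proximity).
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # an account is not its own neighbour
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    adj = np.zeros_like(sim)
    rows = np.repeat(np.arange(len(embeddings)), k)
    adj[rows, neighbours.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # make the graph undirected


def propagate_bot_scores(adj: np.ndarray, seed_idx, n_iter: int = 50,
                         alpha: float = 0.85) -> np.ndarray:
    """Spread a bot score from seed accounts over the proximity graph.

    Each iteration averages the scores of an account's neighbours,
    mixes in the seed vector, and clamps the seeds back to 1.
    """
    degree = adj.sum(axis=1, keepdims=True)
    transition = adj / np.clip(degree, 1.0, None)  # row-normalised
    seed = np.zeros(adj.shape[0])
    seed[seed_idx] = 1.0
    scores = seed.copy()
    for _ in range(n_iter):
        scores = alpha * (transition @ scores) + (1.0 - alpha) * seed
        scores[seed_idx] = 1.0  # seeds are known bots
    return scores
```

Calling `propagate_bot_scores(knn_graph(embeddings, k=10), seed_idx=[...])` returns one score per account; accounts would then be flagged as bots by thresholding the scores, with the threshold chosen on labelled validation data. The dense similarity matrix limits this sketch to small graphs; a dataset of the size the abstract describes would need an approximate nearest-neighbour search instead.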
Index Terms
- Bots in Social and Interaction Networks: Detection and Impact Estimation