
2021 | Book

Identification of Pathogenic Social Media Accounts

From Data to Intelligence to Prediction


About this book

This book sheds light on the challenges social media platforms face in combating malicious accounts, and introduces current practices for addressing them. It further provides an in-depth investigation of the characteristics of “Pathogenic Social Media (PSM)” accounts by focusing on how they differ from other social bots (e.g., trolls, sybils and cyborgs) and from normal users, as well as how PSMs communicate to achieve their malicious goals. This book leverages sophisticated data mining and machine learning techniques for early identification of PSMs, using the relevant information produced by these bad actors. It also presents proactive intelligence with a multidisciplinary approach that combines machine learning, data mining, causality analysis and social network analysis, giving defenders the ability to detect the actors that are more likely to form malicious campaigns and spread harmful disinformation.
Over the past years, social media has played a major role in the massive dissemination of misinformation online. Political events and public opinion on the Web have allegedly been manipulated by several forms of accounts, including “Pathogenic Social Media (PSM)” accounts (e.g., ISIS supporters and fake news writers). PSMs are key users in spreading misinformation on social media in viral proportions. Early identification of PSMs is thus of utmost importance for social media authorities in their effort to stop this propaganda. The burden falls on automatic approaches that can identify these accounts shortly after they begin their harmful activities.
Researchers and advanced-level students studying and working in cybersecurity, data mining, machine learning, social network analysis and sociology will find this book useful. Practitioners of proactive cyber threat intelligence and social media authorities will also find this book interesting and insightful, as it presents an important and emerging type of threat intelligence facing social media and the general public.

Table of Contents

Frontmatter
Chapter 1. Introduction
Abstract
Recent years have witnessed an exponential growth of online platforms such as online social networks (OSNs) and microblogging websites. These platforms play a major role in online communication and information sharing as they have become large-scale and real-time communication tools. This leads to massive amounts of user-generated data produced on a daily basis and in different forms, which are rich sources of information and can be used in tasks ranging from marketing to research. On the negative side, online platforms have become widespread tools exploited by various malicious actors who orchestrate societally significant threats, leading to numerous security and privacy issues.
Hamidreza Alvari, Elham Shaabani, Paulo Shakarian
Chapter 2. Characterizing Pathogenic Social Media Accounts
Abstract
Over the past years, political events and public opinion on the Web have allegedly been manipulated by “Pathogenic Social Media (PSM)” accounts dedicated to spreading disinformation and performing malicious activities. These accounts are often controlled by terrorist supporters, water armies, or fake news writers and hence can pose threats to social media and the general public. Understanding and analyzing PSMs could help social media platforms devise sophisticated techniques to stop them from reaching their audience and consequently reduce their threat. In this chapter, probabilistic causal inference and Hawkes processes, a well-known statistical technique, are used to distinguish between PSM and non-PSM accounts. Results on real-world ISIS-related datasets from Twitter demonstrate that PSMs behave significantly differently from regular users while disseminating information.
Hamidreza Alvari, Elham Shaabani, Paulo Shakarian
Chapter 3. Unsupervised Pathogenic Social Media Accounts Detection Without Content or Network Structure
Abstract
This chapter introduces an unsupervised causality-based framework that applies label propagation on top of the causal inference presented in Chap. 2. The merit of this approach is that it identifies PSM users without using network structure, cascade path information, content, or user information, all of which are usually hard to obtain. Results on the ISIS-A dataset discussed in the previous chapter show that the proposed approach obtains higher precision (0.75) in identifying PSM accounts than the random (precision of 0.11) and existing bot detection (precision of 0.16) methods.
Hamidreza Alvari, Elham Shaabani, Paulo Shakarian
Chapter 4. Early Detection of Pathogenic Social Media Accounts
Abstract
This chapter introduces a time-decay causality metric and incorporates it into a causal community detection-based algorithm to identify PSMs within a short time frame around their activity. The proposed algorithm is applied to groups of accounts sharing similar causality features and is followed by a classification algorithm to classify accounts as PSM or not. Unlike existing techniques that take significant time to collect information such as network, cascade path, or content, our scheme relies solely on the action log of users. Results on the ISIS-B dataset described previously demonstrate the effectiveness and efficiency of our approach. We achieved a precision of 0.84 for detecting PSMs based only on their first 10 days of activity; the misclassified accounts were then detected 10 days later.
Hamidreza Alvari, Elham Shaabani, Paulo Shakarian
Chapter 5. Semi-Supervised Causal Inference for Identifying Pathogenic Social Media Accounts
Abstract
The lack of sufficient labeled examples for devising and training sophisticated approaches to combat PSM accounts is still one of the foremost challenges facing social media firms. In contrast, unlabeled data is abundant and cheap to obtain thanks to the massive user-generated data produced on a daily basis. This chapter proposes a semi-supervised causal inference PSM detection framework, SemiPsm, to compensate for the lack of labeled data for identifying PSM users. The proposed method leverages unlabeled data through manifold regularization and relies only on cascade information from users’ activities. This is in contrast to existing approaches that use exhaustive feature engineering (e.g., profile information, network structure, etc.). Empirical experiments on the ISIS-B dataset from previous chapters suggest promising results from utilizing unlabeled instances for detecting PSMs.
Hamidreza Alvari, Elham Shaabani, Paulo Shakarian
Chapter 6. Graph-Based Semi-Supervised and Supervised Approaches for Detecting Pathogenic Social Media Accounts
Abstract
In this chapter, we adopt the causal inference framework described previously along with graph-based metrics to distinguish PSMs from normal users within a short time of their activities. We propose both supervised and semi-supervised approaches that do not take network information or content into account. Results on the ISIS-A dataset demonstrate the advantage of our proposed frameworks. We show that our approach achieves an improvement of 0.28 in F1 score over existing approaches, with a precision of 0.90 and an F1 score of 0.63.
Hamidreza Alvari, Elham Shaabani, Paulo Shakarian
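The Chapter 6 abstract reports precision and F1 score but not recall. As a rough reading aid, the recall implied by those two figures can be recovered from the standard definition F1 = 2PR / (P + R). The sketch below is not from the book; the function name is hypothetical, and the result is only as exact as the rounded figures in the abstract.

```python
def recall_from_precision_f1(precision: float, f1: float) -> float:
    """Solve F1 = 2*P*R / (P + R) for R, giving R = F1*P / (2*P - F1)."""
    denom = 2 * precision - f1
    if denom <= 0:
        # F1 is always below 2*P for any recall in (0, 1], so this is invalid input.
        raise ValueError("F1 must be smaller than 2 * precision")
    return f1 * precision / denom


# Figures quoted in the Chapter 6 abstract (rounded to two decimals there).
reported_precision = 0.90
reported_f1 = 0.63

implied_recall = recall_from_precision_f1(reported_precision, reported_f1)
print(f"implied recall: {implied_recall:.2f}")
```

Under these assumptions the implied recall is a little under 0.5, which suggests the reported approach favors precision over coverage.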
Chapter 7. Feature-Driven Method for Identifying Pathogenic Social Media Accounts
Abstract
In this chapter, we present a feature-driven approach to detect PSM accounts in social media. Inspired by the literature, we set out to assess PSMs from three broad perspectives: (1) user-related information (e.g., user activity, profile characteristics), (2) source-related information (i.e., information linked via URLs shared by users) and (3) content-related information (e.g., tweet characteristics). For the user-related information, we investigate malicious signals using causality analysis (i.e., whether a user is frequently a cause of viral cascades) and profile characteristics (e.g., number of followers, etc.). For the source-related information, we explore various malicious properties linked to URLs (e.g., URL address, content of the associated website, etc.). Finally, for the content-related information, we examine attributes (e.g., number of hashtags, suspicious hashtags, etc.) of tweets posted by users. Experiments on real-world Twitter data from different countries demonstrate the effectiveness of the proposed approach in identifying PSM users.
Hamidreza Alvari, Elham Shaabani, Paulo Shakarian
Chapter 8. Conclusion
Abstract
In this book, we presented results of the efforts to detect “Pathogenic Social Media (PSM)” accounts that are responsible for manipulating public opinion and political events. There are many challenges in the area of PSM account detection. In Chaps. 3 and 4, standard and time-decay probabilistic causal metrics were proposed to distinguish PSMs from normal users within a short time around their activity. In Chap. 4, we investigated whether or not causality scores of PSM users within the same communities are higher than those across different communities. Furthermore, as available data for training automatic approaches for detecting PSM users are usually either highly imbalanced or insufficiently labeled, in Chaps. 5 and 6, we proposed semi-supervised approaches for detecting PSMs that utilize unlabeled data to compensate for the lack of sufficient labeled data. In Chap. 7, we observed that PSMs would deploy techniques to generate diverse information to make their posts look more natural. We utilized several metrics to approximate the complexity and readability of content shared online by PSMs and normal users. Finally, in Chap. 7, we took a closer look at the differences between malicious and normal behavior in terms of the URLs posted by different types of users. We leveraged several characteristics of URLs as source-level information along with other attributes in a supervised setting for detecting PSMs.
Hamidreza Alvari, Elham Shaabani, Paulo Shakarian
Metadata
Title
Identification of Pathogenic Social Media Accounts
Authors
Hamidreza Alvari
Elham Shaabani
Prof. Paulo Shakarian
Copyright Year
2021
Electronic ISBN
978-3-030-61431-7
Print ISBN
978-3-030-61430-0
DOI
https://doi.org/10.1007/978-3-030-61431-7
