Most of the studies we reviewed implemented supervised ML and DL to detect social bots, spambots, and sybil bots, as discussed below.
3.1.3 Facebook—detecting sybil bots
We found a reasonable number of studies that succeeded in recognizing sybils (fake profiles) on Facebook. The study by (Albayati and Altamimi 2019) proposed a smart system, FBChecker, that checks whether a profile is fake. A set of behavioral and informational attributes was analyzed and classified by the system using a data mining approach. Four data mining algorithms, namely KNN, DT, SVM, and NB, were used, implemented on the RapidMiner data science platform. A dataset of 200 profiles was prepared by the authors. A Receiver Operating Characteristic (ROC) curve comparison was created to assess accuracy; all classifiers showed a high accuracy rate, but SVM outperformed the rest with an accuracy of 98%.
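As a concrete illustration of this kind of classifier benchmark, the sketch below compares the four named algorithms by ROC-AUC in scikit-learn. It is a minimal stand-in, not the authors' implementation; the synthetic data merely mimics a 200-profile dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the 200-profile behavioral/informational dataset
X, y = make_classification(n_samples=200, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

classifiers = {
    "KNN": KNeighborsClassifier(),
    "DT": DecisionTreeClassifier(random_state=42),
    "SVM": SVC(probability=True, random_state=42),  # probabilities needed for ROC
    "NB": GaussianNB(),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores = clf.predict_proba(X_test)[:, 1]
    print(f"{name}: AUC = {roc_auc_score(y_test, scores):.3f}")
```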
Subsequently, (Hakimi et al.
2019) proposed supervised ML techniques based on only five characteristics that play a key role in distinguishing fake and genuine users on Facebook. The key characteristics selected were Average Post Likes Received, Average Post Comments, Average Post Comments Received, Average Post Liked, and Average Friends. Sample data for 800 users were generated using Mockaroo. The data were categorized into four clusters: Inactive User, Assumed Fake Account User, Fake Account User, and Real User. The KNN, SVM, and NN classifiers were implemented; KNN outperformed the others with an accuracy of 0.829. It was concluded that the "likes" and "remarks" features add significant value to the detection task.
Moreover, (Singh and Banerjee
2019) created a Facebook dataset using the Graph API to be utilized for sybil account detection and performed a comparative analysis of various algorithms on it. The dataset contained 995 accounts, both real and fake. Twenty-nine features were extracted, including textual, categorical, and numerical features. The AdaBoost, Bagging, XGBoost, Gradient Boost (GB), RF, LR, Linear Support Vector Classifier (LinearSVC), and ExtraTree algorithms were applied for evaluation. AdaBoost was the best-performing algorithm with a 99% F1-score.
However, (Saranya Shree et al.
2021) suggested Natural Language Processing (NLP) pre-processing techniques and ML algorithms such as SVM and NB to classify fake and genuine profiles on Facebook. A dataset of 516 profiles was used, with training run for 30 epochs. The model correctly predicted 91.5% of fake accounts and 90.2% of genuine accounts.
Another strategy for identifying sybils on Facebook was presented by (Babu et al.
2021). Using the Facebook graph API, they gathered a dataset of 500 users in order to better understand the nature and distinguishing characteristics of sybils. The NB classifier was applied to the test dataset to identify fake profiles, using seven profile-based features. Their suggested solution achieved a 98% efficiency rate. Moving on, (Gupta and Kaushal 2017) described an approach to detect fake accounts. The key contributions of the authors' work include the collection of a private dataset using the Facebook API through Python wrappers. After data collection, a set of 17 features was shortlisted, including likes, comments, shares, tags, app usage, etc. A total of 12 supervised ML classification algorithms were used (from Weka), namely k-Nearest Neighbor, Naive Bayes, and Decision Tree classifiers (J48, C5.0, Reduced Error Pruning Tree (REPT), Random Tree, Random Forest), among others. Two types of cross-validation were performed: the holdout method and tenfold cross-validation. A classification accuracy of 79% was achieved, with user activities contributing the most to the detection of fake accounts.
3.1.6 Instagram—detecting sybil bots
Many studies were able to detect sybil bots, starting with (Meshram et al. 2021), who proposed an automated methodology for fake profile detection. The authors collected 1203 accounts, both real and fake, using the Instagram API. In addition, a list of eight content- and behavior-based features was extracted. The authors needed to oversample the dataset using SMOTE-NC before applying any algorithm due to the uneven real-to-fake account ratio. Afterward, the NN, SVM, and RF algorithms were applied; RF showed the best results with an accuracy of 97%.
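The oversampling step described here can be illustrated with imbalanced-learn's SMOTE-NC, which, unlike plain SMOTE, handles mixed numerical/categorical features. The sketch below is a hypothetical pipeline; the categorical column position and dataset shape are assumptions for illustration.

```python
from imblearn.over_sampling import SMOTENC
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Synthetic stand-in: ~1200 accounts, 8 features, 9:1 real-to-fake imbalance
X, y = make_classification(n_samples=1200, n_features=8, weights=[0.9, 0.1], random_state=0)
X[:, 0] = (X[:, 0] > 0).astype(int)  # treat column 0 as a categorical feature

# SMOTE-NC synthesizes minority samples while respecting categorical columns
smote_nc = SMOTENC(categorical_features=[0], random_state=0)
X_res, y_res = smote_nc.fit_resample(X, y)

rf = RandomForestClassifier(random_state=0).fit(X_res, y_res)
print("resampled class counts:", (y_res == 0).sum(), (y_res == 1).sum())
```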
Using the same records and features, (Sheikhi 2020) presented a bagging classifier and performed a comparative analysis with seven well-known ML algorithms, namely RT, J48, SVM, Radial Basis Function (RBF), MLP, Hoeffding Tree, and NB, using tenfold cross-validation. The bagging classifier showed better performance, successfully classifying 98% of the accounts. Moreover, the author identified the best feature types for different dataset sizes.
Additionally, (Dey et al.
2019) also assessed fake and real Instagram accounts. A publicly labeled dataset of sixteen accounts was obtained from Kaggle, and twelve profile-based features were extracted from it. Missing value treatment, outlier detection, and bivariate analysis were carried out as part of the exploratory data analysis, and median imputation was used to deal with the outliers. Within the scope of the paper, two supervised classification algorithms, LR and RF, were used; RF showed the better performance of the two, with 92.5% accuracy.
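The outlier-handling step lends itself to a short sketch. The IQR fences below are a common convention assumed for illustration, since the paper only states that median imputation was applied to outliers; the DataFrame and column name are hypothetical.

```python
import pandas as pd

def impute_outliers_with_median(df: pd.DataFrame, col: str) -> pd.DataFrame:
    # Flag values outside the 1.5*IQR fences, then replace them with the median
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = (df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr)
    df.loc[mask, col] = df[col].median()
    return df

# Hypothetical profile-based feature with one extreme outlier
profiles = pd.DataFrame({"followers_count": [10, 12, 15, 14, 9, 10000]})
profiles = impute_outliers_with_median(profiles, "followers_count")
print(profiles)
```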
Subsequently, the research of (Purba et al.
2020) aimed to identify fake users' behavior and proposed different classification approaches: 2-class (authentic, fake) and 4-class (authentic, spammer, active fake user, inactive fake user). The dataset contained a total of 32,460 fake and authentic users. Seventeen features based on metadata, media info, media tags, media similarity, and engagement were used. With these features, the RF, MLP, LR, NB, and J48 algorithms showed promising results; RF reached an accuracy of up to 91.76% for the 4-class classification. Moreover, the analysis showed that metadata and statistical features are the foremost predictors for classification.
Nevertheless, (Kesharwani et al.
2021) utilized a six-layer deep NN model to classify fake and genuine Instagram accounts. The designed model used 12 profile-based features. An open dataset of 696 Instagram users, available on Kaggle and collected using a crawler, was used for this experiment; the dataset provided 10 profile-based features. The model was trained for 20 epochs, giving an accuracy of 93.63%.
Quite interestingly, (Bazm and Asadpour
2020) proposed a behavior-based model. A labeled dataset of 2000 accounts of both fake and genuine users was collected by the authors, and seven behavioral features were extracted from it. The KNN, DT, SVM, RF, and AdaBoost algorithms were tested and analyzed; AdaBoost showed the best results with an accuracy of 95%. Additionally, the Max feature was identified as the most effective for classification, followed by standard deviation, following count, and entropy. Three of these most effective features were behavioral.
Lastly, the work of (Thejas et al.
2019) also focused on detecting valid and fake likes on Instagram posts by applying automated single and ensemble learning models. A labeled dataset of 10,346 observations and 37 features was composed. The authors used numeric and text-based features to perform an extensive analysis of fake-like patterns. Various single classifiers were used, such as LR, SVM, KNN, NB, and NN in different versions, alongside ensemble-based classifiers such as RF, also in multiple versions. Moreover, bot detection using an autoencoder was explored. RF showed the highest performance of all, with 97% accuracy.
Numerous studies were able to detect social bots on Twitter, starting with (Echeverra et al. 2018), who tested 20 unseen bot classes of varying sizes and characteristics using bot classifiers. Two datasets consisting of 2.5 million accounts were collected using Twitter's API. Twenty-nine profile- and content-based features were employed for classification. The classifiers tested were GB trees (XGBoost and the LightGBM model (LGBM)), RF, DT, and AdaBoost. LGBM showed the highest accuracy of 97.84% on both subsamples used, C30K and C500.
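For readers unfamiliar with the LGBM setup referenced above, a minimal sketch of training a LightGBM bot classifier follows; the dataset shape and hyper-parameters are illustrative assumptions, not the study's configuration.

```python
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for 29 profile- and content-based features
X, y = make_classification(n_samples=5000, n_features=29, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = lgb.LGBMClassifier(n_estimators=300, learning_rate=0.05, random_state=0)
clf.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```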
Moreover, (Fonseca Abreu et al.
2020) examined whether feature-set reduction for Twitter bot detection yields outcomes comparable to large feature sets. Five profile-based features were used for classification. The dataset consisted of 4565 records of both social bots and genuine users. The ML algorithms tested were RF, SVM, NB, and one-class SVM. AUCs greater than 0.9 were obtained by all multiclass classifiers, but RF exhibited the best results with an AUC of 0.9999.
Varol et al. (
2017) used more than a thousand features based on metadata and on friends, tweet content, sentiment, network patterns, and activity time series. A publicly accessible dataset of 31 K manually verified Twitter accounts, labeled as bots or real, was used to train the model. The model's accuracy was evaluated using the RF, AdaBoost, LR, and DT classifiers; the best performance was achieved by RF with 0.95 AUC. Furthermore, it was concluded that the most significant sources of data are user metadata and content features.
Twenty-eight features were extracted based on profile, tweets, and behavior (Knauth
2019). For easy future portability, the focus was mainly on language-agnostic features. The LR, SVM, RF, AdaBoost, and MLP classifiers were used for the experiments; AdaBoost outperformed all competitors with an accuracy of 0.988. Smaller quantities of training data were also analyzed, showing that a few expressive characteristics provide good practical benefits for bot identification.
In this study, after a long process of feature extraction and data pre-processing, (Kantepe and Gañiz 2017) employed ML techniques. Data on one thousand eight hundred accounts were obtained with the Twitter API and Apache Spark and then used to extract 62 different features, mainly profile-based, Twitter-specific, and periodic features. Four classifiers were used: LR, Multinomial Naïve Bayes (MNB), SVM, and GB. The highest accuracy, 86%, was achieved by the GB trees.
This research conducted by (Barhate et al.
2020) used two approaches for bot detection and analyzed bots' influence in trending a hashtag on Twitter. First, the bot probability of a user was calculated using a supervised ML technique and a new bot-score feature. A total of 13 features were extracted during data pre-processing and exploratory data analysis (EDA). The data were trained using an RF classifier, which produced an AUC of 0.96. The study also concluded that bots have a high friend-to-follower ratio and a low follower growth rate.
The dataset that was acquired by (Pratama and Rakhmawati
2019) came from supporters of the Indonesian presidential candidates on Twitter. The top five hashtags for each candidate were used to collect tweets, which were then manually labeled with the accounts' bot characteristics, resulting in about 4000 tweets. Two ML models, SVM and RF, were utilized for bot detection and trained with tenfold cross-validation to improve the overall score. Of the two, RF achieved the higher overall score of 74% in F1-score, accuracy, and AUC. Comparing the 10 retrieved features from the dataset, they discovered that account creation year provided the biggest separation between humans and bots.
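The tenfold cross-validation comparison described here maps directly onto scikit-learn's cross_val_score; the following is a minimal sketch with synthetic stand-in data rather than the authors' labeled tweets.

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Synthetic stand-in for the 10-feature, manually labeled account dataset
X, y = make_classification(n_samples=4000, n_features=10, random_state=1)

for name, clf in [("SVM", SVC()), ("RF", RandomForestClassifier(random_state=1))]:
    f1 = cross_val_score(clf, X, y, cv=10, scoring="f1")  # tenfold CV
    print(f"{name}: mean F1 = {f1.mean():.3f}")
```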
Davis et al. (
2016) made use of an RF classifier to evaluate and detect social bots through a system called BotOrNot. A public dataset of 31 K accounts was used to train the model. The framework collected more than 1000 features from six main groups of characteristics: network, user, friend, temporal, content, and sentiment features. Several classifiers, one for each category of features and one for the overall score, were trained using the extracted features. System performance was assessed using tenfold cross-validation, obtaining an AUC of 95%.
Likewise, a Twitter bot identification technique was also presented by (Shevtsov et al.
2022). Their dataset contained 15.6 million tweets from 3.2 million Twitter accounts, sent during the US elections. The XGBoost algorithm was used to pick 229 features from approximately 337 user-extracted features. Their suggested ML pipeline involves training and validating three ML models: SVM, RF, and XGBoost. XGBoost performed best; their findings indicate that it generalizes well from the training split to the collected dataset, with the F1-score declining only from 0.916 to 0.896 and the ROC-AUC from 0.98 to 0.977.
Additionally, SPY-BOT, a post-filtering method based on ML for social network behavior analysis, was introduced by (Rahman et al.
2021). Six hundred training samples were used, from which eleven characteristics were extracted. The authors contrasted two ML algorithms, LR and SVM, during the training phase; after comparing outcomes, the tuned SVM performed best. Their method achieved up to 92.7% accuracy on the validation dataset and up to 90.1% on the testing dataset. As a result, they suggest that the proposed approach is able to classify users' behavior in the Social Network-Integrated Industrial Internet of Things (SN-IIoT).
Also, a real-time streaming framework for social bot detection (SBD) was suggested by (Alothali, Alashwal, et al. 2021a) as a way to detect social bots before they launch an attack, protecting users. To gather tweets and extract user profile features, the system uses the Twitter API. As their offline dataset, they used a publicly available Twitter dataset from Kaggle with a total of 37,438 records. Friends count, followers count, favorites count, status count, account age in days, and average tweets per day were the six features extracted and used as input to their ML model. The RF algorithm was used to differentiate between bot and human accounts. The outcomes of their methodology demonstrated its effectiveness in retrieving and publishing the data and monitoring the estimates.
Shukla et al. (
2022) proposed TweezBot, a novel AI-driven multi-layer condition-based social media bot detection framework. The authors also performed a comparative analysis with several existing models, along with an extensive study of the features and exploratory data analysis. The proposed method analyzed Twitter-specific user profile features and activity-centric characteristics, such as profile name, location, description, verification status, and listed count. These features were extracted from 2789 distinct user profiles in a public labeled dataset from Kaggle. The ML models used for comparative evaluation were RF, DT, Bernoulli Naïve Bayes (BNB), CNB, SVC, and MLP. TweezBot attained a maximum accuracy of 99.00049%.
Since bots are also used to manipulate political activity, (Fernquist et al. 2018) presented a study on political Twitter bots and their impact on the September 2018 Swedish general elections. To identify automatic behavior, a language-independent ML model was developed. The training data consisted of both bots and genuine accounts, drawn from three different datasets (Cresci et al. 2015; Gilani et al. 2017; Varol et al. 2017). Furthermore, a list of 140 user metadata, tweet, and time features was extracted. Various algorithms, including RF, AdaBoost, LR, SVM, and NB, were tested; RF outperformed the rest with an accuracy of 0.957.
Similarly, (Beğenilmiş and Uskudarli 2018) made use of collective behavior features in hashtag-based tweet sets, compiled by searching for relevant hashtags. A dataset of 850 records was used to train models with algorithms including RF, SVM, and LR. From tweets collected during the 2016 US presidential election, 299 features were retrieved; to capture coordinated behavior, the features represent user and temporal synchronization characteristics. The models were developed to distinguish between organic and inorganic, political and non-political, and pro-Trump, pro-Hillary, or neutral tweet-set behavior. RF displayed the best outcomes, with an F-measure of 0.95. In conclusion, the study found that media utilization and tweets marked as favorites are the most dominant features, and user-based features were the most valuable.
On the other hand, (Rodríguez-Ruiz et al. 2020) suggested a one-class classification approach. One benefit of one-class classifiers is that they do not need examples of abnormal behavior, such as bot accounts. The public dataset of (Cresci et al. 2017) was used. The classifiers considered were Bagging-TPMiner (BTPM), Bagging-RandomMiner (BRM), One-Class K-means with Randomly projected features Algorithm (OCKRA), one-class SVM, and NB. Only 13 numerical features were extracted for classification. With an average AUC of 0.921, Bagging-TPMiner outperformed all other classifiers across a number of experiments.
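The appeal of one-class classification, as noted, is that only genuine-account examples are needed at training time. A minimal sketch with scikit-learn's OneClassSVM (one of the classifiers named above) illustrates the idea; the data are synthetic stand-ins.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
genuine = rng.normal(0.0, 1.0, size=(500, 13))   # 13 numerical features, genuine only
incoming = rng.normal(3.0, 1.0, size=(20, 13))   # unseen accounts to screen

# Fit on normal behavior only; no bot examples are required
ocsvm = OneClassSVM(nu=0.05, gamma="scale").fit(genuine)
pred = ocsvm.predict(incoming)                   # +1 = inlier, -1 = outlier (bot-like)
print("flagged as bot-like:", (pred == -1).sum(), "of", len(incoming))
```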
Moreover, (Attia et al. 2022) proposed a new content-based bot detection model using a multi-input DNN technique. They used 6760 records from the public PAN 2019 Bots and Gender Profiling Task dataset (Rangel and Rosso 2019). The proposed multi-input model includes three phases. The first phase feeds an N-gram model, a 3D matrix of 100×8×300, into a two-dimensional CNN. The second phase feeds a vector of length M (100 tweets) into a one-dimensional CNN. The final phase combines the previous models through fully connected neural networks. Each model was trained using suitable hyper-parameter values. Their model achieved a detection accuracy of 93.25% and outperformed other newly proposed bot detection models.
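The three-phase architecture can be sketched with the Keras functional API: a 2D-CNN branch over the 100×8×300 n-gram tensor, a 1D-CNN branch over a per-tweet vector, and a fully connected fusion head. Layer sizes here are illustrative assumptions, not the paper's hyper-parameters.

```python
from tensorflow.keras import layers, Model

# Phase 1: 2D CNN over the 100x8x300 n-gram representation
ngram_in = layers.Input(shape=(100, 8, 300))
x = layers.Conv2D(64, (3, 3), activation="relu")(ngram_in)
x = layers.GlobalMaxPooling2D()(x)

# Phase 2: 1D CNN over a per-account vector of 100 tweet values
tweet_in = layers.Input(shape=(100, 1))
t = layers.Conv1D(32, 3, activation="relu")(tweet_in)
t = layers.GlobalMaxPooling1D()(t)

# Phase 3: fuse both branches with fully connected layers
z = layers.concatenate([x, t])
z = layers.Dense(64, activation="relu")(z)
out = layers.Dense(1, activation="sigmoid")(z)

model = Model(inputs=[ngram_in, tweet_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```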
In the work of (Sayyadiharikandeh et al. 2020), the authors recommended training specialized classifiers for each class of bots and combining their outputs using the maximum rule, producing the Ensemble of Specialized Classifiers (ESC) in the most recent version of Botometer. Additionally, the authors used 18 different public labeled datasets from the Bot Repository, and over 1200 features were extracted, divided into categories covering metadata, retweet/mention networks, temporal features, content information, and sentiment features. A cross-domain performance comparison and analysis was then performed using all 18 datasets. The authors recommend considering the three types of bot classes as in the (Cresci et al. 2017) dataset. Moreover, they provided a list of the most informative features per bot class in the public datasets used.
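The ensemble-of-specialized-classifiers idea reduces to training one classifier per bot class (each against humans) and taking the maximum bot probability at inference. The sketch below is a hypothetical stand-in for the ESC pipeline, with RF assumed as the per-class learner.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_specialists(X_human, bot_datasets):
    """Train one RF per bot class, each against the same human examples."""
    specialists = []
    for X_bot in bot_datasets:
        X = np.vstack([X_human, X_bot])
        y = np.r_[np.zeros(len(X_human)), np.ones(len(X_bot))]
        specialists.append(RandomForestClassifier(random_state=0).fit(X, y))
    return specialists

def bot_score(specialists, X_new):
    # Maximum rule: an account is as bot-like as its most confident specialist
    probs = [clf.predict_proba(X_new)[:, 1] for clf in specialists]
    return np.max(probs, axis=0)
```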
A comprehensive comparative analysis was conducted by (Shukla et al.
2021) to determine the optimal feature encoding, feature selection, and ensembling methods. A total of 37,438 records comprising the training and testing datasets was acquired from the Kaggle repository. Pre-processing involved scaling numerical attributes and encoding categorical attributes, and a total of 19 attributes were extracted. The model used the RF, AdaBoost, NN, SVM, and KNN classifiers. Employing RF for blending produced the best results, achieving the highest AUC score of 93%. Since the proposed approach uses Twitter profile metadata, it can detect bots more quickly than a system that analyzes an account's behavior; however, its reliance on static analysis reduces its efficiency.
Ramalingaiah et al. (
2021) presented an effective text-based bag-of-words (BoW) model. BoW produces numerical vectors that can be used as inputs to different ML algorithms. Using the features resulting from the feature selection process, different ML algorithms such as DT, KNN, LR, and NB were implemented to calculate their accuracies and compare them with the authors' classifier, which uses the BoW model to detect Twitter bots from given training data. A Kaggle dataset with 2792 training entries and 576 testing entries was utilized to evaluate the models. Among the baselines, the decision tree gave the highest accuracy, which was further improved using a bag-of-bots algorithm. The authors' BoW-based classifier performed best, yielding an accuracy of over 99% on the test data.
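The BoW pipeline is straightforward to illustrate with scikit-learn's CountVectorizer feeding a decision tree; the tweets and labels below are invented examples, not data from the study.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

tweets = ["win a free iphone now", "great coffee with friends today",
          "free prize click now", "watching the game tonight"]
labels = [1, 0, 1, 0]                     # 1 = bot-like, 0 = human (invented)

bow = CountVectorizer()
X = bow.fit_transform(tweets)             # sparse word-count vectors
clf = DecisionTreeClassifier(random_state=0).fit(X, labels)
print(clf.predict(bow.transform(["free iphone prize now"])))
```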
An ML method based on benchmarking was proposed by (Pramitha et al. 2021) to choose the best model for bot account detection. A dataset of 24,631 records was obtained from Kaggle, and scraping was then performed using the Twitter API to obtain profile features. Furthermore, oversampling using SMOTE was applied to overcome imbalanced data and improve model accuracy. Both the RF and XGBoost algorithms were evaluated; XGBoost outperformed RF with an accuracy of 0.8908. Additionally, after ranking fifteen different features, they discovered that three significant features (verified, network, and geo-enabled) can distinguish between human and bot accounts.
Many studies implemented effective DL algorithms instead of ML, such as the Behavior-enhanced Deep Model (BeDM) proposed by (Cai, Li, and Zeng 2017b) for bot detection, which used a real-world public labeled dataset of 5658 accounts and 5,122,000 tweets from Twitter, collected with honeypots. The model fused tweet content, treated as temporal text data, with user posting behavior information by applying a DNN to detect bots. The DL frameworks used in BeDM are CNN and LSTM. Compared to the boosting baselines (Gilani et al. 2016; Lee et al. 2006; Morstatter et al. 2016), BeDM attained the highest F1-score of 87.32%, which proved the efficacy of the model.
Later in the same year, (Cai, Li, and Zeng
2017a) proposed analogous work. However, their novel Deep Bot Detection Model (DBDM) avoids laborious feature engineering and automatically learns both behavioral and content representations based on the user representation. Additionally, DBDM takes into account endogenous and exogenous factors that influence user behavior. DBDM achieved better results, with an F1-score of 88.30%.
Additionally, (Hayawi et al.
2022) also proposed a DL framework, DeeProBot, which used eleven user profile metadata-based features. Five training and five testing datasets from the Bot Repository were used. Additionally, the text feature was embedded using GloVe, which aided learning from the features. To detect bots, DeeProBot employed a hybrid deep NN model. On the hold-out test set, DeeProBot achieved an AUC of 0.97 for bot detection.
However, in a novel framework called GANBOT, (Najari et al. 2022) modified the Generative Adversarial Network (GAN) concept. The generator and classifier were connected via an LSTM layer acting as a shared channel between them, reducing the convergence limitation on Twitter. By raising the likelihood of bot identification, the suggested framework outperformed the existing contextual LSTM technique. A total of 8386 accounts from the Cresci2017 dataset were used. Results were assessed for four distinct vector dimensions (25D, 50D, 100D, and 200D); the highest result, 0.949/0.951, was obtained with 200D.
A total of seventeen state-of-the-art DL-based methods for bot detection were described by (Kenyeres and Kovács 2022). They classified Twitter feeds as bots or humans based solely on the textual form of the accounts' tweets. The PAN 2019 Bots and Gender Profiling task dataset (Rangel and Rosso 2019), consisting of 11,560 labeled users, was used. The core of seven models was based on LSTM networks, four were based on Bidirectional Encoder Representations from Transformers (BERT) models, and one combined the two. For tweet classification, the best accuracy of 0.828 was obtained using a fine-tuned BERT model, while for account classification the AdaBoost model achieved the best accuracy of 0.9. Their findings demonstrate that, even with a small dataset, DL models can compete with Classical Machine Learning (CML) methods.
Moreover, (Martin-Gutierrez et al. 2021) provided a multilingual DL method for detecting suspect Twitter accounts. The dataset used in their work, collected with the Twitter API, comprised 37,438 Twitter accounts. Several experiments were conducted using different combinations of word embeddings to obtain a single vector for the text-based features of a user account. These features were then concatenated with the rest of the metadata to build an input vector for a dense network denoted Bot-DenseNet. The comparison of these experiments showed that Bot-DenseNet produces the best trade-off between performance and feasibility when the so-called RoBERTa Transformer forms part of the input feature vector, yielding an F1-score of 0.77.
In this research, (Ping and Qin 2019) proposed DeBD, a social bot detection model for Twitter based on the CNN-LSTM DL algorithms. DeBD used a CNN to extract the joint features of tweet content and their relationships. To carry out the experiments, a dataset of 5132 accounts was created from (Cresci et al. 2017). The potential temporal features of the tweet metadata were then extracted using LSTM. Finally, the temporal features were fused with the joint content features to detect social bots. All the experiments achieved a detection accuracy of more than 99%.
Daouadi et al. (
2019) proved that a Deep Forest algorithm combined with thirteen metadata-based features is sufficient to accurately identify bot accounts on Twitter. Two datasets published by (Lee et al. 2006; Subrahmanian et al. 2016) were used, gathered via the Twitter API. More than 30 conventional algorithms were implemented, including Bagging, MLP, AdaBoost, RF, SL, etc. With an accuracy of 97.55%, the Deep Forest method surpassed the other conventional supervised learning techniques.
In this paper, (Cable and Hugh
2019) implemented the NB, LR, kernel SVM, RF, and LSTM-NN algorithms to identify political trolls on Twitter and compared their accuracies. A dataset of tweet IDs related to the 2016 elections was used; scraping the Twitter API yielded a total of 142,560 unique tweets. Features were extracted using several methods: word counts, TF-IDF, and word embeddings. The LSTM-NN obtained a test accuracy of 0.957.
Since it is important to determine the best features for enhancing the detection of social bots, (Alothali, Hayawi, et al. 2021b) offered a hybrid feature selection (FS) technique to locate these optimal features. This method evaluates profile metadata features using random forest, naive Bayes, support vector machines, and neural networks. Using a public Kaggle dataset with a total of 18 profile metadata features, they investigated four feature selection approaches, employing filter and wrapper methods to find the best feature subset. They discovered that, compared to other FS methods, cross-validation attribute evaluation performed best. According to their findings, the random forest classifier achieved the best score using six optimal features: favorites count, verified, statuses count, average tweets per day, lang, and ID.
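A hybrid filter-plus-wrapper selection like the one described can be sketched by chaining a mutual-information filter with recursive feature elimination (RFE) around an RF. The method pairing below is an illustrative assumption, not the authors' exact procedure.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif, RFE

# Synthetic stand-in for 18 profile-metadata features
X, y = make_classification(n_samples=2000, n_features=18, n_informative=6, random_state=0)

# Filter step: keep the 12 features with the highest mutual information
X_filtered = SelectKBest(mutual_info_classif, k=12).fit_transform(X, y)

# Wrapper step: recursively eliminate down to 6 features around an RF
rfe = RFE(RandomForestClassifier(random_state=0), n_features_to_select=6).fit(X_filtered, y)
print("selected feature mask:", rfe.support_)
```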
Lastly, (Sengar et al. 2020) proposed both ML and DL approaches to distinguish bots from genuine users on Twitter. This was done by gathering user activity and profile-based features, then applying supervised ML and NLP. A labeled Twitter dataset containing more than 5000 users and 200,000 tweets was used to train the classifiers. After analysis and feature engineering, eight features were extracted. Different learning models, namely KNN, DT, RF, AdaBoost, GB, Gaussian Naive Bayes (GNB), MNB, and MLP, were compared and analyzed to determine the best-performing bot detection system. Results showed that the NN-based MLP algorithm gave the most accurate prediction, with an accuracy of 95.08%. A CNN architecture was also proposed for tweet-level analysis by combining user and tweet metadata on the MIB dataset (Cresci et al. 2017). This novel approach gave a substantial improvement: RF and GB achieved the highest accuracy of 99.54%.
Some studies demonstrate the detection of spammers, starting with (Fazil and Abulaish 2018), who presented a hybrid method for identifying automated spammers based on their interactions with their followers. Nineteen distinct features were retrieved, integrating community-based features with other categories such as metadata-, content-, and interaction-based features. A real public dataset of 11,000 labeled users was used. Performance was analyzed using three supervised ML techniques, RF, DT, and BN, implemented in Weka. RF achieved the best values on all three metrics: a DR of 0.976, an FPR of 0.017, and an F-score of 0.979. Lastly, after executing a feature ablation test and examining the discriminative power of the various features, it was determined that interaction- and community-based features are the most effective for spammer identification.
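The feature-ablation test mentioned here can be sketched as a loop that drops one feature group at a time and measures the F1 change; the group boundaries below are hypothetical column indices, not the study's actual feature layout.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=3000, n_features=19, random_state=0)
groups = {"metadata": range(0, 5), "content": range(5, 11),
          "interaction": range(11, 15), "community": range(15, 19)}

baseline = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                           cv=5, scoring="f1").mean()
for name, cols in groups.items():
    X_ablated = np.delete(X, list(cols), axis=1)   # drop one feature group
    f1 = cross_val_score(RandomForestClassifier(random_state=0), X_ablated, y,
                         cv=5, scoring="f1").mean()
    print(f"without {name}: F1 change = {f1 - baseline:+.3f}")
```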
Oentaryo et al. (
2016) categorized bots by behavior as broadcast, consumption, and spam bots. A systematic profiling framework was developed, comprising a set of features and a classifier bank. Numeric, categorical, and series features were taken into consideration. The private, manually labeled dataset used consisted of 159 K bot and non-bot accounts. Four supervised ML algorithms were employed: NB, RF, SVM, and LR. LR outperformed the other classifiers with an F1-score of 0.8228.
In the research conducted by (Heidari et al. 2020), the authors first created a new public dataset containing profile-based features for more than 6900 Twitter accounts from the (Cresci et al. 2017) dataset, where the input feature set consisted of age, gender, personality, and education derived from users' online posts. To build their system, they compared the following classifiers: RF, LR, AdaBoost, feed-forward NN (FFNN), and SGD. The results showed that the FFNN model, with 97% accuracy, provides the best results compared with the other classifiers. Lastly, a new bot detection model was introduced that uses a contextualized representation of each tweet, employing Embeddings from Language Models (ELMo) and Global Vectors for Word Representation (GloVe) in the word embedding phase to obtain a complete representation of each tweet's text. The model placed multiple FFNN models on top of multilayer bidirectional LSTM models to extract different aspects of a tweet's text. It detected bots from human accounts, regardless of identical user profiles, and achieved 94% prediction accuracy on two different testing datasets.
A spam detection AI approach for Twitter social networks was proposed by (Prabhu Kavin et al.
2022). The dataset (7973 accounts) was collected using the Twitter REST API and combined with the public dataset "The Fake Project" (Cresci et al. 2015). For pre-processing, tokenization, stop-word removal, and stemming were applied to the dataset. User-based and content-based features were extracted. To develop the model, a variety of ML methods, including SVM, ANN, and RF, were applied. With user-based features, SVM showed the highest precision (97.45%), recall (98.19%), and F-measure (97.32%).
In this research, (Eshraqi et al. 2016) used a data-stream clustering algorithm to identify spam tweets as an anomaly detection problem. The dataset consisted of 50,000 Twitter user accounts and 14 million tweets. Pre-processing was done with RapidMiner, and the data were then transferred into Massive Online Analysis (MOA) for implementation. The extracted features were based on graphs, content, time, and keywords. When using the DenStream algorithm (Cao et al. 2006), its parameters needed to be regulated properly. The model successfully identified 89% of the available spam tweets, and the results achieved an accuracy of 99%.
Mateen et al. (
2017) used 13 user-, content-, and graph-based features to classify human and spam profiles. The real public dataset used for this study, provided by (Gu 2022), consisted of approximately 11 K user accounts and 400 K tweets. Three classifiers, J48, DE, and NB, were used for evaluation. J48 and DE outperformed the other classifier when using the hybrid technique of combined features, showing 97.6% precision. Results showed that, for the dataset employed, the hybrid technique significantly improved precision and recall. Additionally, user- and graph-based features correctly classified only 90% of cases, compared to content- and graph-based features, which demonstrated 92% accuracy.
Moreover, (Chen et al.
2017a,
b) found that the statistical characteristics of spam tweets in their labeled dataset changed over time, which impacted the effectiveness of existing ML classifiers; this phenomenon is known as Twitter spam drift. Using Twitter's Streaming API, a public dataset of 2 million tweets was gathered, and Trend Micro's Web Reputation Technology was used to identify the tweets considered spam. The Lfun system, which learns from unlabeled tweets, was proposed. With Day 1 training and Day 2 to Day 9 testing, RF obtained a DR ranging only from 45 to 80%, whereas RF-Lfun increased it to 90%; from Day 2 training to Day 10 testing, the DR of RF was roughly 85%, while that of RF-Lfun was over 95%.
Kumar and Rishiwal (
2020) explored and provided a framework for identifying spammers, content polluters, and bots using an ML approach. A dataset of 5572 tweets containing the text messages and their category labels was used. Various algorithms were trained, mainly MNB, Bernoulli NB, SVM, and Complement NB. The most effective classification of spam accounts was achieved by MNB, with an accuracy of 99%.
In this study, (Güngör et al.
2020) used a dataset of 714 manually labeled tweets retrieved through the Twitter API. Eight profile-based features and five tweet-based features were extracted and analyzed. Additionally, a set of rules was derived by adding the follower-friend (FF) rate, and spam accounts were detected. For this experiment, the NB, J48, and LR algorithms were used; J48 performed best, achieving an accuracy of 97.2%. In conclusion, the accuracy rate increased as a result of using both tweet- and profile-based features.
By utilizing a dataset of 82 accounts of tweeters who use both Arabic and English, (Al-Zoubi et al.
2017) improved spam identification. J48, MLP, KNN, and NB were the algorithms used and compared under tenfold cross-validation with stratified sampling as the training/testing methodology. With an accuracy of 94.9%, J48 demonstrated the best spam detection ability using the top seven features identified by ReliefF.
For bot detection, (Heidari et al.
2021) analyzed sentiment features of tweet content for each account to measure their impact on the accuracy of ML algorithms. The authors used the (Cresci et al. 2017) dataset of 12,736 accounts and 6,637,615 tweets. Their proposed methodology centers on the number of tweets per account that concentrate on extreme opinions: overly negative, positive, or neutral opinions indicate that the user is a bot. ML models such as RF, NN, SVM, and LR were examined using the proposed sentiment features; the highest result was achieved by Support Vector Regression (SVR), with an F1-score of 0.930.
The research work (Rodrigues et al.
2022) focused on identifying live tweets as spam or ham and performed sentiment analysis on both live and stored tweets to classify them as positive, negative, or neutral. The proposed methodology used two different Kaggle datasets. Vectorizers such as TF-IDF and BoW models were used to extract sentiment features, which were then fed into a variety of ML and DL classifiers. LSTM achieved the highest accuracy among the classifiers, with 98.74% for spam detection and 73.81% for sentiment analysis.
The work (Andriotis and Takasu
2019) proposed a content-based approach to identify spambots. Four public datasets were used in this study (Cresci et al.
2017; Varol et al.
2017; Yang et al.
2012,
2013). Collectively, the datasets contain tweets from nearly 20 K accounts of both bots and genuine users. The proposed methodology employed metadata, content, and sentiment features. Furthermore, the performance of the KNN, DT, NB, SVM, RF, and AdaBoost algorithms was tested; AdaBoost showed the best result with a 0.95 F1-score. Additionally, the study showed that sentiment features add value to bot detection algorithms when combined with known features.
Also, (Sadineni
2020) detected spam using a Kaggle dataset that included 950 users and ten content-based attributes, demonstrating that SVM and RF outperform NB in terms of performance.
On the other hand, (Kudugunta and Ferrara
2018) presented a contextual LSTM architecture based on a DNN that uses account metadata and tweet text to identify bots at the tweet level. The tweet text served as the primary model input; it was tokenized and converted into a sequence of GloVe vectors before being fed into the LSTM, which then passed its output to a 2-layer NN with ReLU activations. High classification accuracy was attained using the suggested model. Additionally, the compared techniques for account-level bot identification that used synthetic minority oversampling reached over 99% AUC.
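The tweet-level architecture described (token sequence, pre-trained embeddings, LSTM, then a 2-layer ReLU head) can be sketched in Keras as follows. The trainable Embedding layer stands in for loading GloVe vectors, and all dimensions are illustrative assumptions.

```python
from tensorflow.keras import layers, Model

tokens_in = layers.Input(shape=(50,))                    # up to 50 tokens per tweet
x = layers.Embedding(input_dim=20000, output_dim=100)(tokens_in)  # stand-in for GloVe
x = layers.LSTM(64)(x)
x = layers.Dense(32, activation="relu")(x)               # 2-layer ReLU head
x = layers.Dense(32, activation="relu")(x)
out = layers.Dense(1, activation="sigmoid")(x)           # bot / not-bot at tweet level

model = Model(tokens_in, out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```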
In this study, Arabic spam accounts were detected using text-based data with CNN models and metadata with NN models by (Alhassun and Rassam
2022). Utilizing Twitter's premium API, a dataset of 1.25 million tweets was collected. Data labeling was carried out by flagging terminated accounts. Thirteen features based on tweets, accounts, and graphs were retrieved. The findings demonstrated that the suggested combined framework reached an accuracy of 94.27% using premium features, and spam detection performance improved when premium features were used rather than Twitter's standard features.
An efficient technique for spam identification was introduced by (Inuwa-Dutse et al.
2018). They suggested an SPD-Optimized set of features independent of historical tweets, focusing on user-related attributes, user accounts, and paired user engagement. MaxEnt, Random Forest, ExtraTrees, SVM, GB, MLP, and MLP+ were among the classification models utilized and evaluated on three datasets: Honeypot (Lee et al. 2006), SPDautomated, and SPDmanual. Performance peaked at 99.93% when using GB on the SPD-Optimized set. This technique can be used in real time as the first step of a social media data gathering pipeline to increase the validity of research data.
Instead of employing the LCS method, (Sheeba et al.
2019) detected spam using the RF classifier technique. The study used a dataset of 100,000 tweets. After the RF classifier identified an account as a spambot, Latent Semantic Analysis (LSA) was used to further verify it. The proposed approach delivered benefits in terms of time consumption, high accuracy, and cost-effectiveness.
An approach to spam identification based on DL methods was developed by (Alom et al.
2020). CNN architecture was utilized for the text-based classifier, while CNN and NN were merged for the combined classifier to classify tweet text and metadata, respectively. On two distinct real-world public datasets, Honeypot (Lee et al.
2006) and 1KS-10KN (Yang et al. 2013), the suggested approach's performance was compared to that of five ML-based and two DL-based state-of-the-art approaches. Accuracies of 99.68% and 93.12% were attained on the Honeypot and 1KS-10KN datasets, respectively.
In this research, (Reddy et al.
2021) implemented several supervised classification algorithms to detect spammers on Twitter. Data were obtained from the tweepy API, comprising 2798 accounts in the training set and 578 accounts in the test set. Eighteen profile-based features were extracted. In terms of accuracy, the Extreme Learning Machine (ELM) obtained the best accuracy of 87.5%.
Firstly, (Narayan
2021) used ML algorithms for the detection and successful identification of bogus Twitter accounts/bots. The algorithms used were DT, RF, and MNB. The dataset comprised 447 Twitter accounts, and the Twitter API was used for data extraction. DT was found to be more accurate than RF and MNB.
In their work, (Bindu et al.
2022) proposed three efficient methods to successfully detect fake accounts. The classification algorithms used were linear and radial SVM, RF, and KNN. The dataset contained a total of 3964 records. RF gave the most accurate predictions, overcoming the overfitting problem; its k-fold cross-validation scores had a mean of 0.979812 and a standard deviation of 0.019682. In comparison, radial SVM did not perform well and gave more false negatives. However, higher accuracy was achieved using the ensemble approach.
Likewise, (Alarifi et al.
2016) studied the features used for detecting sybil accounts. Twitter4j was used to gather a manually labeled sample dataset of 2000 Twitter accounts (humans, bots, and hybrids with both human and bot tweets). Eight content-based features were selected. Several supervised ML algorithms were used, including J48 (C4.5), Logistic Model Tree, RF, LogitBoost, BN, SMO-P, SMO-R, and multilayer NN. RF performed best, with a DR of 91.39 for two-class and 88.00 for three-class classification. Lastly, to maximize the classifier's usefulness, the authors developed an efficient browser plug-in.
David et al. (
2017) leveraged a public labeled dataset from the BoteDeTwitter project to build half of their dataset, related to Spanish politics. Using the Twitter API, a sample of 853 bot profiles and the most recent 1000 tweets from each user's timeline was collected. To create an initial feature set, 71 features based on profiles, metadata, and content were extracted. The following supervised ML methods were compared: RF, SVM, NB, DT, and NNET. Even though the gains were not significant after the first six features, RF achieved the highest average accuracy of 94% using 19 features.
In the (van der Walt and Eloff 2018) paper, Twitter data were mined using the twitter4J API and a non-relational database, yielding a total of 169,517 accounts. Engineered features that had previously been used to successfully identify fraudulent accounts created by bots were added to a sample of human accounts. Without relying on behavioral data, these features were applied to several supervised ML models, enabling training on very little data. The results show that engineered features previously employed to identify fake accounts created by bots could only reasonably predict fake accounts created by humans, with an F1-score of 49.75%.
Kondeti et al. (
2021) implemented ML to detect fake accounts on the Twitter platform. Different ML algorithms were used, namely SVM, LR, RF, and KNN, along with six account metadata features: lang-code, sex-code, status-count, friends-count, followers-count, and favorites-count. To further improve these algorithms' accuracy, two normalization techniques, Z-score and Min-Max, were used. Their approach achieved a high accuracy of 98% for both the RF and KNN models.
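The two normalization schemes mentioned correspond directly to scikit-learn's StandardScaler (Z-score) and MinMaxScaler; below is a minimal sketch with a hypothetical account-metadata matrix.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Hypothetical account-metadata features: status, friends, followers counts
X = np.array([[120, 4500, 300], [15, 80, 12], [560, 9000, 950]], dtype=float)

X_z = StandardScaler().fit_transform(X)   # Z-score: zero mean, unit variance
X_mm = MinMaxScaler().fit_transform(X)    # Min-Max: rescaled to [0, 1]
print(X_z.round(2), X_mm.round(2), sep="\n")
```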
Khaled et al. (
2019) suggested a new algorithm, SVM-NN, to efficiently detect sybil bots. Four public labeled datasets were used by the authors; combining them resulted in a total of 4456 accounts of both fake and human classes. Sixteen user-based numerical features were extracted from the datasets after feature reduction and then fed into the SVM, NN, and SVM-NN algorithms. The authors assert that their novel SVM-NN uses fewer features than existing models. SVM-NN was the best-performing algorithm, showing an accuracy of around 98%.
In the study, (Ersahin et al.
2017) collected their own dataset of fake and real accounts using the Twitter API. The dataset consisted of data on 1000 accounts, later pre-processed using Entropy Minimization Discretization (EMD) on sixteen user-based numerical features. NB with EMD showed the best result, with 90.41% accuracy.
However, in order to predict sybil bots on Twitter using deep-regression learning, (Al-Qurishi et al.
2018) introduced a new model. The authors used two publicly available labeled datasets generated during the 2016 US election and collected using the Twitter API. The first dataset consisted of 39,467 profiles and 42,856,800 tweets; the second consisted of 3140 profiles and 4,152,799 tweets. The authors extracted 80 online and offline features based on profile, content (temporal, topic, quality, and emotion-based), and graph information. The features were then fed into the Deep Learning Component (DLC), an FFNN. When fed noisy and unclear data, the model achieved an accuracy of 86%. Categorical features showed clear segregation: all sybil bots disable their geographical location and have unverified accounts, while numerical features showed that sybil bots have noticeably young (recently created) accounts. Additionally, the numbers of reposts and mentions are significantly higher in sybil accounts.
Gao et al. (
2020) proposed a content-based method to detect sybils. The proposed method includes three main phases: a CNN, a bi-SN-LSTM, and a dense layer with a softmax classifier stacked to output the classification results. In contrast to the bi-LSTM, the proposed bi-SN-LSTM network employs SELU as the activation function of its recurrent step, enabling unbounded changes to the state value. The proposed model achieved a high F1-score of 99.31% on the "My Information Bubble" (Cresci et al.
2015) dataset.