Hate speech classification
We predicted hate speech scores for each tweet using a machine learning algorithm. We used a random forest classifier with handcrafted psycholinguistic features to ensure interpretability and scalability (Pennebaker et al. 2003; Tausczik and Pennebaker 2010). Using the Netmapper software, we extracted from each tweet several lexical measures of pronoun use, abusive words, exclusive words, absolutist words, and identity terms, among others (Carley et al. 2018b). Pairwise products of each linguistic measure were used as additional features to capture the ways they co-occurred within a tweet. Netmapper's lexicon includes these measures for English as well as a variety of other languages, including those used in the Philippines (e.g., Tagalog, Cebuano), thus facilitating our comparative analysis.
Our model was trained on a widely used benchmark dataset for hate speech (Davidson et al. 2017). This dataset distinguishes between hate speech, offensive speech, and regular speech as class labels. As previously mentioned, the distinction between hate speech and offensive speech is crucial, as it recognizes that some tweets may use expletives and similarly profane language without hatefully targeting any group in particular. Our model achieved over 83% in terms of both accuracy and F1 score (Uyheng and Carley 2020a). Experiments with alternative algorithms (e.g., logistic regression, support vector machines) consistently underperformed the random forest model. This gave us confidence in using our model to analyze the network dynamics of hate speech around COVID-19.
Infodemic trends
Temporal analyses of these predictions further facilitated characterization of trends in hate speech spread over time. This analysis proceeds as follows: over the 75-day period under observation, we generated cumulative distributions of when each hateful tweet appeared in the dataset. Because of the probabilistic predictions generated by our machine learning model, we relied on three distinct cutoffs for classifying a tweet as hate speech. In order of increasing stringency, we relied on: (a) the median hate speech value in the dataset (i.e., the top 50% most hateful tweets), (b) the 75th percentile (i.e., the top 25% most hateful tweets), and (c) the tweets which achieved a hate speech probability of at least 50%.
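The three thresholding schemes can be sketched in a few lines of code. The function below is our own illustration rather than the authors' implementation, and it assumes the model's hate speech probabilities are available as a flat list:

```python
import statistics

def hate_masks(probs):
    """Flag tweets as hateful under three increasingly stringent cutoffs:
    (a) at or above the median score, (b) at or above the 75th percentile,
    and (c) at or above an absolute probability of 0.5."""
    median = statistics.median(probs)
    q75 = statistics.quantiles(probs, n=4)[2]  # third quartile
    return {
        "median": [p >= median for p in probs],
        "p75": [p >= q75 for p in probs],
        "absolute": [p >= 0.5 for p in probs],
    }
```

Each mask can then be accumulated over days to produce one of the three cumulative curves described below.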
Through this procedure, we produced three curves which visualize the infodemic spreading dynamics of hate speech in each country. Using the method described by Fisman and colleagues (Fisman et al. 2013), later adopted by Cinelli and colleagues for social media infodemics (Cinelli et al. 2020), we estimated a reproduction number \(R_0\) which quantified the spread of hate speech within the online conversation. Also known as the Incidence Decay and Exponential Adjustment (IDEA) model, the Fisman model uses a fairly simple function to model the growth of an epidemic over time (Fisman et al. 2013). In particular, incidence \(I\) is modeled at time \(t\) (in days) with the reproduction number \(R_0\) and a discounting factor \(d\) as follows:
$$\begin{aligned} I = \left( \dfrac{R_0}{(1+d)^t}\right) ^t. \end{aligned}$$
As in Cinelli et al. (2020), we let \(I\) represent the cumulative number of hateful tweets at time \(t\). We estimate both \(R_0\) and \(d\) through ordinary least squares regression. We note that Cinelli et al. (2020) estimate that \(R_0\) on Twitter for COVID-19 infodemics lies in the confidence interval between 1.65 and 2.06. They additionally observe that using the more traditional SIR model resulted in unrealistic values of \(R_0\) due to steep jumps in their dataset. We therefore adopt this insight in focusing on the IDEA estimate for \(R_0\) in our analysis.
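Because log-transforming the IDEA equation gives \(\log I = t \log R_0 - t^2 \log(1+d)\), the parameters can be recovered by ordinary least squares on the features \(t\) and \(t^2\). The sketch below is our own illustration of this fit, solving the 2×2 normal equations directly:

```python
import math

def fit_idea(cumulative_counts):
    """Estimate R0 and d in I_t = (R0 / (1+d)^t)^t via OLS on
    log I_t = b1*t + b2*t^2, where b1 = log R0 and b2 = -log(1+d)."""
    pairs = [(t, math.log(i))
             for t, i in enumerate(cumulative_counts, start=1) if i > 0]
    s11 = sum(t * t for t, _ in pairs)     # sum of t^2
    s12 = sum(t ** 3 for t, _ in pairs)    # sum of t^3
    s22 = sum(t ** 4 for t, _ in pairs)    # sum of t^4
    c1 = sum(t * y for t, y in pairs)      # sum of t * log I
    c2 = sum(t * t * y for t, y in pairs)  # sum of t^2 * log I
    det = s11 * s22 - s12 * s12
    b1 = (c1 * s22 - c2 * s12) / det
    b2 = (s11 * c2 - s12 * c1) / det
    return math.exp(b1), math.exp(-b2) - 1  # (R0, d)
```

On a noiseless series generated from the IDEA model itself, this fit recovers \(R_0\) and \(d\) exactly, which makes it a convenient sanity check before applying it to real cumulative counts.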
To examine the dynamics of groups in the online conversation, we represented our data as a time-varying social network. For our temporal analysis, we segmented the dataset into a series of daily snapshots. For a given day \(t \in \left\{ 1,2,\ldots ,75\right\}\), let \(G_t = (V_t, E_t)\) be the graph representation of the online conversation. Here, \(V_t\) corresponds to the set of users in the data, represented as the set of vertices in the graph. Meanwhile, \(E_t\) represents a set of weighted, directed edges between vertices in \(V_t\). The weight of each directed edge is given by the number of interactions originating from the source node toward the target node. To obtain edge weights, we take the sum of all forms of Twitter communication, including retweets, replies, mentions, and quotes. The ORA software was used to perform all network analysis (Carley et al. 2018b).
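As one illustration of this construction (a plain-Python sketch rather than ORA, with hypothetical field names), daily edge weights can be accumulated by summing all interaction types between each ordered pair of users:

```python
from collections import defaultdict

def build_daily_networks(interactions):
    """interactions: iterable of (day, source, target, kind) tuples, where
    kind is one of 'retweet', 'reply', 'mention', or 'quote'.
    Returns {day: {(source, target): weight}}, with weights summed over
    all communication types, matching the weighted directed graphs G_t."""
    networks = defaultdict(lambda: defaultdict(int))
    for day, source, target, _kind in interactions:
        networks[day][(source, target)] += 1
    return {day: dict(edges) for day, edges in networks.items()}
```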
Community detection was performed to operationalize a localized understanding of online groups. We used the Leiden algorithm to automatically recover local clusters of users. The Leiden algorithm is an unsupervised method for community detection which iteratively refines cluster assignments with the intuitive goal of optimizing the difference between the actual and expected number of edges within an assigned cluster (Traag et al. 2019). It has been shown to improve on the widely used Louvain algorithm (Blondel et al. 2011) by guaranteeing well-connected communities as well as running faster (Traag et al. 2019). Thus, for each network snapshot, we obtained cluster assignments for all agents. Agents assigned to the same cluster were conceptualized as constituting a distinct group engaged in meaningful interaction about the pandemic. Note that for all succeeding analysis, we remove trivial clusters containing only one or two agents (i.e., isolates and pendants).
Using the obtained groupings, we designed several novel measures for characterizing hate speech in its dynamic social context. Drawing on constructural theory, these metrics capture key aspects of the evolution of hate speech within online communities.
Community-level hate.
To obtain a continuous measure of hate-like content at a given time \(t\), we leverage the probabilistic outputs of our random forest model. To propagate hate speech probabilities from tweets to users, we take each user's average hate speech score at time \(t\). As before, let \(G_t = (V_t, E_t)\) represent the graph of the online conversation. Let \(C_{t,1}, C_{t,2}, \ldots , C_{t,l_t}\) represent the \(l_t\) distinct clusters derived by the Leiden algorithm. For each cluster \(C_{t,i}\), where \(i \in \left\{ 1,2,\ldots , l_t\right\}\), consider its constituent agents \(V_{t,i}\). Now let \(h_{t,i,j}\) represent the hate score associated with user \(j \in V_{t,i}\). Then the cluster-level hate \(H_{t,i}\) is given by the average of user-level hate speech scores, as in the following equation:
$$\begin{aligned} H_{t,i} = \dfrac{1}{|V_{t,i}|}\sum \limits _{j \in V_{t,i}} h_{t,i,j}. \end{aligned}$$
Hate community homogeneity. Beyond the raw amount of hate in the network, we are also interested in the extent to which users employing more hate speech are in turn more likely to interact with more hateful others. This would also correspond to hate speech being more organized and less scattered throughout the online conversation at a given time. We refer to this measure as hate community homogeneity.
Let \(\sigma _{t,i}^2\) represent the variance of the user-level hate speech scores \(h_{t,i,j}\) within cluster \(i\). From this, we obtain a measure \(O_{t,i}\) of how homogeneous or orderly the cluster-level hate speech is. Higher levels of this measure indicate that hateful users are conversing with other hateful users. Lower levels suggest that hateful users are more scattered throughout the social network, interacting with both hateful and non-hateful users.
At a given time \(t\), we compute hate community homogeneity for each cluster \(i\). Taking the measurement in log scale deals with differences in scale (especially extremely small values), with an arbitrarily small \(\nu > 0\) ensuring all inputs are non-zero. The equation is given as follows:
$$\begin{aligned} O_{t,i} = \log \biggl [ \dfrac{H_{t,i}}{\sigma _{t,i}^2} + \nu \biggr ]. \end{aligned}$$
Finally, at time \(t\), we obtain an overall network-level measure of hate community homogeneity \(O_t\) by averaging the cluster-level hate community homogeneity scores as follows:
$$\begin{aligned} O_{t} = \dfrac{1}{l_t}\sum \limits _{i = 1}^{l_t} O_{t,i}. \end{aligned}$$
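The cluster-level hate and homogeneity computations above can be sketched as follows. This is our own illustration, not the authors' code; \(\nu\) is set to an arbitrary small constant, clusters are given as lists of user-level scores, and each cluster is assumed to have non-zero score variance:

```python
import math
import statistics

NU = 1e-9  # arbitrarily small constant keeping the log argument non-zero

def cluster_hate(scores):
    """H_{t,i}: average of user-level hate scores within one cluster."""
    return sum(scores) / len(scores)

def cluster_homogeneity(scores):
    """O_{t,i} = log(H_{t,i} / sigma^2_{t,i} + nu), using the population
    variance of user-level scores within the cluster."""
    return math.log(cluster_hate(scores) / statistics.pvariance(scores) + NU)

def network_homogeneity(clusters):
    """O_t: average of cluster-level homogeneity over all l_t clusters."""
    return sum(cluster_homogeneity(c) for c in clusters) / len(clusters)
```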
Hate speech assortativity. Leveraging more classical network science measures, we also use the assortativity coefficient to analyze the relational dynamics of hate speech over time. Newman defines assortativity as the tendency for nodes in a network to be connected to similar nodes (Newman 2003). For each time \(t\), we measure the assortativity of the network \(G_t\) based on the node-level hate speech scores \(h_{t,i,j}\) defined above. In this manner, we obtain another measure of the organization of hate speech over time. For continuous variables, the assortativity coefficient is given simply by the Pearson correlation between the user-level hate speech scores of source nodes and those of target nodes. We recall that \(G_t\) is a directed network, so these values are not symmetric.
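Concretely, over a directed edge list this reduces to a Pearson correlation between the source's and target's hate scores. A minimal sketch (our own illustration, with hate scores supplied as a user-to-score mapping):

```python
import math

def hate_assortativity(edges, hate):
    """Pearson correlation between source and target hate scores over the
    directed edges of G_t. edges: iterable of (source, target) pairs;
    hate: dict mapping each user to their hate score."""
    xs = [hate[s] for s, _ in edges]
    ys = [hate[t] for _, t in edges]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A network where high-hate users reciprocate mainly with other high-hate users yields a coefficient near +1; scattered hate yields a value near zero or below.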
While our proposed hate community homogeneity measure depends on clusters derived through a Leiden algorithm, the assortativity coefficient does not depend on network clustering. Hence, while both network measures capture some notion of hate speech organization in the network, they are not equivalent. Hate community homogeneity accounts for a wider context of social influence than assortativity; the latter, meanwhile, focuses primarily on dyadic interaction. However, both values considered together may nonetheless be informative for analysis, as we demonstrate later.
Structural features of hate communities.
Next, we consider the structural features of clusters, following the hypothesis that these relate to localized levels of hate speech (Kim 2020). We are specifically interested in the following features. First, we examine cluster size, given by the number of unique agents assigned to the same cluster. Second, we look at the E/I index, a classical measure in network science which intuitively quantifies exclusive group communication (Krackhardt and Stern 1988). Normalized between −1 and +1, higher values of the E/I index indicate high levels of communication with out-groups, while lower values suggest that the cluster communicates solely with in-group members. Third, we measure the Cheeger constant, which quantifies bottleneck behavior, such that higher values indicate more hierarchy in the cluster while lower values indicate more dispersed connection patterns between agents (Mohar 1989).
Identity target analysis.
To analyze the content and targets of hate speech, we employ a lexicon of identity terms derived from prior research (Joseph et al. 2014, 2016). Identity terms here refer to words which describe personal or group-based categories (Priante et al. 2016). Using the lexicon available in the Netmapper software (Carley et al. 2018b), our analysis recognizes that identities may be intersectional; hence, identity terms are further organized into subcategories of gender, race/ethnicity, politics, and religion.
Each of these subcategories is counted over each tweet using the Netmapper software. User-level invocation of identity terms is computed as the average number of times an account mentions each identity subcategory in their tweets at a given time. Cluster-level invocation of identity terms is computed as the average user-level identity score for all accounts within the cluster.
To capture the sense that some identities are targeted over others in hate speech, we compute the linear slope relating cluster-level hate speech to cluster-level identity scores. We use ordinary least squares regression to compute these slopes for a given time interval, thereby reflecting the extent to which certain identity categories are more or less associated with hate speech within a particular time frame.
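The slope computation amounts to a simple one-predictor OLS fit per identity subcategory. A sketch (our own illustration) over cluster-level scores within one time interval:

```python
def identity_hate_slope(identity_scores, hate_scores):
    """OLS slope from regressing cluster-level hate on cluster-level
    invocation of one identity subcategory. A larger slope indicates the
    subcategory is more strongly associated with hate speech in the
    given time frame."""
    n = len(identity_scores)
    mx = sum(identity_scores) / n
    my = sum(hate_scores) / n
    cov = sum((x - mx) * (y - my)
              for x, y in zip(identity_scores, hate_scores))
    var = sum((x - mx) ** 2 for x in identity_scores)
    return cov / var
```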
Integrated estimation of network dynamics
Finally, we present an integrated analysis of structural (i.e., cluster properties) and functional (i.e., identity targets) network dynamics of hate speech. Invoking insights from constructural theory, we posit that levels of cluster-level hate are dynamically associated with: (a) cluster-level measures of information flow, and (b) cluster-level lexical measures linked to identity targets of hate speech.
We model these intuitions in a Bayesian multiple regression setup. In this model, we consider daily clusters as the unit of analysis. For each cluster at a given point in time, we consider the fixed effects of structural features (i.e., cluster size, density, E/I index, and Cheeger score) and functional features (i.e., identity scores) of hate speech. To deal with the temporal component of our analysis, we model an AR(1) random effect. The dependent variable is given by the cluster-level hate. Uninformative, standard Gaussian priors are used for all estimated effects.
To perform model inference, we use the Integrated Nested Laplace Approximation (INLA) technique (Rue et al. 2009). INLA is a fast alternative to traditional techniques like Markov chain Monte Carlo estimation. It retrieves high-quality estimates of model parameters with scalable runtime through the use of appropriate Gaussian approximations of general distributions. This allows us to efficiently analyze our relatively large-scale dataset and estimate the effects of both structural and functional features of hate speech networks over time. An additional advantage of our Bayesian approach is that we obtain interval estimates of uncertainty instead of single point estimates.