Introduction
- RQ1 How well do different algorithms and feature representations perform for hate detection across multiple social media platforms?
- RQ2 What are the most impactful features when predicting hate across multiple social media platforms?
- RQ3 How well does a machine learning model learn the linguistic characteristics of hateful and non-hateful language in a cross-platform environment?
Literature review
Theoretical underpinnings of online hate
Definitions of online hate
Definition of online hate | Source | Focus |
---|---|---|
“Language that is used to expresses hatred towards a targeted group or is intended to be derogatory, to humiliate, or to insult the members of the group” | Davidson et al. [16] (p. 215) | Language, target |
“Hateful comments toward a specific group or target” | | Target, group |
“[Hate speech is] either ‘directed’ towards a specific person or entity, or ‘generalized’ towards a group of people sharing a common protected characteristic” | ElSherief et al. [67] (p. 1) | Target, group |
“Comments that are rude, disrespectful or otherwise likely to make someone leave a discussion” | Almerekhi et al. [73], adapted from Jigsaw’s toxic comment classification challenge in Kaggle | Individual, comments, consequences |
“An offensive post, motivated, in whole or in a part, by the writer’s bias against an aspect of a group of people” | Mondal et al. [32] (p. 87) | Language, group, target |
Offensive name calling, purposefully embarrassing others, stalking, sexual harassment, physical threats, and sustained harassment | Wulczyn et al. [7], adapted from Pew Research Center | Language |
Online hate comprises language that contains hate speech targeted at individuals or groups, profanity, offensive language, or toxicity – in other words, comments that are rude or disrespectful and can result in negative online and offline consequences for the individual, the community, and society at large.
Evolution of online hate detection
Keyword-based classifiers
Challenge | Description |
---|---|
False positive problem | False positives occur when a model flags a non-threatening expression as hateful merely because certain words or phrases are present as features. For example, a tweet such as “Bill aims to fix sex-offender list’s inequity toward gay men” can be labeled as hateful although, in reality, it is not an offensive expression but a simple statement |
False negative problem | False negatives occur when the model classifies a threatening expression as non-threatening. For example, a keyword detector could correctly detect “I fucking hate Donald Trump” but miss “Donald Trump is a rat”, although both expressions can be considered hateful (see the sketch following this table) |
Subjectivity | The datasets can involve subjectivity arising from several sources: crowd raters may not understand context or follow instructions, there can be high disagreement about what constitutes hate, and various biases, such as racial bias [66, 110], can occur when constructing ground-truth datasets. Sarcasm and humor further exacerbate the problem, as individuals’ ability to interpret these types of language varies greatly |
Polysemy | Polysemy, i.e., the same word or phrase having different meanings in different contexts (e.g., social media communities or platforms), can greatly complicate the detection of online hate, as it introduces contextuality that the model must be aware of |
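To make the false positive and false negative problems concrete, the following is a minimal sketch of a keyword-based classifier; the lexicon and the matching rule are illustrative assumptions, not the keyword lists used in the cited studies.

```python
import re

# Hypothetical keyword lexicon; identity terms such as "gay" appear in some real
# lexicons and are a well-known source of false positives.
HATE_KEYWORDS = {"hate", "gay", "fucking"}

def keyword_classifier(comment: str) -> bool:
    """Flag a comment as hateful if any lexicon word occurs in it."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    return any(token in HATE_KEYWORDS for token in tokens)

# False positive: a neutral news statement is flagged because it contains "gay".
print(keyword_classifier("Bill aims to fix sex-offender list's inequity toward gay men"))  # True

# False negative: no lexicon word is present, although the comment can be read as hateful.
print(keyword_classifier("Donald Trump is a rat"))  # False

# Case the keyword approach handles correctly.
print(keyword_classifier("I fucking hate Donald Trump"))  # True
```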
Distributional semantics
Deep learning classifiers
Research gaps
Study | Primary source | Secondary source | Tertiary source |
---|---|---|---|
Hosseinmardi et al. [3] | Instagram | – | – |
Almerekhi et al. [73] | Reddit | – | – |
Kumar et al. [2] | Reddit | – | – |
Davidson et al. [16] | Twitter | – | – |
Mondal et al. [32] | Twitter | Whisper | –
Badjatiya et al. [29] | Twitter | – | – |
– | Twitter | – | –
ElSherief et al. [37] | Twitter | – | – |
Unsvag and Gambäck [60] | Twitter | – | – |
Agarwal and Sureka [113] | YouTube | – | – |
Salminen et al. [5] | YouTube | – | – |
Djuric et al. [30] | Yahoo | – | – |
Nobata et al. [23] | Yahoo | – | – |
Wulczyn et al. [7] | Wikipedia | – | – |
Datasets
Overview
YouTube dataset (ICWSM-18-SALMINEN)
Reddit dataset (ALMEREKHI-19)
Wikipedia dataset (KAGGLE-18)
Twitter dataset (DAVIDSON-17-ICWSM)
Structuring the datasets into binary classes
Source | Platform and domain | Hateful (H) | Not hateful (NH) | Number of comments | Cum. count | % from total |
---|---|---|---|---|---|---|
ICWSM-18-SALMINEN [5] | YouTube, news media | 2364 (73.4%) | 857 (26.6%) | 3221 | 3221 | 1.6%
ALMEREKHI-19 [73] | Reddit, 10 popular sub-communities | 1619 (16.2%) | 8372 (83.8%) | 9991 | 13,212 | 5.1%
DAVIDSON-17-ICWSM [16] | Twitter, generic tweets | 20,620 (83.2%) | 4163 (16.8%) | 24,783 | 37,995 | 12.5%
KAGGLE-18 [7] | Wikipedia, editor discussions | 15,294 (9.6%) | 144,277 (90.4%) | 159,571 | 197,566 | 80.8%
Hateful (H) | Not hateful (NH) | Total |
---|---|---|
39,897 | 157,669 | 197,566 |
20.2% | 79.8% | 100%
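A minimal sketch of how the four datasets above could be combined into one binary-labeled corpus. The file names, column names, and label mappings are hypothetical placeholders, not the exact preprocessing pipeline of the study.

```python
import pandas as pd

# Hypothetical local copies of the four datasets; paths and column names are assumptions.
SOURCES = {
    "YouTube":   ("icwsm18_salminen.csv", "text",         "is_hateful"),
    "Reddit":    ("almerekhi19.csv",      "body",         "toxic"),
    "Twitter":   ("davidson17.csv",       "tweet",        "class"),
    "Wikipedia": ("kaggle18_train.csv",   "comment_text", "toxic"),
}

def to_binary(platform: str, raw_label) -> int:
    """Map each dataset's native labels onto a shared binary scheme (1 = hateful, 0 = not)."""
    if platform == "Twitter":
        # DAVIDSON-17-ICWSM distinguishes hate speech (0), offensive (1), and neither (2);
        # hate and offensive are merged into the hateful class here (assumed mapping).
        return int(raw_label in (0, 1))
    return int(bool(raw_label))

frames = []
for platform, (path, text_col, label_col) in SOURCES.items():
    df = pd.read_csv(path)
    frames.append(pd.DataFrame({
        "platform": platform,
        "text": df[text_col].astype(str),
        "hateful": [to_binary(platform, v) for v in df[label_col]],
    }))

corpus = pd.concat(frames, ignore_index=True)
print(corpus.groupby(["platform", "hateful"]).size())  # should mirror the counts above
```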
Platform | Number of comments (% of full dataset) | H–NH split (in %) |
---|---|---|
YouTube | 798 (24.77%) | 72.81–27.19 |
Reddit | 2431 (24.33%) | 16.25–83.75 |
Twitter | 6290 (25.38%) | 83.37–16.63 |
Wikipedia | 39,873 (24.99%) | 9.64–90.36 |
Total | 49,392 (25.00%) | 20.38–79.62 |
No. | Comment text | Platform |
---|---|---|
1 | this is disgusting and heartbreaking. They should do the same shit to the cops of what they did to that poor guy. 10 s of pepper spray directly into the face, run over foot etc. equal it up a little bit, except for the detail of having a fucking stroke. So it still wouldn’t be exactly what the guy went through. Fucking discusting. They get a hard on power tripping others. They are just fucking cowards | YouTube |
2 | Fuck that cop. Nothing but a fucking thug | YouTube |
3 | Fucking piece of shit cops, protect and serve my ass | YouTube |
4 | Land of the free… My fucking ass. America is one hell of a shit hole where its full of dumb sheeple believing in fictional fox news and cops who kill and terrorise more per day than terrorists in this whole world combined | YouTube |
5 | That stupid bitch Sally to just die. I know he’s cheating on me with her, I’ve seen the texts on his phone. It’s just a matter of time until she fucks up though! | Reddit |
6 | Yeah, you’re not projecting at all… ??? This still doesn’t make any sense… nobody even suggested your choice had anything whatsoever to do with lack of opportunity, lol. But whatever, it was obvious from the original comment I replied to you’re not exactly a very rational human being, so, cheers bro | Reddit |
7 | Nope, I swallowed your moms pussy while she was snickering. The things I do for money but your dad was happy to pay it. To bad she was crying | Reddit |
8 | Old bitch:”Is this salmon gluten free?” Me: “ma’am gluten is a protein found is wheat and isn’t present in any meat” Old bitch: “I ASKED YOU IF THIS SALMON HAS GLUTEN IN IT!” Me:”………..no” Old bitch: “what good are you?” | Reddit |
9 | Nuke town that pussy | Twitter |
10 | Lovin hoes but thats same bitch we put the pipe in. She suckin dick, thats the same bitch you give goodnight kiss | Twitter |
11 | Old hoes get mad when you don’t show them the attention they want | Twitter |
12 | RT @slimthugga: U a nurse 😂RT @Blaccstone: @slimthugga you a clown boy You music is trash…. I dare you say something back I bury your Bit… | Twitter |
13 | All of my edits are good. Cunts like you who revert good edits because you’re too stupid to understand how to write well, and then revert other edits just because you’ve decided to bear a playground grudge, are the problem. Maybe 1 day you’ll realise the damage you did to a noble project | Wikipedia |
14 | You should do something nice for yourself, maybe go grab a couple of Horny Goat Weeds from your local convenience store and jack off for a little longer than 3 min tonight | Wikipedia |
15 | I’m sorry I screwed around with someones talk page. It was very bad to do. I know how having the templates on their talk page helps you assert your dominance over them. I know I should bow down to the almighty administrators. But then again, I’m going to go play outside….with your mom | Wikipedia |
16 | Would you both shut up, you don’t run wikipedia, especially a stupid kid | Wikipedia |
Related datasets
- “@FarOutAkhtar How can I promote gender equality without sounding preachy or being a ‘feminazi’? #AskFarhan”.
- “i got called a feminazi today, it’s been a good day”.
- “In light of the monster derailment that is #BlameOneNotAll here are some mood capturing pics for my feminist pals pic.twitter.com/3pTV0M9qOQ”.
Classification algorithms
Logistic regression (LR)
Naïve Bayes (NB)
Support-vector machines (SVM)
XGBoost
Feed-forward neural network (FFNN)
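The five classifiers above can be instantiated with standard open-source libraries. The sketch below assumes a precomputed dense feature matrix X with binary labels y (synthetic data is generated here only so the snippet runs); hyperparameters are illustrative defaults, not the tuned settings of the study.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Synthetic stand-in for a dense feature matrix (1 = hateful, 0 = not hateful).
X, y = make_classification(n_samples=2000, n_features=50, weights=[0.8], random_state=0)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "NB": GaussianNB(),                          # Gaussian variant chosen for dense features
    "SVM": LinearSVC(),
    "XGBoost": XGBClassifier(n_estimators=300, eval_metric="logloss"),
    "FFNN": MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300),
}

for name, model in models.items():
    f1 = cross_val_score(model, X, y, cv=5, scoring="f1")
    print(f"{name}: mean F1 = {f1.mean():.3f}")
```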
Feature representation
Simple features
Feature | Definition |
---|---|
Words | The number of words in the comment |
Uppercase | The number of uppercase characters in the comment |
Uppercase_per_word | The average number of uppercase characters in the words of the comment (i.e., number of uppercases divided by the number of words) |
Punctuation | The number of punctuations such as a full stop, comma, or question mark used in the comment |
Punctuation_per_word | The number of punctuation marks divided by the number of words in the comment |
Numbers | The number of numbers in the comment |
Numbers_per_word | The number of numbers divided by the number of words in the comment |
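A sketch of how the simple features in the table above could be computed for a single comment; the whitespace tokenization, the punctuation set, and counting digit characters as numbers are assumptions rather than definitions taken from the paper.

```python
import string

def simple_features(comment: str) -> dict:
    """Surface-level features as listed in the table above."""
    words = comment.split()                              # whitespace tokenization (assumption)
    n_words = max(len(words), 1)                         # guard against empty comments
    n_upper = sum(ch.isupper() for ch in comment)
    n_punct = sum(ch in string.punctuation for ch in comment)
    n_numbers = sum(ch.isdigit() for ch in comment)      # digits as a proxy for numbers
    return {
        "words": len(words),
        "uppercase": n_upper,
        "uppercase_per_word": n_upper / n_words,
        "punctuation": n_punct,
        "punctuation_per_word": n_punct / n_words,
        "numbers": n_numbers,
        "numbers_per_word": n_numbers / n_words,
    }

print(simple_features("WOW!!! Did you really say that 3 times???"))
```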
Bag of words
TF-IDF
Word embeddings
BERT
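The four text representations can be produced with widely used open-source libraries; the sketch below assumes scikit-learn, gensim, and Hugging Face Transformers, and the pretrained model name and mean-pooling choices are assumptions rather than the paper's exact configuration.

```python
import numpy as np
import torch
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from gensim.models import Word2Vec
from transformers import AutoTokenizer, AutoModel

comments = ["you are an idiot", "thanks for the thoughtful reply"]

# Bag of words and TF-IDF: sparse document-term matrices.
bow = CountVectorizer().fit_transform(comments)
tfidf = TfidfVectorizer().fit_transform(comments)

# Word2Vec: average the word vectors of each comment (assumed pooling strategy).
tokenized = [c.split() for c in comments]
w2v = Word2Vec(tokenized, vector_size=100, min_count=1, seed=0)
w2v_features = np.array([np.mean([w2v.wv[t] for t in toks], axis=0) for toks in tokenized])

# BERT: mean-pool the last hidden layer of a pretrained encoder (assumed pooling strategy).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
with torch.no_grad():
    batch = tokenizer(comments, padding=True, truncation=True, return_tensors="pt")
    bert_features = encoder(**batch).last_hidden_state.mean(dim=1).numpy()

print(bow.shape, tfidf.shape, w2v_features.shape, bert_features.shape)
```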
Experimental design and evaluation
Experimental design
Evaluation metrics
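A minimal sketch of the evaluation setup using the two metrics reported below (F1 and ROC-AUC); the 75/25 stratified split, the classifier, and the synthetic data are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in for the feature matrix and binary labels (1 = hateful).
X, y = make_classification(n_samples=2000, n_features=50, weights=[0.8], random_state=0)

# Hold out 25% of the data, stratified by class (assumed split ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = XGBClassifier(n_estimators=300, eval_metric="logloss").fit(X_train, y_train)

y_pred = model.predict(X_test)                      # hard labels for F1
y_prob = model.predict_proba(X_test)[:, 1]          # probabilities for ROC-AUC

print("F1:     ", round(f1_score(y_test, y_pred), 3))
print("ROC-AUC:", round(roc_auc_score(y_test, y_prob), 3))
```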
Experimental results
Classifier | Simple features | BOW | TF-IDF | Word2Vec | BERT | All features |
---|---|---|---|---|---|---|
LR | 0.062 | 0.764 | 0.768 | 0.828 | 0.891 | 0.892 |
NB | 0.130 | 0.505 | 0.606 | 0.601 | 0.885 | 0.868 |
SVM | 0.066 | 0.487 | 0.648 | 0.765 | 0.892 | 0.883 |
XGBoost | 0.400 | 0.765 | 0.774 | 0.880 | 0.916 | 0.924** |
FFNN | 0.064 | 0.770 | 0.769 | 0.847 | 0.893 | 0.894 |
KBC | n/a | n/a | n/a | n/a | n/a | 0.388 |
BOC | n/a | n/a | n/a | n/a | n/a | 0.084 |
Classifier | Simple features | BOW | TF-IDF | Word2Vec | BERT | All features |
---|---|---|---|---|---|---|
LR | 0.514 | 0.819 | 0.820 | 0.873 | 0.925 | 0.925 |
NB | 0.524 | 0.738 | 0.809 | 0.761 | 0.938 | 0.934 |
SVM | 0.515 | 0.661 | 0.74 | 0.818 | 0.924 | 0.911 |
XGBoost | 0.782 | 0.932 | 0.937 | 0.986 | 0.994 | 0.995 |
FFNN | 0.743 | 0.934 | 0.937 | 0.974 | 0.988 | 0.988 |
Platform-specific analysis
Metric | YouTube | Reddit | Twitter | Wikipedia |
---|---|---|---|---|
F1 (XGBoost, all features) | 0.911 | 0.776 | 0.980 | 0.861
F1 (XGBoost, BERT) | 0.907 | 0.778 | 0.975 | 0.846
F1 in original paper | 0.960 [5] | 0.749 [73] | 0.900 [16] | –
ROC-AUC (XGBoost, all features) | 0.968 | 0.967 | 0.994 | 0.993
ROC-AUC (XGBoost, BERT) | 0.964 | 0.967 | 0.991 | 0.991
ROC-AUC in original paper | – | 0.957 [73] | – | 0.972 [7]
Linguistic variable analysis
- hateful_ground (comments whose ground-truth label is hateful).
- hateful_predicted (comments whose predicted label is hateful).
- non-hateful_ground (comments whose ground-truth label is non-hateful).
- non-hateful_predicted (comments whose predicted label is non-hateful).
LIWC category | Rel. diff. (lower scores) (%) | LIWC category | Rel. diff. (higher scores) (%) |
---|---|---|---|
Parenth | − 13.4 | Friend | + 6.9
Quote | − 8.6 | Body | + 5.3
Dash | − 7.8 | Swear | + 5.2
QMark | − 5.6 | Sexual | + 5.1
WC | − 4.9 | Bio | + 4.7
Risk | − 4.7 | Informal | + 4.6
Anx | − 4.5 | Anger | + 4.4
Work | − 4.4 | Semic | + 4.0
Tone | − 4.0 | Netspeak | + 3.6
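A sketch of how such group-level differences could be derived. It assumes each comment has per-category scores (e.g., LIWC output) alongside its ground-truth and predicted labels, and that the relative difference is the percentage change of a category's mean score in the predicted-hateful group relative to the ground-truth-hateful group; this definition, the column names, and the toy values are assumptions, not taken from the paper.

```python
import pandas as pd

# Toy frame: one row per comment with LIWC-style category scores plus labels
# (column names and values are hypothetical).
df = pd.DataFrame({
    "Swear":  [5.1, 0.0, 7.7, 0.3],
    "QMark":  [0.0, 2.1, 0.4, 1.9],
    "ground": ["hateful", "non-hateful", "hateful", "non-hateful"],
    "pred":   ["hateful", "non-hateful", "non-hateful", "non-hateful"],
})
categories = ["Swear", "QMark"]

# Assumed definition: percentage change of the mean category score in the
# predicted-hateful group relative to the ground-truth-hateful group.
mean_ground = df.loc[df["ground"] == "hateful", categories].mean()
mean_pred = df.loc[df["pred"] == "hateful", categories].mean()
rel_diff = 100 * (mean_pred - mean_ground) / mean_ground

print(rel_diff.round(1))  # negative = lower score in the predicted group, positive = higher
```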
Feature importance analysis
Twitter | | Reddit | | YouTube | | Wikipedia | | All platforms | |
---|---|---|---|---|---|---|---|---|---|
Feature | Coeff. | Feature | Coeff. | Feature | Coeff. | Feature | Coeff. | Feature | Coeff.
tfidf_bitch | 10.713 | tfidf_fuck | 8.937 | tfidf_fuck | 3.609 | tfidf_fuck | 14.174 | tfidf_fuck | 13.875 |
tfidf_bitches | 8.800 | tfidf_shit | 7.932 | tfidf_hate | 3.341 | tfidf_fucking | 11.861 | tfidf_bitch | 12.983 |
tfidf_pussy | 7.175 | tfidf_fucking | 7.633 | tfidf_stupid | 3.266 | tfidf_shit | 9.639 | tfidf_fucking | 11.188 |
tfidf_hoes | 6.719 | tfidf_dick | 6.243 | tfidf_fucking | 3.257 | tfidf_ass | 8.467 | tfidf_bitches | 10.699 |
tfidf_hoe | 5.185 | tfidf_ass | 4.733 | tfidf_kill | 2.684 | tfidf_stupid | 8.356 | tfidf_shit | 9.180 |
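One common way to obtain per-term weights like those above is to inspect the coefficients of a linear model fitted on TF-IDF features; the sketch below uses logistic regression on a toy corpus and may differ from the exact attribution method used in the study.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy corpus; in practice the comments and binary labels of a given platform would be used.
comments = ["you stupid idiot", "have a great day", "shut up you fool", "thanks for sharing"]
labels = [1, 0, 1, 0]   # 1 = hateful, 0 = not hateful

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(comments)
model = LogisticRegression().fit(X, labels)

# Rank terms by their coefficient toward the hateful class (tfidf_<term> in the table above).
weights = pd.Series(model.coef_[0], index=vectorizer.get_feature_names_out())
print(weights.sort_values(ascending=False).head(5))
```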