1 Introduction
- The proportion of hateful samples is minute compared to that of non-hateful samples, hence hateful data are scarce.
- The data annotation process is very expensive and time-consuming, and it yields very little output for the already minute hateful class.
- The sensitive nature of hate speech samples can adversely affect the mental health of the annotators exposed to them.
- The subjective nature of hate speech makes the resulting labels questionable.
- The trained models are not sufficiently exposed to the hateful class.
- Overfitting.
- Low generalization ability of the trained models.
- The trained models are not sufficiently exposed to the varying aspects/kinds of hate speech and are hence biased towards only certain aspects.
- First, we intend to design experiments that replace the Word2Vec embedding with an alternative that has the potential to provide better substitute words.
- Second, we propose new methods for choosing which words to substitute in a sentence.
- Third, we address the question of whether the pre-trained embeddings should be homogeneous or heterogeneous.
- Finally, we try to confirm whether the labels are preserved by the augmentation method of choice.
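
To make the first two objectives concrete, the following is a minimal sketch of embedding-based synonym replacement, assuming the substitute is taken as the nearest neighbour by cosine similarity. The toy vectors, the helper names `nearest_neighbor` and `augment`, and the hand-picked `positions` are all illustrative assumptions, not the settings used in this work; a real pipeline would load a pre-trained embedding and use one of the word selection methods studied later.

```python
import numpy as np

# Toy 3-d vectors standing in for a pre-trained embedding (hypothetical values).
TOY_EMBEDDING = {
    "stupid": np.array([0.9, 0.1, 0.0]),
    "idiotic": np.array([0.8, 0.2, 0.1]),
    "silly": np.array([0.7, 0.3, 0.0]),
    "album": np.array([0.0, 0.9, 0.3]),
    "record": np.array([0.1, 0.8, 0.4]),
    "buy": np.array([0.1, 0.1, 0.9]),
}


def nearest_neighbor(word, embedding):
    """Return the most cosine-similar other word, or None if out of vocabulary."""
    if word not in embedding:
        return None
    v = embedding[word]
    best_word, best_sim = None, -1.0
    for other, u in embedding.items():
        if other == word:
            continue
        sim = float(v @ u / (np.linalg.norm(v) * np.linalg.norm(u)))
        if sim > best_sim:
            best_word, best_sim = other, sim
    return best_word


def augment(tokens, positions, embedding):
    """Replace the tokens at `positions` with their nearest neighbour, if any."""
    out = list(tokens)
    for i in positions:
        substitute = nearest_neighbor(out[i], embedding)
        if substitute is not None:  # out-of-vocabulary words are left untouched
            out[i] = substitute
    return out


tokens = "that album is stupid".split()
augmented = augment(tokens, positions=[1, 3], embedding=TOY_EMBEDDING)
print(" ".join(augmented))
```

In this sketch, swapping `TOY_EMBEDDING` for a different embedding changes the source of synonyms (the first objective), while changing how `positions` is chosen changes which words are substituted (the second objective).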
2 Related studies
3 Proposed methods
3.1 Objective 1: source of synonyms
3.2 Objective 2: substituted word selection
3.3 Objective 3: homogeneous and heterogeneous embedding
| Homogeneous replacement | Homogeneous classification | Heterogeneous replacement | Heterogeneous classification |
|---|---|---|---|
| W2V | W2V | W2V | DE-W2V |
| | | W2V | CF-W2V |
| CF-W2V | CF-W2V | CF-W2V | W2V |
| DE-W2V | DE-W2V | DE-W2V | W2V |
3.4 Objective 4: label preservation
4 Experiment design
4.1 Datasets
4.2 Baselines and experiment settings
5 Results
Baseline Settings | Davidson | Founta |
---|---|---|
Original Dataset | \(0.646^{0.007}\) | \(0.570^{0.002}\) |
Oversampling | \(\mathbf{0.733}^{0.003}\) | \(0.638^{0.003}\) |
WordNet SR | \(\mathbf{0.733}^{0.002}\) | \(\mathbf{0.646}^{0.005}\) |
W2V AAN SR | \(0.703^{0.007}\) | \(0.635^{0.005}\) |
W2V AESW SR | \(0.707^{0.004}\) | \(0.635^{0.004}\) |
MWP_BERT | \(0.724^{0.001}\) | \(0.602^{0.002}\) |
5.1 Baseline results
Source of Synonyms Settings | Davidson | Founta |
---|---|---|
Original Dataset | \(0.646^{0.007}\) | \(0.570^{0.002}\) |
W2V AAN SR | \(0.703^{0.007}\) | \(\mathbf{0.635}^{0.005}\) |
W2V AESW SR | \(0.707^{0.004}\) | \(\mathbf{0.635}^{0.004}\) |
CF-W2V AAN SR | \(\mathbf{0.728}^{0.002}\) | \(0.633^{0.002}\) |
CF-W2V AESW SR | \(0.712^{0.002}\) | \(0.623^{0.004}\) |
DE-W2V AAN SR | \(0.609^{0.004}\) | \(0.633^{0.005}\) |
DE-W2V AESW SR | \(0.711^{0.003}\) | \(0.626^{0.003}\) |
5.2 Results on source of synonyms
| Words | W2V | CF-W2V | DE-W2V |
|---|---|---|---|
| horror | Horror | gruesome | suspense |
| | FEARnet_branded_VOD | scary | sexploitation |
| | horror_flick | terrifying | fiction |
| | horror_flicks | horrific | science-fiction |
| redneck | hillbilly | hick | beatnik |
| | rednecks | kickin | yuppie |
| | hick | fuckin | skinhead |
| | hayseed | smelly | hipster |
| idiotic | stupid | foolish | preposterous |
| | moronic | silly | mean-spirited |
| | asinine | nonsensical | deceitful |
| | inane | unwise | trollish |
| fuck | fucking | fucking | bugger |
| | f_*_ck | fucked | fuck |
| | f_**_k | fuckin | shit |
| | shit | bitches | shit |
Substituted Word Selection Settings | Davidson | Founta |
---|---|---|
AAN | \(0.703^{0.007}\) | \(0.635^{0.005}\) |
AESW | \(0.707^{0.004}\) | \(0.635^{0.004}\) |
PSO | \(\mathbf{0.723}^{0.001}\) | \(0.639^{0.005}\) |
IG Replace Unimportant | \(0.702^{0.004}\) | \(0.638^{0.004}\) |
IG Replace Important | \(0.705^{0.004}\) | \(\mathbf{0.641}^{0.006}\) |
IG Drop Unimportant | \(0.696^{0.006}\) | \(0.637^{0.004}\) |
5.3 Results on substituted word selection
5.4 Results on homogeneity vs. heterogeneity
- W2V synonym replacement with W2V embedding.
- CF-W2V synonym replacement with CF-W2V embedding.
- DE-W2V synonym replacement with DE-W2V embedding.
- W2V synonym replacement with DE-W2V embedding.
- W2V synonym replacement with CF-W2V embedding.
- CF-W2V synonym replacement with W2V embedding.
- DE-W2V synonym replacement with W2V embedding.
| Homogeneous replacement | Homogeneous classification | Homogeneous F1 | Heterogeneous replacement | Heterogeneous classification | Heterogeneous F1 |
|---|---|---|---|---|---|
| W2V | W2V | \(\mathbf{0.707}^{0.004}\) | W2V | DE-W2V | \(0.701^{0.002}\) |
| | | | W2V | CF-W2V | \(0.700^{0.003}\) |
| CF-W2V | CF-W2V | \(\mathbf{0.715}^{0.002}\) | CF-W2V | W2V | \(0.712^{0.002}\) |
| DE-W2V | DE-W2V | \(\mathbf{0.716}^{0.002}\) | DE-W2V | W2V | \(0.711^{0.003}\) |
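
The table above places each heterogeneous setting next to the homogeneous setting that uses the same replacement embedding. As a worked reading of those numbers, the sketch below subtracts each heterogeneous F1 from its homogeneous counterpart. The F1 values are copied from the table with the superscripts omitted; the function name `homogeneity_gaps` is illustrative. The result is plain arithmetic on the reported scores and should not be read as a significance test.

```python
# F1 scores copied from the table above: (replacement, classification) -> F1.
F1 = {
    ("W2V", "W2V"): 0.707,
    ("W2V", "DE-W2V"): 0.701,
    ("W2V", "CF-W2V"): 0.700,
    ("CF-W2V", "CF-W2V"): 0.715,
    ("CF-W2V", "W2V"): 0.712,
    ("DE-W2V", "DE-W2V"): 0.716,
    ("DE-W2V", "W2V"): 0.711,
}


def homogeneity_gaps(scores):
    """Homogeneous F1 minus heterogeneous F1, paired by replacement embedding."""
    gaps = {}
    for (repl, clas), f1 in scores.items():
        if repl == clas:
            continue  # homogeneous settings serve as the reference
        gaps[(repl, clas)] = round(scores[(repl, repl)] - f1, 3)
    return gaps


GAPS = homogeneity_gaps(F1)
for (repl, clas), gap in GAPS.items():
    print(f"{repl} replacement, {clas} classification: gap = {gap:.3f}")
```

Under this pairing, every gap is positive and at most 0.007, which matches the bolding of the homogeneous column in the table.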
5.5 Results on label preservation
6 Further analysis
Settings | Sentence |
---|---|
Original | that band is white trash and only white trash would buy that album |
WordNet | that dance band is white trash and only white trash would buy that album |
W2V_AAN | that bands is black garbage and one black garbage would buy that album |
W2V_AESW | that bands is black garbage and only black garbage could sell that album |
MWP_BERT | that band is white trash and only white trash can buy that album |
CF-W2V_AAN | that band is brown rubbish and solely brown rubbish would buy that album |
CF-W2V_AESW | that banding is brown rubbish and only brown rubbish could hold that album |
DE-W2V_AAN | that boyband is black garbage and measly black garbgae would buy that mini-album |
DE-W2V_AESW | that boyband is black garbage and only black garbage whould sell that album. |
PSO | that band is white trash and only white trash would buy that album |
IG_Unimp | that band is black trash and only white trash would buy that album |
IG_Imp | it bands is white trash and only white trash would buy that album |
IG_DropUnimp | that band trash album |
Original | we agree do fuck yes i do! send those illegal wetback home |
WordNet | we agree do fuck yes i do send those illegal wetback national |
W2V_AAN | we agree do fuckiveng yes ive do! send those ivellegal wetback house |
W2V_AESW | we agrees do f_**_k mso_style_noshow i do! sends those illicit spics homes |
MWP_BERT | we agree do fuck yes i do! send those illegal aliens home |
CF-W2V_AAN | we agree do fucked yes i do! send those illegal wetback homing |
CF-W2V_AESW | we unhappy do fucking yeah i do! sent those illicit pinche habitation |
DE-W2V_AAN | we agree do fuck yes i do! send those illegal wetback residence |
DE-W2V_AESW | we concur do bugger yeah i do! sent those unlawful wetback homes |
PSO | we agree do fuck yes i do send those illegal wetback home |
IG_Unimp | we agree do fuck yes i do send those illicit wetback homes |
IG_Imp | we disagree want fucking mso_style_qformat i do send those illegal wetback home |
IG_DropUnimp | we agree do fuck yes i do send wetback |
Original | they shot another monkey |
WordNet | they shot some other monkey |
W2V_AAN | they shot another monkeys |
W2V_AESW | they shots another monkeys |
MWP_BERT | they killed another monkey |
CF-W2V_AAN | they shot another curious |
CF-W2V_AESW | they pulls deliriously curious |
DE-W2V_AAN | they shot another hamster |
DE-W2V_AESW | they stabbed a hamster |
PSO | they shot another monkey |
IG_Unimp | they shot another monkey |
IG_Imp | we shots another monkeys |
IG_DropUnimp | they shot monkey |