Introduction

- We fine-tune BERT with linear layers and build two accurate systems for the span identification and technique classification of propaganda in news articles.
- We convert the binary sequence-tagging task of span identification (SI) into a three-way classification task by adding an 'invalid' token type, and compare the binary tagging method with the three-token-type method.
- We propose the SLFC approach for the SI system. To the best of our knowledge, this is the first work to integrate sentence-level classification features into each word representation.
- We determine the optimal network parameters for our systems through experiments and comparative analysis.
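The core SLFC idea above, fusing a sentence-level feature into every word representation before token tagging, can be sketched as follows. This is a minimal illustration with NumPy: the random `hidden` array stands in for BERT's last hidden states, and the exact fusion and head details are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in for BERT's last_hidden_state: (batch, seq_len, hidden_size)
hidden = rng.standard_normal((2, 16, 768))
cls = hidden[:, 0, :]  # sentence-level feature, e.g. the [CLS] vector

# broadcast the sentence feature to every token position and concatenate
cls_rep = np.broadcast_to(cls[:, None, :], hidden.shape)
token_features = np.concatenate([hidden, cls_rep], axis=-1)  # (2, 16, 1536)

# a linear tag head over the fused features: three token types
# (propaganda / non-propaganda / invalid)
W = rng.standard_normal((1536, 3)) * 0.02
token_logits = token_features @ W  # (2, 16, 3)
print(token_logits.shape)
```

Each token's classifier thus sees both its own contextual vector and a summary of the whole sentence.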
Related work
Propaganda detection
BERT-based model
Method
Data process
Approach of span identification (SI)
Approach of technique classification (TC)
Technique | Instances | Proportion (%) |
---|---|---|
Appeal_to_Authority | 144 | 2.35 |
Appeal_to_fear-prejudice | 294 | 4.80 |
Bandwagon, Reductio_ad_hitlerum | 72 | 1.17 |
Black-and-White_Fallacy | 107 | 1.75 |
Causal_Oversimplification | 209 | 3.41 |
Doubt | 493 | 8.04 |
Exaggeration, Minimisation | 466 | 7.60
Flag-Waving | 229 | 3.74
Loaded_Language | 2123 | 34.64 |
Name_Calling, Labeling | 1058 | 17.26 |
Repetition | 621 | 10.13 |
Slogans | 129 | 2.10 |
Thought-terminating_Cliches | 76 | 1.24 |
Whataboutism, Straw_Men, Red_Herring | 108 | 1.76 |
Total | 6129 | 100.00
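The proportions in the table can be reproduced directly from the instance counts, for example:

```python
# instance counts copied from the technique-distribution table
counts = {
    "Appeal_to_Authority": 144,
    "Appeal_to_fear-prejudice": 294,
    "Bandwagon, Reductio_ad_hitlerum": 72,
    "Black-and-White_Fallacy": 107,
    "Causal_Oversimplification": 209,
    "Doubt": 493,
    "Exaggeration, Minimisation": 466,
    "Flag-Waving": 229,
    "Loaded_Language": 2123,
    "Name_Calling, Labeling": 1058,
    "Repetition": 621,
    "Slogans": 129,
    "Thought-terminating_Cliches": 76,
    "Whataboutism, Straw_Men, Red_Herring": 108,
}
total = sum(counts.values())
loaded = round(100 * counts["Loaded_Language"] / total, 2)
print(total, loaded)  # 6129 34.64
```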
Experiment and results
Experiment details
Results: span identification (SI)
Model | F1 | Precision | Recall |
---|---|---|---|
Baseline | 0.007862 | 0.099663 | 0.004092 |
BERT + binary classifier | 0.370578 | 0.385497 | 0.356771 |
BERT + three classifier | 0.408815 | 0.401099 | 0.416834 |
Our SI system | 0.441732 | 0.432075 | 0.451831 |
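The reported F1 values are consistent with the harmonic mean of the reported precision and recall, which can be checked directly:

```python
def f1(precision, recall):
    # F1 is the harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) from the SI results table
rows = {
    "Baseline":                 (0.099663, 0.004092, 0.007862),
    "BERT + binary classifier": (0.385497, 0.356771, 0.370578),
    "BERT + three classifier":  (0.401099, 0.416834, 0.408815),
    "Our SI system":            (0.432075, 0.451831, 0.441732),
}
for name, (p, r, reported) in rows.items():
    assert abs(f1(p, r) - reported) < 1e-4, name
print("all rows consistent")
```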
Results: technique classification (TC)

Model | F1 |
---|---|
Baseline | 0.262326
BERT without EDA | 0.539535
Our TC system | 0.575729
Technique | F1 with EDA | F1 without EDA |
---|---|---|
Appeal_to_Authority | 0.000000 | 0.000000
Appeal_to_fear-prejudice | 0.387097 | 0.314607
Bandwagon, Reductio_ad_hitlerum | 0.000000 | 0.000000
Black-and-White_Fallacy | 0.000000 | 0.000000
Causal_Oversimplification | 0.266667 | 0.120000
Doubt | 0.464789 | 0.467836
Exaggeration, Minimisation | 0.396552 | 0.308943
Flag-Waving | 0.632258 | 0.636943
Loaded_Language | 0.741477 | 0.700521
Name_Calling, Labeling | 0.650407 | 0.612903
Repetition | 0.483660 | 0.412955
Slogans | 0.588235 | 0.524590
Thought-terminating_Cliches | 0.363636 | 0.210526
Whataboutism, Straw_Men, Red_Herring | 0.095238 | 0.111111
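EDA (Easy Data Augmentation) in the results above refers to simple text perturbations. As a minimal, purely illustrative sketch, here are two of EDA's four operations, random swap and random deletion (synonym replacement and random insertion additionally require a thesaurus); the parameters and helper names are assumptions, not the authors' exact setup:

```python
import random

def eda_swap(tokens, n=1, seed=0):
    """Random swap: exchange two randomly chosen positions n times."""
    rnd = random.Random(seed)
    tokens = tokens[:]
    for _ in range(n):
        i, j = rnd.randrange(len(tokens)), rnd.randrange(len(tokens))
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def eda_delete(tokens, p=0.1, seed=0):
    """Random deletion: drop each token with probability p, keep >= 1 token."""
    rnd = random.Random(seed)
    kept = [t for t in tokens if rnd.random() > p]
    return kept or [rnd.choice(tokens)]

sent = "the quick brown fox jumps over the lazy dog".split()
print(eda_swap(sent))
print(eda_delete(sent))
```

Augmented copies of minority-class sentences generated this way can enlarge the training set for the rarer techniques.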
Parameter analysis
Model | OS | EDA | SLFC | epoch | lr | sent-len | F1 |
---|---|---|---|---|---|---|---|
Baseline (SI) | | | | | | | 0.007862
BERT + binary classifier | \(\surd \) | \(\surd \) | \(\surd \) | 10 | \(3\times 10^{-5}\) | 256 | 0.370578
BERT + three classifier | | | | 10 | \(3\times 10^{-5}\) | 256 | 0.408815
BERT + three classifier | \(\surd \) | \(\surd \) | \(\surd \) | 10 | \(3\times 10^{-5}\) | 256 | 0.427860
BERT + three classifier | \(\surd \) | \(\surd \) | \(\surd \) | 8 | \(3\times 10^{-5}\) | 256 | 0.429325
Our SI system | \(\surd \) | \(\surd \) | \(\surd \) | 8 | \(3\times 10^{-5}\) | 200 | 0.441732
Baseline (TC) | | | | | | | 0.262326
BERT | | | | 20 | \(3\times 10^{-3}\) | 256 | 0.425729
BERT | | \(\surd \) | | 20 | \(3\times 10^{-5}\) | 256 | 0.540473
BERT | | \(\surd \) | | 20 | \(3\times 10^{-5}\) | 210 | 0.560931
Our TC system | | \(\surd \) | | 15 | \(3\times 10^{-5}\) | 210 | 0.575729
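The 'OS' column above plausibly denotes oversampling of minority classes; the following is a hypothetical sketch of one common scheme (duplicating each class's examples up to the majority-class count), since the exact procedure is not specified here:

```python
import random
from collections import Counter

def oversample(examples, seed=0):
    """Naively oversample every class up to the majority-class count.
    examples: list of (input, label) pairs."""
    rnd = random.Random(seed)
    by_label = {}
    for x, y in examples:
        by_label.setdefault(y, []).append((x, y))
    target = max(len(items) for items in by_label.values())
    out = []
    for items in by_label.values():
        out.extend(items)
        # sample with replacement to reach the target count
        out.extend(rnd.choices(items, k=target - len(items)))
    return out

# toy imbalanced data: 5 Loaded_Language vs 2 Slogans
data = [("a", "Loaded_Language")] * 5 + [("b", "Slogans")] * 2
balanced = oversample(data)
print(Counter(y for _, y in balanced))  # both classes -> 5
```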