A BERT-Based Transfer Learning Approach for Hate Speech Detection in Online Social Media

Mozafari, Marzieh; Farahbakhsh, Reza; Crespi, Noël

doi:10.1007/978-3-030-36687-2_77

Part of the book series: Studies in Computational Intelligence (SCI, volume 881)

Included in the following conference series:

International Conference on Complex Networks and Their Applications

Abstract

Hateful and toxic content generated by a portion of social media users is a growing phenomenon that has motivated researchers to devote substantial effort to the challenging task of identifying hateful content. We need not only an efficient automatic hate speech detection model based on advanced machine learning and natural language processing, but also a sufficiently large amount of annotated data to train it. The lack of sufficient labelled hate speech data, along with existing biases, has been the main issue in this domain of research. To address these needs, in this study we introduce a novel transfer learning approach based on an existing pre-trained language model called BERT (Bidirectional Encoder Representations from Transformers). More specifically, we investigate the ability of BERT to capture hateful context within social media content by using new fine-tuning methods based on transfer learning. To evaluate our proposed approach, we use two publicly available datasets of Twitter content that have been annotated for racism, sexism, hate, or offensive content. The results show that our solution achieves considerable performance on these datasets in terms of precision and recall in comparison to existing approaches. Consequently, our model can capture some biases in the data annotation and collection process and can potentially lead us to a more accurate model.
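
The abstract reports results in terms of precision and recall. As a rough illustration of how these two metrics are computed per class for a multi-class labelling task, here is a minimal sketch in plain Python. The function name `per_class_precision_recall`, the label names, and the toy label lists are all hypothetical and chosen only for illustration; they are not the authors' code, data, or results.

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for every label seen in either list."""
    tp = Counter()  # prediction matches the true label
    fp = Counter()  # label was predicted, but the true label differs
    fn = Counter()  # label was the true one, but something else was predicted
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted_total = tp[label] + fp[label]
        actual_total = tp[label] + fn[label]
        # Report 0.0 when a label was never predicted or never occurred.
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / actual_total if actual_total else 0.0
        scores[label] = (precision, recall)
    return scores


# Made-up toy labels (assumption: not taken from the paper's datasets).
y_true = ["racism", "sexism", "neither", "neither", "sexism", "racism"]
y_pred = ["racism", "neither", "neither", "sexism", "sexism", "racism"]

scores = per_class_precision_recall(y_true, y_pred)
for label, (precision, recall) in scores.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
```

Per-class scores matter for this kind of task because hate speech datasets are typically imbalanced, so a single accuracy figure can hide poor performance on the minority classes.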

Author information

Authors and Affiliations

CNRS UMR5157, Télécom SudParis, Institut Polytechnique de Paris, Évry, France
Marzieh Mozafari, Reza Farahbakhsh & Noël Crespi

Corresponding author

Correspondence to Marzieh Mozafari.

Editor information

Editors and Affiliations

University of Burgundy, Dijon Cedex, France
Hocine Cherifi
Università degli Studi di Milano, Milan, Italy
Sabrina Gaito
University of Aveiro, Aveiro, Portugal
José Fernando Mendes
Universidad Carlos III de Madrid, Leganés, Madrid, Spain
Esteban Moro
Indiana University, Bloomington, IN, USA
Luis Mateus Rocha

About this paper

Cite this paper

Mozafari, M., Farahbakhsh, R., Crespi, N. (2020). A BERT-Based Transfer Learning Approach for Hate Speech Detection in Online Social Media. In: Cherifi, H., Gaito, S., Mendes, J., Moro, E., Rocha, L. (eds) Complex Networks and Their Applications VIII. COMPLEX NETWORKS 2019. Studies in Computational Intelligence, vol 881. Springer, Cham. https://doi.org/10.1007/978-3-030-36687-2_77

DOI: https://doi.org/10.1007/978-3-030-36687-2_77
Published: 26 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36686-5
Online ISBN: 978-3-030-36687-2
eBook Packages: Intelligent Technologies and Robotics (R0)
