01-12-2023 | Original Article

Unsupervised fine-grained hate speech target community detection and characterisation on social media

Authors: Anaïs Ollagnier, Elena Cabrio, Serena Villata

Published in: Social Network Analysis and Mining | Issue 1/2023

Abstract

Recent studies have highlighted the importance of fine-grained characterisation of online hate speech for understanding how hate is conveyed, especially on social media. A key element of this task is identifying and characterising the community targeted by hate speech, e.g. national, ethnic, and religious minorities. In this paper, we propose a full pipeline relying on unsupervised methods to distinguish specific hate speech manifestations, i.e. the targeted victim or group of victims and the protected characteristics (target-types) that are discriminated against. Our contribution is threefold: (1) we leverage multiple data views to contrast different abusive behaviours; (2) we explore the use of clustering techniques to perform fine-grained hate speech target community detection; and (3) we carry out an in-depth content analysis of the generated hate speech target communities. The pipeline relies on multiple data views derived from multilingual pre-trained language models (i.e. multilingual BERT and multilingual Universal Sentence Encoder) and on the Multi-view Spectral Clustering (MvSC) algorithm. The 69 experiments performed on the Multilingual Hate Speech dataset (MLMA) of tweets show that most configurations of the proposed pipeline significantly outperform state-of-the-art clustering algorithms on French and English. Our experiments confirm the ability of the proposed approach to capture complex hate speech phenomena (i.e. intersections between victim-groups, target-types or both).
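To make the idea of clustering over several data views concrete, here is a minimal sketch. It is not the MvSC algorithm evaluated in the paper: as a simplified stand-in, it averages one cosine-based affinity matrix per view and bipartitions the resulting graph on the sign of the Fiedler vector of the normalised Laplacian. The two views are small synthetic arrays standing in for sentence embeddings (the paper uses multilingual BERT and multilingual Universal Sentence Encoder), and the function names are hypothetical.

```python
import numpy as np

def cosine_affinity(X):
    """Map pairwise cosine similarity from [-1, 1] to an affinity in [0, 1]."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return (1.0 + Xn @ Xn.T) / 2.0

def two_way_multiview_spectral(views):
    """Average the per-view affinities, then bipartition on the Fiedler vector."""
    W = sum(cosine_affinity(V) for V in views) / len(views)
    np.fill_diagonal(W, 0.0)                  # drop self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalised Laplacian: I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)            # eigenvalues come back in ascending order
    fiedler = eigvecs[:, 1] * d_inv_sqrt      # second-smallest eigenvector, rescaled
    return (fiedler > 0).astype(int)

# Synthetic stand-ins for two embedding views of the same 10 posts (assumption:
# real views would be produced by two different pre-trained encoders).
rng = np.random.default_rng(0)
groups = np.repeat([0, 1], 5)                 # two hidden target communities
view_a = np.eye(2)[groups] + 0.05 * rng.normal(size=(10, 2))
view_b = np.eye(3)[groups] + 0.05 * rng.normal(size=(10, 3))

pred = two_way_multiview_spectral([view_a, view_b])
print(pred)  # cluster ids are arbitrary; only the grouping is meaningful
```

Averaging affinities is the simplest way to fuse views; dedicated multi-view spectral methods instead let the views inform each other during the spectral embedding step, and more than two clusters would require clustering several eigenvectors rather than thresholding one.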

Metadata
Title: Unsupervised fine-grained hate speech target community detection and characterisation on social media
Authors: Anaïs Ollagnier, Elena Cabrio, Serena Villata
Publication date: 01-12-2023
Publisher: Springer Vienna
Published in: Social Network Analysis and Mining | Issue 1/2023
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI: https://doi.org/10.1007/s13278-023-01061-4
