Published in: Journal of Intelligent Information Systems 3/2023

29 April 2023 | Research

Offensive language identification with multi-task learning

Authors: Marcos Zampieri, Tharindu Ranasinghe, Diptanu Sarkar, Alex Ororbia

Abstract

The widespread presence of offensive content is a major issue in social media. This has motivated the development of computational models to identify such content in posts or conversations. Most of these models, however, treat offensive language identification as an isolated task. Very recently, a few datasets have been annotated with post-level offensiveness and related phenomena, such as offensive tokens, humor, and engaging content. These annotations create the opportunity to model related tasks jointly, which will help improve the explainability of offensive language detection systems and potentially aid human moderators. This study proposes a novel multi-task learning (MTL) architecture that can predict: (1) offensiveness at both post and token levels in English; and (2) offensiveness and related subjective tasks such as humor, engaging content, and gender bias identification in multilingual settings. Our results show that the proposed multi-task learning architecture outperforms current state-of-the-art methods trained to identify offense at the post level. We further demonstrate that MTL outperforms single-task learning (STL) across different tasks and language combinations.
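
The abstract does not spell out how the post-level and token-level predictions are trained together. As a rough illustration only, a common way to set up joint training is to add a post-level classification loss to an averaged token-level loss. The sketch below assumes binary labels at both levels and uses hypothetical names (`multitask_loss`, `token_weight`) that do not come from the paper; it is not the authors' implementation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Negative log-probability of the gold class."""
    return -math.log(softmax(logits)[label])


def multitask_loss(
    post_logits: Sequence[float],
    post_label: int,
    token_logits: Sequence[Sequence[float]],
    token_labels: Sequence[int],
    token_weight: float = 1.0,  # hypothetical weighting, not from the paper
) -> float:
    """Post-level loss plus weighted mean token-level loss (illustrative)."""
    post_loss = cross_entropy(post_logits, post_label)
    token_losses = [
        cross_entropy(logits, label)
        for logits, label in zip(token_logits, token_labels)
    ]
    # A post with no annotated tokens contributes only the post-level term.
    token_loss = sum(token_losses) / len(token_losses) if token_losses else 0.0
    return post_loss + token_weight * token_loss


# Example: one post with two tokens; label 1 = offensive, 0 = not offensive.
loss = multitask_loss(
    post_logits=[0.0, 0.0],
    post_label=1,
    token_logits=[[0.0, 0.0], [0.0, 0.0]],
    token_labels=[0, 1],
)
```

In a full system both sets of logits would typically come from two output heads on top of one shared encoder, so that gradients from both tasks update the same representation; that design is an assumption here, not a detail taken from the abstract.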

Metadata
Title
Offensive language identification with multi-task learning
Authors
Marcos Zampieri
Tharindu Ranasinghe
Diptanu Sarkar
Alex Ororbia
Publication date
29 April 2023
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 3/2023
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-023-00787-z
