Published in: Journal of Intelligent Information Systems, Issue 2/2023

22 February 2023

Leveraging posts’ and authors’ metadata to spot several forms of abusive comments in Twitter

Authors: Marco Casavantes, Mario Ezra Aragón, Luis C. González, Manuel Montes-y-Gómez

Abstract

Social media is frequently plagued by undesirable phenomena such as cyberbullying and abusive content in the form of hateful and racist posts. It is therefore crucial to study and propose better mechanisms to automatically identify communication that promotes hate speech, hostility, and aggressiveness. Traditional approaches have focused only on exploiting the content and writing style of social media posts while ignoring information related to their context. Several recent works have reported interesting findings in this direction, but they lack an exhaustive analysis of contextual information, as well as an evaluation of whether the same premise holds for detecting different types of abusive comments, e.g., offensive, hostile, and hateful. To address this, we have extended seven Twitter benchmark datasets related to the detection of offensive, aggressive, hostile, and hateful communication. We evaluate our hypothesis using three different learning models, considering classical (Bag of Words), advanced (GloVe), and state-of-the-art (BERT) text representations. Experiments show statistically significant differences between the classification scores of all methods that use a combination of text and metadata and those of the classical approach that uses only the text content of the messages, suggesting the importance of paying attention to context to spot the different kinds of abusive comments on social networks.
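As an illustration of what combining text and metadata can look like for the classical (Bag of Words) representation, the following is a minimal sketch and not the authors' pipeline. It assumes that combination means concatenating a term-count vector with a few numeric metadata fields. The field names (`followers`, `retweets`), the toy posts, and the log scaling are hypothetical choices made only for this example.

```python
import math
from collections import Counter


def build_vocabulary(texts):
    """Map each distinct lowercase token to a column index (sorted for determinism)."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def bow_vector(text, vocab):
    """Term-count vector for one post; tokens outside the vocabulary are ignored."""
    counts = Counter(text.lower().split())
    # dicts preserve insertion order, so columns follow the sorted vocabulary
    return [float(counts.get(tok, 0)) for tok in vocab]


def metadata_vector(meta, keys):
    """Log-scale count-like metadata so large values do not dominate (an assumption)."""
    return [math.log1p(meta[k]) for k in keys]


def combine(text, meta, vocab, keys):
    """Concatenate text features and metadata features into one vector."""
    return bow_vector(text, vocab) + metadata_vector(meta, keys)


# Hypothetical toy data: the field names are illustrative, not the paper's feature set.
posts = [
    {"text": "you are awful", "meta": {"followers": 10, "retweets": 0}},
    {"text": "have a nice day", "meta": {"followers": 2500, "retweets": 12}},
]
META_KEYS = ["followers", "retweets"]

vocab = build_vocabulary([p["text"] for p in posts])
features = [combine(p["text"], p["meta"], vocab, META_KEYS) for p in posts]
```

Each row of `features` could then be passed to any standard classifier. For dense representations such as GloVe or BERT, the same idea would apply by concatenating the metadata columns to the text embedding instead of to the count vector.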

Footnotes
1
In just one minute, Facebook users upload 147,000 photos, Twitter registers 319 new users, Instagram adds 350,000 new stories, etc. Source: https://www.socialmediatoday.com/news/what-happens-on-the-internet-every-minute-2020-version-infographic/583340/
 
2
It is important to note that, although these data are specific to individual posts, the privacy of their authors is never compromised.
 
3
Those tweets were probably easier for Twitter itself to spot and delete because of the racist keywords used for corpus collection.
 
5
A zero value means both variables are independent.
 
Metadata
Title
Leveraging posts’ and authors’ metadata to spot several forms of abusive comments in Twitter
Authors
Marco Casavantes
Mario Ezra Aragón
Luis C. González
Manuel Montes-y-Gómez
Publication date
22 February 2023
Publisher
Springer US
Published in
Journal of Intelligent Information Systems, Issue 2/2023
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-023-00779-z
