
2018 | Original Paper | Chapter

Fighting Adversarial Attacks on Online Abusive Language Moderation

Authors: Nestor Rodriguez, Sergio Rojas-Galeano

Published in: Applied Computer Sciences in Engineering

Publisher: Springer International Publishing

Abstract

Lack of moderation in online conversations may result in personal aggression, harassment or cyberbullying. This kind of hostility is usually expressed through profanity or abusive language. Building on this assumption, Google has recently developed a machine-learning model to detect hostility within a comment. The model assesses to what extent abusive language is poisoning a conversation, producing a "toxicity" score for the comment. Unfortunately, it has been suggested that such a toxicity model can be deceived by adversarial attacks that manipulate the text sequence of the abusive language. In this paper we aim to address this vulnerability. First, we characterise two types of adversarial attacks, one using obfuscation and the other using polarity transformations. Then, we propose a two-stage approach to disarm such attacks by coupling a text deobfuscation method with the toxicity scoring model. The approach was validated on a dataset of approximately 24,000 distorted comments, showing that it is feasible to restore the toxicity scores of the adversarial variants. We anticipate that combining machine learning and text pattern recognition methods operating on different layers of linguistic features will help foster aggression-safe online conversations despite the adversarial challenges inherent to the versatile nature of written language.
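The two-stage idea (deobfuscate first, then score) can be illustrated with a minimal, self-contained sketch. Everything here is an assumption for illustration. `toy_toxicity` is a hypothetical lexicon-based stand-in for Google's toxicity model, and the deobfuscation rules (a leetspeak map, separator stripping, collapsing repeated letters) are simple heuristics rather than the deobfuscation method used in the paper. `LEXICON`, `LEET`, `UNLEET` and all function names are likewise hypothetical. Only obfuscation attacks are modelled. Polarity transformations are not.

```python
import re
from typing import Callable

# Hypothetical toy lexicon of abusive words (illustration only, not the paper's data).
LEXICON = {"idiot", "stupid", "moron", "dumb"}

# Assumed leetspeak substitutions used by the obfuscation attack...
LEET = {"i": "1", "o": "0", "e": "3", "a": "4", "s": "$", "t": "7"}
# ...and an assumed inverse map for the deobfuscation stage ("@" and "5" are extra guesses).
UNLEET = {"1": "i", "0": "o", "3": "e", "4": "a", "@": "a", "$": "s", "5": "s", "7": "t"}


def toy_toxicity(text: str) -> float:
    """Hypothetical stand-in scorer: fraction of alphabetic tokens found in LEXICON."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    return sum(w in LEXICON for w in words) / len(words)


def leet_attack(word: str) -> str:
    """Obfuscation attack: replace letters with look-alike digits or symbols."""
    return "".join(LEET.get(c, c) for c in word)


def separator_attack(word: str, sep: str = ".") -> str:
    """Obfuscation attack: insert a separator between characters ("idiot" -> "i.d.i.o.t")."""
    return sep.join(word)


def attack(text: str, transform: Callable[[str], str]) -> str:
    """Apply an obfuscation transform to every lexicon word, leaving other words intact."""
    return " ".join(transform(w) if w.lower() in LEXICON else w for w in text.split())


def deobfuscate(text: str) -> str:
    """Stage 1: replace a token with a lexicon word when its normalised form matches one."""
    restored = []
    for token in text.split():
        mapped = "".join(UNLEET.get(c, c) for c in token.lower())
        candidate = re.sub(r"[^a-z]", "", mapped)        # drop separators such as . - *
        collapsed = re.sub(r"(.)\1+", r"\1", candidate)  # "stuuupid" -> "stupid"
        # Note: collapsing would break lexicon words that contain double letters.
        if candidate in LEXICON:
            restored.append(candidate)
        elif collapsed in LEXICON:
            restored.append(collapsed)
        else:
            restored.append(token)  # keep non-matching tokens exactly as written
    return " ".join(restored)


def two_stage_score(text: str) -> float:
    """Stage 2: score the deobfuscated text with the toy toxicity model."""
    return toy_toxicity(deobfuscate(text))


if __name__ == "__main__":
    comment = "you are a stupid idiot"
    for name, fn in [("leet", leet_attack), ("separator", separator_attack)]:
        adv = attack(comment, fn)
        print(f"{name:9} {adv!r}: raw={toy_toxicity(adv):.2f} "
              f"two-stage={two_stage_score(adv):.2f}")
```

With this toy lexicon, both attacks drive the raw score of the example comment to 0.00, while the two-stage score returns to the original 0.40. Restricting replacements to tokens whose normalised form hits the lexicon keeps benign tokens such as "4" or "meet" unchanged. This deliberately simple choice would miss obfuscations that a real deobfuscation method would need to handle, such as spaces inserted between letters.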

Print ISBN: 978-3-030-00349-4
Electronic ISBN: 978-3-030-00350-0
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-030-00350-0_40