ABSTRACT
We address the problem of hate speech detection in online user comments. Hate speech, defined as "abusive speech targeting specific group characteristics, such as ethnicity, religion, or gender", is a serious problem for websites that allow users to leave feedback, and it harms both their online business and the overall user experience. We propose to learn distributed low-dimensional representations of comments using recently proposed neural language models; these representations can then be fed as inputs to a classification algorithm. Our approach addresses the high dimensionality and sparsity that affect the current state of the art, resulting in highly efficient and effective hate speech detectors.
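The two-stage idea in the abstract (learn a dense, low-dimensional vector per comment, then classify those vectors) can be illustrated with a minimal sketch. Everything specific here is an assumption, not a detail from the abstract: the embedding step is a tiny paragraph-vector-style model (distributed bag-of-words with a full softmax, in the spirit of Le and Mikolov), the classifier is logistic regression, and the comments, labels, function names, and hyperparameters are made up for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_comment_embeddings(comments, dim=8, epochs=100, lr=0.1, seed=0):
    """Hypothetical helper: each comment vector is trained to predict
    the words that occur in that comment (assumed PV-DBOW-style objective)."""
    rng = random.Random(seed)
    docs = [c.lower().split() for c in comments]
    vocab = sorted({w for words in docs for w in words})
    word_id = {w: i for i, w in enumerate(vocab)}
    doc_vecs = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in docs]
    out_vecs = [[0.0] * dim for _ in vocab]  # output-layer word weights
    for _ in range(epochs):
        for d, words in enumerate(docs):
            v = doc_vecs[d]
            for w in words:
                scores = [sum(a * b for a, b in zip(v, u)) for u in out_vecs]
                probs = softmax(scores)
                grad_v = [0.0] * dim
                for j, u in enumerate(out_vecs):
                    err = probs[j] - (1.0 if j == word_id[w] else 0.0)
                    for k in range(dim):
                        grad_v[k] += err * u[k]  # uses u before its update
                        u[k] -= lr * err * v[k]
                for k in range(dim):
                    v[k] -= lr * grad_v[k]
    return doc_vecs


def train_logistic_regression(X, y, epochs=200, lr=0.5):
    """Plain stochastic gradient descent on the logistic loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            g = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - t
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def predict_proba(x, w, b):
    """Probability that the comment with embedding x is labeled 1."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


# Made-up toy data: label 1 marks abusive comments about a placeholder group.
comments = [
    "i hate groupx people",
    "groupx people are awful",
    "great article thanks",
    "thanks for the great read",
]
labels = [1, 1, 0, 0]

vectors = train_comment_embeddings(comments, dim=8)
weights, bias = train_logistic_regression(vectors, labels)
scores = [predict_proba(v, weights, bias) for v in vectors]
```

Each comment is represented by 8 dense numbers instead of a sparse vector as wide as the vocabulary, which is the dimensionality and sparsity benefit the abstract refers to. A realistic setup would train on a large corpus with an established library and evaluate on held-out comments.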
Index Terms
- Hate Speech Detection with Comment Embeddings