DOI: 10.1145/2740908.2742760

Hate Speech Detection with Comment Embeddings

Published: 18 May 2015

ABSTRACT

We address the problem of hate speech detection in online user comments. Hate speech, defined as an "abusive speech targeting specific group characteristics, such as ethnicity, religion, or gender", is an important problem plaguing websites that allow users to leave feedback, having a negative impact on their online business and overall user experience. We propose to learn distributed low-dimensional representations of comments using recently proposed neural language models, that can then be fed as inputs to a classification algorithm. Our approach addresses issues of high-dimensionality and sparsity that impact the current state-of-the-art, resulting in highly efficient and effective hate speech detectors.
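
The high-dimensionality and sparsity issue mentioned in the abstract can be illustrated with a minimal sketch. It builds term-frequency bag-of-words vectors for three hypothetical toy comments (invented here for illustration, not taken from the paper's data) and measures how many entries are zero. The vector length equals the vocabulary size, so it grows with the corpus, whereas a learned comment embedding has a fixed, much smaller number of dimensions chosen in advance.

```python
from collections import Counter

# Hypothetical toy comments, for illustration only.
comments = [
    "great earnings report this quarter",
    "the stock will drop after earnings",
    "this quarter looks great for the stock",
]

# Whitespace tokenization is an assumption; real pipelines do more cleaning.
tokenized = [c.split() for c in comments]
vocab = sorted({tok for doc in tokenized for tok in doc})
index = {tok: i for i, tok in enumerate(vocab)}


def bow_tf(tokens):
    """Return a term-frequency vector with one slot per vocabulary word."""
    vec = [0] * len(vocab)
    for tok, count in Counter(tokens).items():
        vec[index[tok]] = count
    return vec


bow = [bow_tf(doc) for doc in tokenized]

dim = len(vocab)  # one dimension per distinct word
zeros = sum(v == 0 for vec in bow for v in vec)
sparsity = zeros / (dim * len(bow))  # fraction of entries that are zero

print(f"BOW dimensionality: {dim}")
print(f"Fraction of zero entries: {sparsity:.2f}")
```

Even on this tiny corpus half of the entries are zero; with a realistic vocabulary nearly all of them would be, which is the situation dense embeddings are meant to avoid.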

      Reviews

      Lalit P Saxena

      Hate speech comments in online forums are a form of offensive language targeted at specific groups with the aim of dishonoring them. Hate speech is also sometimes treated as synonymous with misinformation, smears, and social pollution. Unmonitored activity in online social communities and uncontrolled access to the Internet are proliferating hate speech in online comments.

      The authors propose a two-step method to detect hate speech in online comments. The first step uses paragraph-to-vector (paragraph2vec) with a continuous bag-of-words (CBOW) neural language model to learn comment embeddings; hierarchical softmax reduces the time complexity and enables efficient training. The second step passes the learned vector representations to a logistic regression classifier that distinguishes hate speech from clean comments.

      The authors collected 56,280 hate speech comments and 895,456 clean comments from 209,776 anonymous users of the Yahoo Finance website over six months. They claim that this dataset, with a vocabulary of 304,427 words, is the largest hate speech dataset available in the literature. The neural language model uses an embedding dimensionality of 200, a context window of length 5, and 5 training iterations. The authors compare the proposed method with bag-of-words (BOW) baselines using term frequency (tf) and term frequency-inverse document frequency (tf-idf) features, and use the area under the curve (AUC) to validate their results.

      The authors also report reduced training time and lower memory usage compared to the other methods. They conclude that their method addresses the hate speech detection problem while also mitigating the high dimensionality and sparsity of comment representations.

      Online Computing Reviews Service
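
The second step described in the review, a binary classifier over comment embeddings evaluated by area under the curve, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 4-dimensional vectors are hypothetical stand-ins for learned embeddings, the classifier is a plain logistic regression fitted by batch gradient descent, and the metric is assumed to be the area under the ROC curve, computed here by pairwise comparison.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability that embedding x belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, lr=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auc(scores, labels):
    """ROC AUC: chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy "comment embeddings"; label 1 = hate speech, 0 = clean.
X = [
    [0.9, 0.1, 0.8, 0.2], [0.8, 0.2, 0.7, 0.3], [0.7, 0.3, 0.9, 0.1],
    [0.1, 0.9, 0.2, 0.8], [0.2, 0.8, 0.3, 0.7], [0.3, 0.7, 0.1, 0.9],
]
y = [1, 1, 1, 0, 0, 0]

w, b = train_logreg(X, y)
scores = [predict_proba(w, b, x) for x in X]
print("AUC on the toy training data:", auc(scores, y))
```

The toy data is linearly separable and scored on the training set, so the printed value says nothing about real performance; in practice the embeddings would come from the first step and the AUC would be measured on held-out comments.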

      • Published in

        ACM Other conferences
        WWW '15 Companion: Proceedings of the 24th International Conference on World Wide Web
        May 2015
        1602 pages
        ISBN:9781450334730
        DOI:10.1145/2740908

        Copyright © 2015 held by the owner/author(s)

        Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 18 May 2015

        Qualifiers

        • other

        Acceptance Rates

        Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
