DOI: 10.1145/3091478.3091509
short-paper

A Large Labeled Corpus for Online Harassment Research

Published: 25 June 2017

ABSTRACT

A fundamental part of conducting cross-disciplinary web science research is having useful, high-quality datasets that provide value to studies across disciplines. In this paper, we introduce a large, hand-coded corpus of online harassment data. A team of researchers collaboratively developed a codebook using grounded theory and labeled 35,000 tweets. The resulting dataset contains roughly 15% positive (harassment) examples and 85% negative examples. These data are useful for training machine learning models, for identifying textual and linguistic features of online harassment, and for studying the nature of harassing comments and the culture of trolling.
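The class balance reported above matters for the machine-learning use the abstract mentions. With about 15% positive examples, a classifier that always predicts "not harassment" already reaches roughly 85% accuracy. The minimal Python sketch below turns the split into approximate class counts and inverse-frequency class weights. It assumes the rounded 15/85 split applies exactly to all 35,000 tweets, and the released per-class counts may differ. The function names are hypothetical. The weighting scheme is the generic "balanced" heuristic, not a method described in the paper.

```python
def approximate_class_counts(total: int, positive_pct: int) -> dict[str, int]:
    """Split a corpus size into approximate positive/negative counts.

    Hypothetical helper: treats the abstract's rounded split ("roughly 15%")
    as exact, so the result is an estimate, not the dataset's true counts.
    """
    positive = total * positive_pct // 100  # integer arithmetic avoids float rounding
    return {"harassment": positive, "not_harassment": total - positive}


def balanced_class_weights(counts: dict[str, int]) -> dict[str, float]:
    """Inverse-frequency weights: w_c = N / (K * n_c).

    Generic heuristic (the same formula scikit-learn uses for
    class_weight="balanced"); not a method from the paper.
    """
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


counts = approximate_class_counts(35_000, 15)
weights = balanced_class_weights(counts)
print(counts)  # {'harassment': 5250, 'not_harassment': 29750}
print({label: round(w, 3) for label, w in weights.items()})  # {'harassment': 3.333, 'not_harassment': 0.588}
```

Under this weighting each class contributes the same total weight (N / K = 17,500 here), so a loss function no longer rewards ignoring the minority harassment class.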

      Published in
      WebSci '17: Proceedings of the 2017 ACM on Web Science Conference
      June 2017
      438 pages
      ISBN: 9781450348966
      DOI: 10.1145/3091478

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      WebSci '17 paper acceptance rate: 30 of 85 submissions (35%). Overall acceptance rate: 218 of 875 submissions (25%).
