DOI: 10.1145/2566486.2567987

STFU NOOB!: predicting crowdsourced decisions on toxic behavior in online games

Published: 07 April 2014

ABSTRACT

One problem facing players of competitive games is negative, or toxic, behavior. League of Legends, the largest eSport game, uses a crowdsourcing platform called the Tribunal to judge whether a reported toxic player should be punished or not. The Tribunal is a two-stage system that requires reports from players who directly observe toxic behavior and reviews from human experts who judge the aggregated reports. While this system deals successfully with the vague nature of toxic behavior through majority voting over many reviews, it incurs tremendous cost, time, and human effort. In this paper, we propose a supervised learning approach for predicting crowdsourced decisions on toxic behavior using a large-scale labeled data collection: over 10 million user reports on 1.46 million toxic players and the corresponding crowdsourced decisions. Our results show good performance in detecting overwhelming-majority cases and predicting the crowdsourced decisions on them. We also demonstrate good portability of our classifier across regions. Finally, we estimate the practical implications of our approach: potential cost savings and victim protection.
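At a high level, the approach described above can be pictured as training a binary classifier on features aggregated from the reports for each Tribunal case, with the crowdsourced verdict (punish or pardon) as the label. The sketch below is illustrative only: it assumes synthetic stand-in features and a random-forest classifier, and does not reproduce the paper's actual feature set or model.

    # Minimal sketch, assuming hypothetical per-case features (e.g., number of
    # reports, report categories, chat-based scores) and the Tribunal's
    # majority verdict as the label. Data here are synthetic placeholders.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)

    # Stand-in data: 1,000 reported cases, 10 numeric features each.
    X = rng.normal(size=(1000, 10))
    y = rng.integers(0, 2, size=1000)  # 1 = punish, 0 = pardon

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_train, y_train)

    # Evaluate how well the classifier recovers the crowdsourced decision.
    scores = clf.predict_proba(X_test)[:, 1]
    print("AUC:", roc_auc_score(y_test, scores))

In this framing, cases the classifier predicts with high confidence could bypass or be prioritized in crowd review, which is the source of the cost savings the abstract refers to.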


Published in

WWW '14: Proceedings of the 23rd International Conference on World Wide Web
April 2014, 926 pages
ISBN: 9781450327442
DOI: 10.1145/2566486
Copyright © 2014 is held by the International World Wide Web Conference Committee (IW3C2).

Publisher

Association for Computing Machinery, New York, NY, United States


          Qualifiers

          • research-article

          Acceptance Rates

WWW '14 paper acceptance rate: 84 of 645 submissions (13%). Overall acceptance rate: 1,899 of 8,196 submissions (23%).
