DOI: 10.1145/2566486.2567987

STFU NOOB!: predicting crowdsourced decisions on toxic behavior in online games

Published: 07 April 2014

ABSTRACT

One problem facing players of competitive games is negative, or toxic, behavior. League of Legends, the largest eSport game, uses a crowdsourcing platform called the Tribunal to judge whether a reported toxic player should be punished or not. The Tribunal is a two-stage system that requires reports from players who directly observe toxic behavior and reviews from human experts who judge the aggregated reports. While this system deals successfully with the vague nature of toxic behavior through majority voting over many reviews, it incurs tremendous cost, time, and human effort. In this paper, we propose a supervised learning approach for predicting crowdsourced decisions on toxic behavior using a large-scale labeled data collection: over 10 million user reports on 1.46 million toxic players and the corresponding crowdsourced decisions. Our results show good performance in detecting overwhelming-majority cases and predicting the crowdsourced decisions on them. We also demonstrate good portability of our classifier across regions. Finally, we estimate the practical implications of our approach: potential cost savings and victim protection.
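At a high level, the approach described above can be pictured as training a binary classifier on features aggregated from the reports for each Tribunal case, with the crowdsourced verdict (punish or pardon) as the label. The sketch below is illustrative only: it assumes synthetic stand-in features and a random-forest classifier, and does not reproduce the paper's actual feature set or model.

    # Minimal sketch, assuming hypothetical per-case features (e.g., number of
    # reports, report categories, chat-based scores) and the Tribunal's
    # majority verdict as the label. Data here are synthetic placeholders.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)

    # Stand-in data: 1,000 reported cases, 10 numeric features each.
    X = rng.normal(size=(1000, 10))
    y = rng.integers(0, 2, size=1000)  # 1 = punish, 0 = pardon

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_train, y_train)

    # Evaluate how well the classifier recovers the crowdsourced decision.
    scores = clf.predict_proba(X_test)[:, 1]
    print("AUC:", roc_auc_score(y_test, scores))

In this framing, cases the classifier predicts with high confidence could bypass or be prioritized in crowd review, which is the source of the cost savings the abstract refers to.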


Published in

WWW '14: Proceedings of the 23rd International Conference on World Wide Web
April 2014, 926 pages
ISBN: 9781450327442
DOI: 10.1145/2566486
Copyright © 2014 is held by the International World Wide Web Conference Committee (IW3C2).

Publisher

Association for Computing Machinery, New York, NY, United States


          Qualifiers

          • research-article

          Acceptance Rates

WWW '14 paper acceptance rate: 84 of 645 submissions (13%). Overall acceptance rate: 1,899 of 8,196 submissions (23%).
