research-article

Wikipedia vandalism detection

Author:
Santiago M. Mola-Velasco

NLE Lab. - ELiRF, DSIC, Universidad Politécnica de Valencia, Valencia, Spain


WWW '11: Proceedings of the 20th international conference companion on World wide web, March 2011, Pages 391–396. https://doi.org/10.1145/1963192.1963349

Published: 28 March 2011


ABSTRACT

Wikipedia is an online encyclopedia that anyone can access and edit. It has become one of the most important sources of knowledge online, and many third-party projects rely on it for a wide range of purposes. The open model of Wikipedia also allows pranksters, lobbyists and spammers to attack the integrity of the encyclopedia, which endangers it as a public resource. Such attacks are known in the community as vandalism.

Many methods have been developed within the Wikipedia and scientific communities to tackle this problem. We have participated in this effort and developed one of the leading approaches. Our research aims to create a fully working antivandalism system and deploy it in the real world.


Index Terms

Wikipedia vandalism detection
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Document filtering
      2. Information extraction


Published in
WWW '11: Proceedings of the 20th international conference companion on World wide web
March 2011
552 pages
ISBN: 9781450306379
DOI: 10.1145/1963192
General Chairs: S. Sadagopan (IIIT-Bangalore, India), Krithi Ramamritham (IIT-Bombay, India), Arun Kumar (IBM Research, India), M. P. Ravindra (Infosys E & R, India)
Program Chairs: Elisa Bertino (Purdue University, USA), Ravi Kumar (Yahoo! Research, USA)
Copyright © 2011 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Wikipedia vandalism detection
machine learning
natural language processing
reputation

Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%