ABSTRACT
Wikipedia is an online encyclopedia that anyone can access and edit. It has become one of the most important sources of knowledge online, and many third-party projects rely on it for a wide range of purposes. The open model of Wikipedia allows pranksters, lobbyists and spammers to attack the integrity of the encyclopedia, which endangers it as a public resource. Such attacks are known in the community as vandalism.
Many methods have been developed within the Wikipedia community and the scientific community to tackle this problem. We have participated in this effort and developed one of the leading approaches. Our research aims to create a fully working antivandalism system and deploy it in the real world.
Index Terms
- Wikipedia vandalism detection