DOI: 10.1145/1963192.1963349
research-article

Wikipedia vandalism detection

Published: 28 March 2011

ABSTRACT

Wikipedia is an online encyclopedia that anyone can access and edit. It has become one of the most important sources of knowledge online, and many third-party projects rely on it for a wide range of purposes. The open model of Wikipedia allows pranksters, lobbyists, and spammers to attack the integrity of the encyclopedia, which endangers it as a public resource. Such attacks are known in the community as vandalism.

A plethora of methods have been developed within the Wikipedia and scientific communities to tackle this problem. We have participated in this effort and developed one of the leading approaches. Our research aims to create a fully working antivandalism system and deploy it in the real world.


      • Published in

        WWW '11: Proceedings of the 20th international conference companion on World wide web
        March 2011
        552 pages
        ISBN: 9781450306379
        DOI: 10.1145/1963192

        Copyright © 2011 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
