DOI: 10.1145/1321440.1321476 · CIKM conference proceedings · research-article

Measuring article quality in Wikipedia: models and evaluation

Published: 06 November 2007

ABSTRACT

Wikipedia has grown to be the world's largest and busiest free encyclopedia, in which articles are collaboratively written and maintained by volunteers online. Despite its success as a means of knowledge sharing and collaboration, the public has never stopped criticizing the quality of Wikipedia articles edited by non-experts and inexperienced contributors. In this paper, we investigate the problem of assessing the quality of articles in the collaborative authoring of Wikipedia. We propose three article quality measurement models that make use of the interaction data between articles and their contributors derived from the article edit history. Our Basic model is designed based on the mutual dependency between article quality and author authority. The PeerReview model introduces review behavior into measuring article quality. Finally, our ProbReview models extend PeerReview with partial reviewership of contributors as they edit various portions of the articles. We conduct experiments on a set of well-labeled Wikipedia articles to evaluate the effectiveness of our quality measurement models in resembling human judgement.
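
The mutual dependency behind the Basic model can be illustrated with a small sketch. The abstract does not give the update equations, so the following is only one plausible reading: it assumes an article's quality is the contribution-weighted sum of its authors' authority, that an author's authority is the contribution-weighted sum of the quality of the articles they wrote, and that both vectors are renormalized each round (a HITS-style iteration). The function names, the `contrib` matrix, and the iteration count are hypothetical and not taken from the paper.

```python
from math import sqrt


def _normalize(values):
    """Scale a vector to unit Euclidean length (left unchanged if all zero)."""
    norm = sqrt(sum(x * x for x in values))
    return [x / norm for x in values] if norm > 0 else values


def basic_mutual_reinforcement(contrib, iterations=50):
    """Iterate quality and authority scores until they stabilize.

    contrib[i][j] is an assumed measure (e.g. word count) of how much
    author j contributed to article i. This is an illustrative sketch,
    not the paper's exact formulation.
    """
    n_articles = len(contrib)
    n_authors = len(contrib[0])
    quality = [1.0] * n_articles
    authority = [1.0] * n_authors
    for _ in range(iterations):
        # Article quality depends on the authority of its contributors.
        quality = _normalize([
            sum(contrib[i][j] * authority[j] for j in range(n_authors))
            for i in range(n_articles)
        ])
        # Author authority depends on the quality of the articles they wrote.
        authority = _normalize([
            sum(contrib[i][j] * quality[i] for i in range(n_articles))
            for j in range(n_authors)
        ])
    return quality, authority


# Toy data: 3 articles, 2 authors (made-up contribution counts).
quality, authority = basic_mutual_reinforcement([[100, 0], [10, 50], [0, 5]])
```

In this toy input the first author writes most of the content, so the first article ends up with the highest score. The PeerReview and ProbReview models described in the abstract would additionally need to model which contributors reviewed which portions of an article, which this sketch does not attempt.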
