Measuring article quality in wikipedia: models and evaluation

Authors:
Meiqun Hu, Ee-Peng Lim, Aixin Sun, Hady Wirawan Lauw, Ba-Quy Vuong

Nanyang Technological University, Singapore

CIKM '07: Proceedings of the sixteenth ACM Conference on Information and Knowledge Management, November 2007, pages 243–252. https://doi.org/10.1145/1321440.1321476

Published: 06 November 2007

ABSTRACT

Wikipedia has grown into the world's largest and busiest free encyclopedia, in which articles are collaboratively written and maintained by volunteers online. Despite its success as a means of knowledge sharing and collaboration, the public has never stopped criticizing the quality of Wikipedia articles edited by non-experts and inexperienced contributors. In this paper, we investigate the problem of assessing the quality of collaboratively authored Wikipedia articles. We propose three article quality measurement models that make use of the interaction data between articles and their contributors, derived from the article edit history. Our Basic model is based on the mutual dependency between the quality of an article and the authority of its authors. The PeerReview model introduces review behavior into the measurement of article quality. Finally, our ProbReview models extend PeerReview with partial reviewership of contributors as they edit various portions of the articles. We conduct experiments on a set of well-labeled Wikipedia articles to evaluate how closely our quality measurement models resemble human judgement.
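
As a rough illustration of the mutual dependency behind the Basic model, the sketch below lets article quality and author authority reinforce each other over repeated passes. The abstract does not give the update equations, so everything specific here is an assumption made for illustration: the word-count weighting, the sum-to-one normalization, the fixed iteration count, and the names `basic_model` and `contrib`. It should not be read as the authors' exact formulation.

```python
def basic_model(contrib, iterations=50):
    """Mutual-reinforcement sketch (assumed formulation, not the paper's).

    contrib[i][j] is a hypothetical count of words that author j
    contributed to article i, taken from the edit history.
    Returns (quality, authority), each normalized to sum to 1.
    """
    n_articles = len(contrib)
    n_authors = len(contrib[0])
    quality = [1.0] * n_articles
    authority = [1.0] * n_authors

    for _ in range(iterations):
        # An article is scored by the authority of those who wrote it,
        # weighted by how much each of them contributed.
        quality = [
            sum(contrib[i][j] * authority[j] for j in range(n_authors))
            for i in range(n_articles)
        ]
        total_q = sum(quality) or 1.0
        quality = [q / total_q for q in quality]

        # An author is scored by the quality of the articles they wrote,
        # with the same contribution weights.
        authority = [
            sum(contrib[i][j] * quality[i] for i in range(n_articles))
            for j in range(n_authors)
        ]
        total_a = sum(authority) or 1.0
        authority = [a / total_a for a in authority]

    return quality, authority


# Toy data: 3 articles, 2 authors (made-up numbers).
contrib = [
    [100, 0],
    [10, 50],
    [0, 5],
]
quality, authority = basic_model(contrib)
```

Under these assumptions, the article written mostly by the heavier contributor ends up with the highest score, and scores depend only on who wrote how much. Review behavior, which the PeerReview and ProbReview models add, is not represented in this sketch.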

Index Terms

Measuring article quality in wikipedia: models and evaluation
1. Information systems
  1. Information systems applications
  2. World Wide Web
    1. Web applications
    2. Web services

Published in
CIKM '07: Proceedings of the sixteenth ACM Conference on Information and Knowledge Management
November 2007
1048 pages
ISBN: 9781595938039
DOI: 10.1145/1321440
Co-chair:
Alberto H. F. Laender,
Conference Chairs:
André O. Falcão
Universidade de Lisboa, Portugal
,
Øystein Haug Olsen,
General Chair:
Mário J. Silva
(Universidade de Lisboa, Portugal)
,
Program Chairs:
Ricardo Baeza-Yates,
Deborah L. McGuinness,
Bjorn Olstad
Copyright © 2007 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
article quality
authority
collaborative authoring
peer review
wikipedia