
2019 | OriginalPaper | Book chapter

Measures for Quality Assessment of Articles and Infoboxes in Multilingual Wikipedia

Written by: Włodzimierz Lewoniewski

Published in: Business Information Systems Workshops

Publisher: Springer International Publishing


Abstract

One of the most popular collaborative knowledge bases on the Internet is Wikipedia. Articles of this free encyclopaedia are created and edited by users from different countries in about 300 languages. Depending on the topic and language version, the quality of the information may vary. This study presents and classifies measures that can be extracted from Wikipedia articles for the purpose of automatic quality assessment in different languages. Based on a state-of-the-art analysis and the author's own experiments, specific measures for various aspects of quality have been defined. Additionally, this work defines measures for assessing the quality of data contained in the structural parts of Wikipedia articles, the infoboxes. The study also describes extraction methods for the various sources of measures that can be used in quality assessment.

Metadata
Title
Measures for Quality Assessment of Articles and Infoboxes in Multilingual Wikipedia
Written by
Włodzimierz Lewoniewski
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-04849-5_53