2021 | OriginalPaper | Chapter

Is Wikipedia Easy to Understand?: A Study Beyond Conventional Readability Metrics

Authors: Simran Setia, S. R. S. Iyengar, Amit Arjun Verma, Neeru Dubey

Published in: Advances in Computational Collective Intelligence

Publisher: Springer International Publishing

Abstract

Wikipedia has emerged as one of the most prominent sources of information on the Internet. It provides a collaborative platform on which editors create and share information, which makes it a valuable resource. Wikipedia articles have been studied extensively from the editor’s point of view, but Wikipedia has yet to be analyzed from the reader’s perspective. Since Wikipedia serves its users as an encyclopedia, its role as a tool for obtaining information must be examined. Readability, the ease with which a reader can understand a piece of text, plays a major role in conveying the intended meaning to readers. In this paper, we study the readability of various Wikipedia articles. In addition to judging the articles against standard readability metrics, we introduce new parameters that relate specifically to the comprehension of the text in Wikipedia articles. Combined with the standard readability metrics, these parameters help classify Wikipedia articles into comprehensible and non-comprehensible classes using a support vector machine (SVM) classifier.
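As an illustration of what a standard readability metric looks like in practice, the following minimal sketch computes the Flesch Reading Ease score, one widely used formula. The abstract does not name the metrics used in the chapter, so the choice of this formula is an assumption, as are the function names and the vowel-group syllable heuristic, which is only a rough approximation.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of consecutive vowels (at least 1).

    This heuristic is an assumption made for illustration; it miscounts
    words with silent vowels and is not taken from the chapter.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text.

    score = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    """
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Treat each run of ASCII letters as one word.
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


if __name__ == "__main__":
    easy = "The cat sat on the mat. The dog ran."
    hard = "Collaborative encyclopedias necessitate comprehensive readability evaluation."
    print(flesch_reading_ease(easy))
    print(flesch_reading_ease(hard))
```

A score of this kind could serve as one input feature for a classifier such as an SVM, alongside other parameters; how the chapter actually combines its features is not described in the abstract.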

Metadata
Title
Is Wikipedia Easy to Understand?: A Study Beyond Conventional Readability Metrics
Authors
Simran Setia
S. R. S. Iyengar
Amit Arjun Verma
Neeru Dubey
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-88113-9_14