2018 | OriginalPaper | Chapter

Determining Quality of Articles in Polish Wikipedia Based on Linguistic Features

Authors: Włodzimierz Lewoniewski, Krzysztof Węcel, Witold Abramowicz

Published in: Information and Software Technologies

Publisher: Springer International Publishing

Abstract

Wikipedia is the most popular and the largest user-generated source of knowledge on the Web. The quality of the information in this encyclopedia is often questioned. Wikipedians have therefore developed a system of awards for high-quality articles that meet specific style guidelines. Nevertheless, more than 1.2 million articles in Polish Wikipedia remain unassessed. This paper considers over 100 linguistic features to determine the quality of Polish-language Wikipedia articles. We evaluate our models on 500,000 articles of Polish Wikipedia. Additionally, we discuss the importance of linguistic features for quality prediction.
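The abstract does not list the individual features, so the following is only a minimal sketch of the kind of surface linguistic features a quality model could consume, assuming the article is available as plain text. The function `linguistic_features` and the dictionary `TOY_LEMMAS` are hypothetical. The `unique_base_forms` feature imitates the approach described in footnote 1 (counting unique words on their base forms), but a tiny hand-written lemma table stands in for a real Polish morphological dictionary or tagger.

```python
import re

# Hypothetical toy lemma table mapping inflected Polish forms to base forms.
# A real pipeline would use a full morphological dictionary or tagger instead.
TOY_LEMMAS = {
    "kot": "kot",
    "kota": "kot",
    "koty": "kot",
    "ma": "mieć",
    "mają": "mieć",
    "ala": "ala",
}

SENTENCE_SPLIT = re.compile(r"[.!?]+")  # naive sentence boundary
WORD = re.compile(r"\w+")               # \w matches Polish letters such as ą, ć

def linguistic_features(text: str, lemmas: dict[str, str]) -> dict[str, float]:
    """Compute a few illustrative surface features of an article text."""
    sentences = [s for s in SENTENCE_SPLIT.split(text) if s.strip()]
    words = [w.lower() for w in WORD.findall(text)]
    # Words missing from the lemma table fall back to their lowercase surface form.
    base_forms = [lemmas.get(w, w) for w in words]
    n_words = len(words)
    n_sentences = len(sentences)
    return {
        "words": n_words,
        "sentences": n_sentences,
        "unique_words": len(set(words)),
        "unique_base_forms": len(set(base_forms)),
        "avg_sentence_length": n_words / n_sentences if n_sentences else 0.0,
        "avg_word_length": sum(map(len, words)) / n_words if n_words else 0.0,
    }

if __name__ == "__main__":
    print(linguistic_features("Ala ma kota. Ala ma koty!", TOY_LEMMAS))
```

For the sample text, the surface forms "kota" and "koty" count as two unique words but collapse to a single base form "kot", so `unique_words` is 4 while `unique_base_forms` is 3. This shows why counting on base forms matters for a highly inflected language such as Polish.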

Footnotes
1
Unique words were counted on the base forms of the words in each text.
 
2
More detailed results in tabular form can be found at: http://data.lewoniewski.info/icist2018pl/.
 
Metadata
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-99972-2_45
