2017 | Original Paper | Book chapter

Using Morphological and Semantic Features for the Quality Assessment of Russian Wikipedia

Authors: Włodzimierz Lewoniewski, Nina Khairova, Krzysztof Węcel, Nataliia Stratiienko, Witold Abramowicz

Published in: Information and Software Technologies

Publisher: Springer International Publishing

Abstract

The assessment of the quality and credibility of Wikipedia articles is becoming increasingly important. We propose using morphological and semantic features to estimate the quality of Wikipedia articles in the Russian language. We identified over 150 linguistic features and divided them into four groups. Within these groups, we considered features of encyclopedic style, readability, and subjectivism of the article’s text. Using Random Forest as the classification algorithm, we show the most important linguistic features that affect the quality of Russian Wikipedia articles. We compare the classification results of our four groups of linguistic features separately. We achieved an F-measure of 89.75%.
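For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F-measure can be computed from classification counts. It assumes the balanced F1 variant (beta = 1), which the abstract does not state explicitly; the function name `f_measure` and the counts are hypothetical and are not taken from the paper.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true-positive, false-positive and false-negative counts."""
    if tp == 0:
        # No correct positive predictions: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    b2 = beta * beta
    # Weighted harmonic mean of precision and recall.
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for a "high quality" article class (illustrative only).
score = f_measure(tp=90, fp=10, fn=10)
print(f"F1 = {score:.2%}")
```

With beta = 1 the score is the harmonic mean of precision and recall, so it is high only when both are high.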

Metadata
Title
Using Morphological and Semantic Features for the Quality Assessment of Russian Wikipedia
Authors
Włodzimierz Lewoniewski
Nina Khairova
Krzysztof Węcel
Nataliia Stratiienko
Witold Abramowicz
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-67642-5_46