2016 | OriginalPaper | Book chapter

Authorship Attribution of Polish Newspaper Articles

Authors: Marcin Kuta, Bartłomiej Puto, Jacek Kitowski

Published in: Artificial Intelligence and Soft Computing

Publisher: Springer International Publishing

Abstract

This paper examines a machine learning approach to authorship attribution of Polish-language articles. It focuses on how data volume, the number of authors, and thematic homogeneity affect attribution quality. We study the impact of feature selection under several selection criteria, mainly the chi-square and information gain measures, as well as the effect of combining features of different types. Results are reported for the Rzeczpospolita corpus in terms of the \(F_1\) measure.
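
The abstract names chi-square feature selection and the \(F_1\) measure without giving formulas, so the following is only a minimal sketch of the textbook definitions, not the authors' implementation. All function names, the document-level presence/absence counting, the choice of taking the maximum chi-square score over authors, and the macro-averaging of \(F_1\) are assumptions made for illustration.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/author contingency table.

    n11: documents by the author that contain the term
    n10: documents by other authors that contain the term
    n01: documents by the author that lack the term
    n00: documents by other authors that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0  # degenerate table: the term or author carries no signal
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def top_terms_by_chi_square(docs, authors, k):
    """Rank terms by their best chi-square score over all authors.

    docs: list of token lists; authors: parallel list of author labels.
    Taking the maximum over authors is an assumption, not the paper's rule.
    """
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    author_counts = Counter(authors)
    total = len(docs)
    scores = {}
    for term in vocab:
        with_term = [a for s, a in zip(doc_sets, authors) if term in s]
        term_by_author = Counter(with_term)
        best = 0.0
        for author, n_author in author_counts.items():
            n11 = term_by_author[author]
            n10 = len(with_term) - n11
            n01 = n_author - n11
            n00 = total - n11 - n10 - n01
            best = max(best, chi_square(n11, n10, n01, n00))
        scores[term] = best
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def macro_f1(gold, predicted):
    """Unweighted mean of per-author F1 scores (macro-averaging assumed)."""
    labels = set(gold) | set(predicted)
    f1_scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores) if f1_scores else 0.0
```

In this sketch a term that appears in every document of one author and in none of the others receives the highest score, while a term spread evenly across authors scores zero; whether the paper averages \(F_1\) per author or over all documents is not stated in the abstract.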

Metadata
Title
Authorship Attribution of Polish Newspaper Articles
Authors
Marcin Kuta
Bartłomiej Puto
Jacek Kitowski
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-39384-1_41
