2014 | OriginalPaper | Book chapter

Probabilistic Anomaly Detection Method for Authorship Verification

Written by: Mohamed Amine Boukhaled, Jean-Gabriel Ganascia

Published in: Statistical Language and Speech Processing

Publisher: Springer International Publishing

Abstract

Authorship verification is the task of determining whether a given text was written by a candidate author. In this paper, we present a first study on using an anomaly detection method for the authorship verification task. We consider a weakly supervised probabilistic model based on a multivariate Gaussian distribution. To evaluate the effectiveness of the proposed method, we conducted experiments on a classic French corpus. Our preliminary results show that the probabilistic method can achieve high verification performance, with an F1 score of up to 85%. This method can therefore be valuable for authorship verification.
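The general idea of Gaussian anomaly detection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature vectors (imagined here as relative frequencies of a few function words), the ridge term, and the percentile-based threshold are all assumptions made for illustration, and the names `fit_gaussian`, `log_density`, and `is_same_author` are hypothetical. A Gaussian is fitted to texts known to be by the candidate author, and a new text is flagged as anomalous (not by that author) when its log-density falls below a threshold.

```python
import numpy as np


def fit_gaussian(X):
    """Estimate mean and covariance from the candidate author's feature vectors."""
    mu = X.mean(axis=0)
    # Small ridge term keeps the covariance invertible (an assumption of this sketch).
    sigma = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    return mu, sigma


def log_density(x, mu, sigma):
    """Log of the multivariate Gaussian density N(x; mu, sigma)."""
    d = mu.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    maha = diff @ np.linalg.solve(sigma, diff)  # squared Mahalanobis distance
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def is_same_author(x, mu, sigma, threshold):
    """Accept the text as the candidate's if its log-density reaches the threshold."""
    return bool(log_density(x, mu, sigma) >= threshold)


# Synthetic stand-in for stylometric features of the candidate author's texts.
rng = np.random.default_rng(0)
X_author = rng.normal(loc=[0.05, 0.03, 0.02], scale=0.005, size=(200, 3))

mu, sigma = fit_gaussian(X_author)

# Hypothetical threshold: the 5th percentile of the training log-densities.
train_scores = np.array([log_density(x, mu, sigma) for x in X_author])
threshold = np.percentile(train_scores, 5)

typical_text = mu                              # sits at the centre of the author's style
unusual_text = np.array([0.20, 0.20, 0.20])    # far from anything seen in training
print(is_same_author(typical_text, mu, sigma, threshold))
print(is_same_author(unusual_text, mu, sigma, threshold))
```

In a weakly supervised setting, the threshold would more plausibly be tuned on a small set of labeled examples (for instance, to maximize F1) rather than fixed at a percentile as done here.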

Metadata
Title
Probabilistic Anomaly Detection Method for Authorship Verification
Written by
Mohamed Amine Boukhaled
Jean-Gabriel Ganascia
Copyright year
2014
DOI
https://doi.org/10.1007/978-3-319-11397-5_16