2018 | OriginalPaper | Book chapter

Authorship Attribution for Short Texts with Author-Document Topic Model

Written by: Haowen Zhang, Peng Nie, Yanlong Wen, Xiaojie Yuan

Published in: Knowledge Science, Engineering and Management

Publisher: Springer International Publishing

Abstract

The goal of authorship attribution is to correctly assign texts of disputed authorship to known authors. With the growth of social media services, authorship attribution for short texts has become increasingly necessary. In earlier work, topic models such as Latent Dirichlet Allocation (LDA) have been used to find latent semantic features of authors and have improved performance on authorship attribution. However, most of this work focuses on long texts. In this paper, we propose a novel model, the Author-Document Topic Model (ADT), which models the corpus at both the author level and the document level to address authorship attribution for short texts. We also propose a new classification algorithm that calculates the similarity between texts in order to find the authors of anonymous texts. Experimental results on two public datasets validate the effectiveness of the proposed method.
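
The abstract does not spell out the ADT model or its classification algorithm, so the following is only a minimal sketch of the general idea of attributing a text by comparing topic distributions. It assumes that each known author and the anonymous text are already summarized as a probability distribution over topics (for example, from an LDA-style model), and it uses the Hellinger distance as the similarity measure. The distance choice, the function names `hellinger` and `attribute`, and the toy numbers are all assumptions for illustration, not the paper's method or data.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length.

    Returns 0.0 for identical distributions and 1.0 for distributions
    with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def attribute(doc_topics, author_topics):
    """Return the author whose topic distribution is closest to doc_topics.

    author_topics maps an author name to that author's topic distribution.
    """
    return min(author_topics, key=lambda name: hellinger(doc_topics, author_topics[name]))


# Made-up topic distributions over three topics (illustrative only).
author_topics = {
    "author_a": [0.7, 0.2, 0.1],
    "author_b": [0.1, 0.3, 0.6],
}
anonymous_doc = [0.6, 0.3, 0.1]

predicted = attribute(anonymous_doc, author_topics)
```

A nearest-profile rule like this is one of the simplest ways to turn topic distributions into an attribution decision. Short texts give noisy per-document distributions, which is the difficulty the abstract says the author-level and document-level modeling is meant to address.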

Metadata
Title
Authorship Attribution for Short Texts with Author-Document Topic Model
Written by
Haowen Zhang
Peng Nie
Yanlong Wen
Xiaojie Yuan
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-99365-2_3