
2015 | OriginalPaper | Chapter

Author Attribution of Email Messages Using Parse-Tree Features

Authors: Jagadeesh Patchala, Raj Bhatnagar, Sridharan Gopalakrishnan

Published in: Machine Learning and Data Mining in Pattern Recognition

Publisher: Springer International Publishing


Abstract

Most existing research on authorship attribution uses various types of lexical, syntactic, and structural features for classification. Some of these features are not meaningful for short texts such as email messages. In this paper we demonstrate a very effective use of one syntactic feature of an author’s writing, the characteristics of the text’s parse trees, for authorship analysis of email messages. We define author templates consisting of the frequencies of context-free grammar (CFG) productions occurring in an author’s training set of email messages. We then extract the same frequencies from a new email message and match them against the authors’ templates to identify the best match. We evaluate our approach on the Enron email dataset and show that CFG production frequencies work very well and are robust in attributing the authorship of email messages.
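
The template-matching idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each sentence has already been parsed into a Penn-Treebank-style bracketed string by some external parser, and it uses cosine similarity as a stand-in for whatever matching measure the chapter actually applies. All function names (`parse_tree`, `productions`, `template`, `attribute`) and the toy parses are hypothetical.

```python
import math
from collections import Counter


def parse_tree(s):
    """Parse a bracketed string such as '(S (NP (PRP I)) ...)' into (label, children)."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    return read()


def productions(tree):
    """Yield CFG productions like 'S -> NP VP'; rules that rewrite to words are skipped."""
    label, children = tree
    subtrees = [c for c in children if isinstance(c, tuple)]
    if subtrees:
        yield f"{label} -> {' '.join(c[0] for c in subtrees)}"
        for c in subtrees:
            yield from productions(c)


def template(bracketed_parses):
    """Relative frequency of each production over a set of parsed sentences."""
    counts = Counter()
    for s in bracketed_parses:
        counts.update(productions(parse_tree(s)))
    total = sum(counts.values())
    return {rule: n / total for rule, n in counts.items()}


def cosine(p, q):
    # Assumption: cosine similarity is only an illustrative matching measure.
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def attribute(message_parses, author_templates):
    """Return the author whose template is most similar to the message's frequencies."""
    msg = template(message_parses)
    return max(author_templates, key=lambda a: cosine(msg, author_templates[a]))


# Toy training data: one hand-written parse per hypothetical author.
templates = {
    "alice": template(["(S (NP (PRP I)) (VP (VBP need) (NP (DT the) (NN report))))"]),
    "bob": template(["(S (VP (VB Send) (NP (PRP me)) (NP (DT the) (NN file))))"]),
}
new_message = ["(S (NP (PRP We)) (VP (VBP want) (NP (DT a) (NN meeting))))"]
print(attribute(new_message, templates))
```

In this toy example the new message shares all of its productions with the first author's sentence, so it is matched to that template. Real email would need a full parser and far more training text per author before the frequencies become stable.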


Metadata

Title: Author Attribution of Email Messages Using Parse-Tree Features
Authors: Jagadeesh Patchala, Raj Bhatnagar, Sridharan Gopalakrishnan
Copyright Year: 2015
DOI: https://doi.org/10.1007/978-3-319-21024-7_21
