Published in: Neural Computing and Applications 4/2010

01-06-2010 | Original Article

A comparative study on authorship attribution classification tasks using both neural network and statistical methods

Authors: Nikos Tsimboukakis, George Tambouratzis

Abstract

The present paper investigates the application of the multi-layer perceptron (MLP) to the task of categorizing texts based on their authors’ style. This task is of particular importance for information retrieval applications involving very large document databases. The emphasis is on determining the extent to which the MLP model can be fine-tuned to successfully analyse such data and uncover the stylistic differences among authors. The MLP-based method is compared and contrasted with statistical techniques, such as discriminant analysis, that are widely used in stylistic studies. The methods are compared on their classification performance, to provide an objective evaluation of the advantages of each. A second aim of the study is to compare, for each method, the effectiveness of distinct features in uncovering author identity. To evaluate the effectiveness of the entire approach in greater depth, the results of the proposed MLP-based method are compared with those of established approaches: support vector machines (SVM), using both the original parameters employed by the MLP and term frequency–inverse document frequency (TF–IDF) parameters, and the cascade correlation approach. The proposed MLP-based approach is found to possess a number of advantages, such as high classification accuracy, broadly comparable to that of the SVM, coupled with the ability to algorithmically reduce the set of parameters used without adversely affecting the classification accuracy.
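
For readers unfamiliar with the TF–IDF parameters mentioned in the abstract, the following is a minimal sketch of one common textbook weighting (term frequency normalised by document length, multiplied by the logarithm of the inverse document frequency). The abstract does not state which TF–IDF variant the authors used, so the formula, the function name `tf_idf`, and the toy corpus are all assumptions made for illustration only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting (not taken from the paper):
    tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus invented for illustration; it is not data from the paper.
docs = [
    "the minister said the law would pass".split(),
    "the committee said the vote was close".split(),
    "the law is the law".split(),
]
weights = tf_idf(docs)
```

Under this weighting, a term that occurs in every document (here `the`) receives a weight of zero, while a term confined to a single document (here `minister`) receives the largest inverse document frequency. The resulting per-document weight vectors are the kind of input a classifier such as an SVM could be trained on.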

Metadata
Title
A comparative study on authorship attribution classification tasks using both neural network and statistical methods
Authors
Nikos Tsimboukakis
George Tambouratzis
Publication date
01-06-2010
Publisher
Springer-Verlag
Published in
Neural Computing and Applications / Issue 4/2010
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-009-0314-7
