2017 | OriginalPaper | Chapter

Relevance of Named Entities in Authorship Attribution

Authors: Germán Ríos-Toledo, Grigori Sidorov, Noé Alejandro Castro-Sánchez, Alondra Nava-Zea, Liliana Chanona-Hernández

Published in: Advances in Computational Intelligence

Publisher: Springer International Publishing

Abstract

Named entities (NE) are words that refer to names of people, locations, organizations, etc. NE are present in every kind of document: e-mails, letters, essays, novels, poems. Automatic detection of these words is a very important task in natural language processing. Sometimes, NE are used in authorship attribution studies as a stylometric feature. The goal of this paper is to evaluate the effect of the presence of NE in texts on the authorship attribution task: are we really detecting the style of an author, or are we just discovering the appearance of the same NE? We used a corpus that consists of 91 novels by 7 authors of the 18th century. These authors spoke and wrote English, their native language. All novels belong to the fiction genre. The stylometric features used were character n-grams, word n-grams and n-grams of POS tags of various sizes (2-grams, 3-grams, etc.). Five novels were selected for each author; in these novels, NE account for between 4% and 7% of the text. All novels were divided into blocks of 10,000 terms each. Two kinds of experiments were conducted: automatic classification of blocks containing NE and of the same blocks without NE. In some cases, we used only the most frequent n-grams (500, 2,000 and 4,000 n-grams). Three machine learning algorithms were used for the classification task: Naive Bayes (NB), SVM (SMO) and J48. The results show that, as a tendency, the presence of NE helps classification (improvements from 5% to 20%), but for specific authors NE do not help and even make the classification worse (about 10% of the experimental data).
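The preprocessing described in the abstract (splitting texts into fixed-size blocks, building a copy of each block without NE, extracting n-grams, and keeping only the most frequent ones) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the whitespace tokenization, the hand-supplied set of named entities, and the choice to drop a trailing partial block are all assumptions made for illustration. POS tagging, NE detection, and the classifiers themselves are left out.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Sequence, Set, Tuple


def split_into_blocks(tokens: Sequence[str], block_size: int = 10_000) -> List[List[str]]:
    """Cut a token sequence into consecutive blocks of `block_size` tokens.

    Assumption: a trailing block shorter than `block_size` is dropped.
    """
    n_full = len(tokens) // block_size
    return [list(tokens[i * block_size:(i + 1) * block_size]) for i in range(n_full)]


def remove_named_entities(tokens: Iterable[str], named_entities: Set[str]) -> List[str]:
    """Return the tokens with every token listed in `named_entities` removed.

    Assumption: NE are given as a set of surface forms; a real setup
    would obtain them from a named entity recognizer.
    """
    return [t for t in tokens if t not in named_entities]


def word_ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous word n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text: str, n: int) -> List[str]:
    """All contiguous character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def most_frequent(ngrams_per_block: Iterable[Iterable[Hashable]], k: int) -> List[Hashable]:
    """The k n-grams with the highest total count over all blocks."""
    counts: Counter = Counter()
    for grams in ngrams_per_block:
        counts.update(grams)
    return [gram for gram, _ in counts.most_common(k)]


def block_vector(grams: Iterable[Hashable], vocabulary: Sequence[Hashable]) -> List[int]:
    """Count vector of one block over a fixed n-gram vocabulary."""
    counts = Counter(grams)
    return [counts[g] for g in vocabulary]


if __name__ == "__main__":
    # Toy data; the study uses blocks of 10,000 terms.
    tokens = "Emma saw Mr Knightley and Emma smiled at Mr Knightley".split()
    named_entities = {"Emma", "Knightley"}  # hypothetical NE list

    blocks_with_ne = split_into_blocks(tokens, block_size=5)
    blocks_without_ne = [remove_named_entities(b, named_entities) for b in blocks_with_ne]

    for name, blocks in (("with NE", blocks_with_ne), ("without NE", blocks_without_ne)):
        grams = [word_ngrams(b, 2) for b in blocks]
        vocabulary = most_frequent(grams, k=3)
        vectors = [block_vector(g, vocabulary) for g in grams]
        print(name, vocabulary, vectors)
```

The two resulting sets of count vectors (with and without NE) would then be passed to the same classifiers, so that any difference in accuracy can be attributed to the named entities.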

Metadata
Title
Relevance of Named Entities in Authorship Attribution
Authors
Germán Ríos-Toledo
Grigori Sidorov
Noé Alejandro Castro-Sánchez
Alondra Nava-Zea
Liliana Chanona-Hernández
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-62434-1_1
