2012 | OriginalPaper | Book chapter

Automatic Categorization of Ottoman Literary Texts by Poet and Time Period

Authors: Ethem F. Can, Fazli Can, Pinar Duygulu, Mehmet Kalpakli

Published in: Computer and Information Sciences II

Publisher: Springer London

Abstract

Millions of manuscripts and printed texts are available in the Ottoman language. Automatic categorization of Ottoman texts would make these documents much more accessible in applications ranging from historical investigation to literary analysis. In this work, we use transcribed versions of Ottoman literary texts in the Latin alphabet and show that it is possible to develop effective automatic text categorization techniques for the Ottoman language. For this purpose, we use two fundamentally different machine learning methods, Naïve Bayes and Support Vector Machines, and employ four style markers: most frequent words, token lengths, two-word collocations, and type lengths. In the experiments, we use the collected works (divans) of ten poets: two poets from each of five hundred-year periods ranging from the 15th to the 19th century. The experimental results show that highly accurate classification by poet and by time period is possible. Using statistical analysis, we recommend which style marker and machine learning method to use in future studies.
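
The abstract names four style markers and a Naïve Bayes classifier but gives no implementation details. The following is a minimal Python sketch of how such a pipeline might look, not the authors' code. All function and class names are hypothetical, the whitespace tokenization and add-one (Laplace) smoothing are assumptions, and the short English strings are invented placeholders standing in for transcribed divan text. The Support Vector Machine variant is not shown.

```python
from collections import Counter
import math


def token_lengths(text):
    """Style marker: length of every token (each word occurrence)."""
    return [len(tok) for tok in text.lower().split()]


def type_lengths(text):
    """Style marker: length of every type (each distinct word, counted once)."""
    return [len(word) for word in set(text.lower().split())]


def collocations(text):
    """Style marker: two-word collocations, taken here as adjacent word pairs."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def most_frequent_words(texts, k):
    """Style marker: the k most frequent words over a collection of texts."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    return [word for word, _ in counts.most_common(k)]


class TinyMultinomialNB:
    """Multinomial Naive Bayes with add-one smoothing (an assumed variant)."""

    def fit(self, feature_lists, labels):
        self.classes = sorted(set(labels))
        self.vocab = {f for feats in feature_lists for f in feats}
        # Log prior of each class from its share of the training documents.
        self.prior = {c: math.log(labels.count(c) / len(labels))
                      for c in self.classes}
        # Per-class counts of every feature value.
        self.counts = {c: Counter() for c in self.classes}
        for feats, label in zip(feature_lists, labels):
            self.counts[label].update(feats)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, feats):
        v = len(self.vocab)

        def score(c):
            s = self.prior[c]
            for f in feats:
                if f in self.vocab:  # features never seen in training are skipped
                    s += math.log((self.counts[c][f] + 1) / (self.totals[c] + v))
            return s

        return max(self.classes, key=score)


# Invented placeholder data: "poet_a" favours short words, "poet_b" long ones.
train = [
    ("poet_a", "the sun is up and we go on"),
    ("poet_a", "my cat sat on a red mat"),
    ("poet_b", "magnificent mountains overshadow wandering travellers"),
    ("poet_b", "nightingales celebrate everlasting gardens"),
]
labels = [label for label, _ in train]
features = [token_lengths(text) for _, text in train]
model = TinyMultinomialNB().fit(features, labels)

guess_short = model.predict(token_lengths("we go to see him"))
guess_long = model.predict(token_lengths("extraordinary travellers celebrate"))
print(guess_short, guess_long)
```

Swapping `token_lengths` for `type_lengths` or `collocations` in the feature-extraction step changes the style marker without touching the classifier. Classifying by time period instead of by poet would only require replacing the labels with period names.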

Metadata
Title
Automatic Categorization of Ottoman Literary Texts by Poet and Time Period
Authors
Ethem F. Can
Fazli Can
Pinar Duygulu
Mehmet Kalpakli
Copyright year
2012
Publisher
Springer London
DOI
https://doi.org/10.1007/978-1-4471-2155-8_6
