
2018 | OriginalPaper | Book chapter

2. Morphological Processing for Turkish

Written by: Kemal Oflazer

Published in: Turkish Natural Language Processing

Publisher: Springer International Publishing


Abstract

This chapter presents an overview of Turkish morphology followed by the architecture of a state-of-the-art wide coverage morphological analyzer for Turkish implemented using the Xerox Finite State Tools. It covers the morphophonological and morphographemic phenomena in Turkish such as vowel harmony, the morphotactics of words, and issues that one encounters when processing real text with myriads of phenomena: numbers, foreign words with Turkish inflections, unknown words, and multi-word constructs. The chapter presents ample illustrations of phenomena and provides many examples for sometimes ambiguous morphological interpretations.
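As a rough illustration of the vowel harmony mentioned in the abstract, the following Python sketch picks the allomorph of the plural suffix from the last vowel of the stem. It is a toy under stated assumptions, not the chapter's finite-state implementation: the function names are hypothetical, the input is assumed to be a lowercase stem, and exceptional (disharmonic) stems are ignored.

```python
# Illustrative sketch only: two-way (front/back) vowel harmony for the
# Turkish plural suffix. Names are hypothetical; this is not the chapter's
# Xerox finite-state analyzer and it ignores exceptional stems.
from typing import Optional

FRONT_VOWELS = set("eiöü")
BACK_VOWELS = set("aıou")


def last_vowel(stem: str) -> Optional[str]:
    """Return the last vowel of a lowercase stem, or None if it has none."""
    for ch in reversed(stem):
        if ch in FRONT_VOWELS or ch in BACK_VOWELS:
            return ch
    return None


def pluralize(stem: str) -> str:
    """Attach the plural suffix, choosing -ler or -lar by the last stem vowel."""
    vowel = last_vowel(stem)
    if vowel is None:
        raise ValueError(f"no vowel found in stem: {stem!r}")
    suffix = "ler" if vowel in FRONT_VOWELS else "lar"
    return f"{stem}+{suffix}"  # '+' marks the morpheme boundary


if __name__ == "__main__":
    for stem in ["ev", "kitap", "göz", "okul", "teyze"]:
        print(pluralize(stem))
```

A production analyzer would encode such alternations as finite-state rules over lexical and surface forms rather than as procedural code; the sketch only shows the front/back choice for a single suffix.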


Footnotes
1
Literally, “(the thing existing) at the time we caused (something) to become strong.” Obviously this is not a word that one would use every day. Turkish words (excluding non-inflecting high-frequency words such as conjunctions, clitics, etc.) found in typical running text average about 10 letters in length. The average number of bound morphemes in such words is about 2.
 
2
For phonological representations we employ the SAMPA representation. The Speech Assessment Methods Phonetic Alphabet (SAMPA) is a computer-readable phonetic script using 7-bit printable ASCII characters, based on the International Phonetic Alphabet (IPA) (see http://en.wikipedia.org/wiki/Speech_Assessment_Methods_Phonetic_Alphabet (Accessed Sept. 14, 2017) and www.phon.ucl.ac.uk/home/sampa/ (Accessed Sept. 14, 2017)). The Turkish SAMPA encoding convention can be found at www.phon.ucl.ac.uk/home/sampa/turkish.htm (Accessed Sept. 14, 2017).
 
3
In this chapter, we use - to denote syllable boundaries and + to denote morpheme boundaries wherever appropriate.
 
4
For example, Xerox Finite State Tools, available at http://www.fsmbook.com (Accessed Sept. 14, 2017), FOMA, available at http://fomafst.github.io/ (Accessed Sept. 14, 2017), HFST, available at http://hfst.sf.net (Accessed Sept. 14, 2017), or OpenFST, available at http://www.openfst.org (Accessed Sept. 14, 2017).
 
5
Note that we also explicitly show the morpheme boundary symbol because, in the implementation, it serves as an explicit context marker to constrain where changes occur.
 
6
There are also very special forms denoting families of relatives, where the number and possessive morphemes will swap positions to mean something slightly different: e.g., teyze+ler+im “my aunts” vs. teyze+m+ler “the family of my aunt.”
 
7
An example below when we discuss derivation will show a full deconstruction of a complex verb to highlight these features.
 
8
Obviously the first two are applicable to a smaller set of (usually) transitive verbs.
 
9
We present the surface morpheme segmentations highlighting the relevant derivational morpheme with italics.
 
10
So the next time you are up on a cliff looking down and momentarily lose your balance and then recover, you can describe the experience with the single verb düşeyazdım.
 
11
Where meaningful, we also give the segmentation of the word form into surface morphemes in italics.
 
12
Users of such words have the bizarre presumption that readers know how to pronounce those words in English!
 
13
In every group, we first list the morphological features of all the tokens, one per line, and then provide the morphological features of the multiword construct, followed by a gloss and a literal meaning.
 
14
Here we just show the roots of the verb with - denoting the rest of the suffixes for any inflectional and derivational markers.
 
15
The question and emphasis clitics, which are written as separate tokens, can occasionally intervene between the components of a semi-lexicalized collocation. We omit the details of these.
 
References
Beesley KR, Karttunen L (2003) Finite state morphology. CSLI Publications, Stanford University, Stanford, CA
Clements GN, Sezer E (1982) Vowel and consonant disharmony in Turkish. In: van der Hulst H, Smith N (eds) The structure of phonological representations. Foris, Dordrecht, pp 213–255
Karttunen L (1993) Finite-state lexicon compiler. Technical report, Xerox PARC, Palo Alto, CA
Karttunen L, Beesley KR (1992) Two-level rule compiler. Technical report, Xerox PARC, Palo Alto, CA
Karttunen L, Chanod JP, Grefenstette G, Schiller A (1996) Regular expressions for language engineering. Nat Lang Eng 2(4):305–328
Kornfilt J (1997) Turkish. Routledge, London
Koskenniemi K (1983) Two-level morphology: a general computational model for word-form recognition and production. PhD thesis, University of Helsinki, Helsinki
Oflazer K (1994) Two-level description of Turkish morphology. Lit Linguist Comput 9(2):137–148
Oflazer K (2003) Lenient morphological analysis. Nat Lang Eng 9:87–99
Oflazer K, Inkelas S (2006) The architecture and the implementation of a finite state pronunciation lexicon for Turkish. Comput Speech Lang 20:80–106
Oflazer K, Çetinoğlu Ö, Say B (2004) Integrating morphology with multi-word expression processing in Turkish. In: Proceedings of the ACL workshop on multiword expressions: integrating processing, Barcelona, pp 64–71
Sproat RW (1992) Morphology and computation. MIT Press, Cambridge, MA
van der Hulst H, van de Weijer J (1991) Topics in Turkish phonology. In: Boeschoten H, Verhoeven L (eds) Turkish linguistics today. Brill, Leiden
Metadata
Title
Morphological Processing for Turkish
Written by
Kemal Oflazer
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-90165-7_2
