2018 | OriginalPaper | Book chapter

1. Turkish and Its Challenges for Language and Speech Processing

Written by: Kemal Oflazer, Murat Saraçlar

Published in: Turkish Natural Language Processing

Publisher: Springer International Publishing

Abstract

We present a short survey and exposition of some of the important aspects of Turkish that have proved to be interesting and challenging for natural language and speech processing. Most of the challenges stem from the complex morphology of Turkish and from how morphology interacts with syntax. Finally, we provide a short overview of the major tools and resources developed for Turkish over the last two decades. (Parts of this chapter were previously published as Oflazer (Lang Resour Eval 48(4):639–653, 2014).)

Footnotes
1
These numbers were counted using xfst, the Xerox finite-state tool (Beesley and Karttunen 2003): the output was filtered through composition, restricted to the respective root words and to the number of symbols marking a derivational morpheme, and the possible words were then counted.
 
2
See Wickwire (1987) for an interesting take on this.
 
3
It turns out that there are a couple of suffixes that can, at least theoretically, be used iteratively. The causative morpheme is one of them, but in practice up to three could be used, and even then it is hard to track who is doing what to whom.
 
4
One constraint usually mentioned is that indefinite (and nominative marked) direct objects move with the verb, but there are valid violations of that observed in speech (Sarah Kennelly, personal communication).
 
5
Although we have written out the root word explicitly here, whenever convenient we will assume that the root word is part of the first inflectional group.
 
6
uzak is far/distant; the morphological features other than the obvious part-of-speech features are: +Become: become verb, +Caus: causative verb, +Pass: passive verb, +Pos: Positive Polarity, +FutPart: Derived future participle, +Pnon: no possessive agreement.
 
7
Here we show surface dependency relations, but going from the dependent to the head.
 
8
The pre-trained MaltParser model and configuration files for Turkish can be downloaded from https://web.itu.edu.tr/gulsenc/TurkishDepModel.html (Accessed Sept. 14, 2017).
 
9
See also ParGram/ParSem, an international collaboration on LFG-based grammar and semantics development: https://pargram.b.uib.no (Accessed Sept. 14, 2017).
 
References
Aksan Y, Aksan M, Koltuksuz A, Sezer T, Mersinli Ü, Demirhan UU, Yılmazer H, Kurtoğlu Ö, Öz S, Yıldız İ (2012) Construction of the Turkish National Corpus (TNC). In: Proceedings of LREC, Istanbul, pp 3223–3227
Arısoy E (2009) Statistical and discriminative language modeling for Turkish large vocabulary continuous speech recognition. PhD thesis, Boğaziçi University, Istanbul
Arısoy E, Can D, Parlak S, Sak H, Saraçlar M (2009) Turkish broadcast news transcription and retrieval. IEEE Trans Audio Speech Lang Process 17(5):874–883
Beesley KR, Karttunen L (2003) Finite state morphology. CSLI Publications, Stanford University, Stanford
Bilgin O, Çetinoğlu Ö, Oflazer K (2004) Building a Wordnet for Turkish. Rom J Inf Sci Technol 7(1–2):163–172
Buchholz S, Marsi E (2006) CoNLL-X shared task on multilingual dependency parsing. In: Proceedings of CONLL, New York, NY, pp 149–164
Butt M, Dyvik H, King TH, Masuichi H, Rohrer C (2002) The parallel grammar project. In: Proceedings of the workshop on grammar engineering and evaluation, Taipei, pp 1–7
Can F, Koçberber S, Balçık E, Kaynak C, Öcalan HC, Vursavaş OM (2008) Information retrieval on Turkish texts. J Am Soc Inf Sci Technol 59(3):407–421
Çetinoğlu Ö (2009) A large scale LFG grammar for Turkish. PhD thesis, Sabancı University, Istanbul
Chelba C, Hazen TJ, Saraçlar M (2008) Retrieval and browsing of spoken content. IEEE Signal Process Mag 25(3):39–49
Durgar-El Kahlout İ (2009) A prototype English-Turkish statistical machine translation system. PhD thesis, Sabancı University, Istanbul
Durgar-El Kahlout İ, Oflazer K (2010) Exploiting morphology and local word reordering in English-to-Turkish phrase-based statistical machine translation. IEEE Trans Audio Speech Lang Process 18(6):1313–1322
Eryiğit G, Oflazer K (2006) Statistical dependency parsing of Turkish. In: Proceedings of EACL, Trento, pp 89–96
Eryiğit G, Nivre J, Oflazer K (2008) Dependency parsing of Turkish. Comput Linguist 34(3):357–389
Göksel A, Kerslake C (2005) Turkish: a comprehensive grammar. Routledge, London
Hakkani-Tür DZ, Oflazer K, Tür G (2002) Statistical morphological disambiguation for agglutinative languages. Comput Hum 36(4):381–410
Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. In: Proceedings of ACL, Prague, pp 177–180
Koskenniemi K (1983) Two-level morphology: a general computational model for word-form recognition and production. PhD thesis, University of Helsinki, Helsinki
Külekçi MO (2006) Statistical morphological disambiguation with application to disambiguation of pronunciations in Turkish. PhD thesis, Sabancı University, Istanbul
Nivre J, Hall J, Nilsson J, Chanev A, Eryiğit G, Kübler S, Marinov S, Marsi E (2007) MaltParser: a language-independent system for data-driven dependency parsing. Nat Lang Eng 13(2):95–135
Oflazer K (1994) Two-level description of Turkish morphology. Lit Linguist Comput 9(2):137–148
Oflazer K (1996) Error-tolerant finite-state recognition with applications to morphological analysis and spelling correction. Comput Linguist 22(1):73–99
Oflazer K (2008) Statistical machine translation into a morphologically complex language. In: Proceedings of CICLING, Haifa, pp 376–387
Oflazer K (2014) Turkish and its challenges for language processing. Lang Resour Eval 48(4):639–653
Oflazer K, Inkelas S (2006) The architecture and the implementation of a finite state pronunciation lexicon for Turkish. Comput Speech Lang 20:80–106
Oflazer K, Kuruöz İ (1994) Tagging and morphological disambiguation of Turkish text. In: Proceedings of ANLP, Stuttgart, pp 144–149
Oflazer K, Tür G (1996) Combining hand-crafted rules and unsupervised learning in constraint-based morphological disambiguation. In: Proceedings of EMNLP-VLC, Philadelphia, PA
Oflazer K, Say B, Hakkani-Tür DZ, Tür G (2003) Building a Turkish Treebank. In: Treebanks: building and using parsed corpora. Kluwer Academic Publishers, Berlin
Parlak S, Saraçlar M (2012) Performance analysis and improvement of Turkish broadcast news retrieval. IEEE Trans Audio Speech Lang Process 20(3):731–741
Sak H (2011) Integrating morphology into automatic speech recognition: morpholexical and discriminative language models for Turkish. PhD thesis, Boğaziçi University, Istanbul
Sak H, Güngör T, Saraçlar M (2007) Morphological disambiguation of Turkish text with perceptron algorithm. In: Proceedings of CICLING, Mexico City, pp 107–118
Sak H, Güngör T, Saraçlar M (2011) Resources for Turkish morphological processing. Lang Resour Eval 45(2):249–261
Saraçlar M (2012) Turkish broadcast news speech and transcripts (LDC2012S06). Resource available from Linguistic Data Consortium
Stamou S, Oflazer K, Pala K, Christoudoulakis D, Cristea D, Tufis D, Koeva S, Totkov G, Dutoit D, Grigoriadou M (2002) Balkanet: a multilingual semantic network for Balkan languages. In: Proceedings of the first global WordNet conference, Mysore
Wickwire DE (1987) The Sevmek Thesis, a grammatical analysis of the Turkish verb system illustrated by the verb sevmek-to love. Master’s thesis, Pacific Western University, San Diego, CA
Yeniterzi R, Oflazer K (2010) Syntax-to-morphology mapping in factored phrase-based statistical machine translation from English to Turkish. In: Proceedings of ACL, Uppsala, pp 454–464
Yuret D, Türe F (2006) Learning morphological disambiguation rules for Turkish. In: Proceedings of NAACL-HLT, New York, NY, pp 328–334
Zeyrek D, Turan ÜD, Bozşahin C, Çakıcı R, Sevdik-Çallı A, Demirşahin I, Aktaş B, Yalçınkaya İ, Ögel H (2009) Annotating subordinators in the Turkish Discourse Bank. In: Proceedings of the linguistic annotation workshop, Singapore, pp 44–47
Metadata
Title
Turkish and Its Challenges for Language and Speech Processing
Written by
Kemal Oflazer
Murat Saraçlar
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-90165-7_1