Turkish and its challenges for language processing

Oflazer, Kemal

doi:10.1007/s10579-014-9267-2

Original Paper
Published: 04 April 2014

Language Resources and Evaluation, Volume 48, pages 639–653 (2014)

Abstract

We present a short survey and exposition of some of the important aspects of Turkish that have proven challenging for natural language processing. Most of the challenges stem from the complex morphology of Turkish and how morphology interacts with syntax. We also provide a short overview of the major tools and resources developed for Turkish natural language processing over the last two decades.

Notes

Source: Wikipedia.
Source: Wikipedia.
Readers interested in Turkish grammar from more of a linguistics perspective may refer to Kerslake and Göksel (2005).
These numbers were computed with xfst, the Xerox finite-state tool (Beesley and Karttunen 2003). The analyzer output was filtered by composition, restricting it to the respective root word and to a given number of symbols marking a derivational morpheme, and the number of possible words was then counted.
A couple of suffixes can in fact be used iteratively. The causative morpheme is one of them, but in practice up to three can be used, and even then it is hard to track who is doing what to whom.
One commonly cited constraint is that indefinite (and nominative-marked) direct objects move with the verb, but valid violations of this constraint are observed in speech (Sarah Kennelly, personal communication).
uzak means far/distant. The morphological features other than the obvious part-of-speech features are: +Become: become verb; +Caus: causative verb; +Pass: passive verb; +Pos: positive polarity; +FutPart: derived future participle; +Pnon: no possessive agreement.
Here we show surface dependency relations, with arcs going from the dependent to the head.
The pre-trained MaltParser model and configuration files for Turkish can be downloaded from http://web.itu.edu.tr/gulsenc/TurkishDepModel.html.
See also http://pargram.b.uib.no/.
Available at http://ii.metu.edu.tr/corpus.
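
The feature list in the note on uzak can be made concrete with a small parser. This is a minimal sketch. It assumes an analysis string in which `^DB` marks each derivational boundary, following the convention of Oflazer-style Turkish analyses. The example string for a derived form of uzak (such as uzaklaştırılacak) is reconstructed for illustration and is not quoted from the paper. Splitting at the boundaries yields the inflectional groups, each with its own part of speech. The number of boundaries is the count of derivational-morpheme symbols that the xfst note above restricts on.

```python
from dataclasses import dataclass

# Assumed derivational-boundary marker (Oflazer-style "^DB" convention);
# adjust if a different analyzer uses another symbol.
DB = "^DB"

# Illustrative analysis of a derived form of uzak, e.g. uzaklaştırılacak;
# reconstructed for this sketch, not quoted from the paper.
ANALYSIS = "uzak+Adj^DB+Verb+Become^DB+Verb+Caus^DB+Verb+Pass+Pos^DB+Adj+FutPart+Pnon"


@dataclass
class InflectionalGroup:
    pos: str              # part of speech of this group
    features: list[str]   # remaining morphological features, in order


def split_igs(analysis: str) -> tuple[str, list[InflectionalGroup]]:
    """Return the root and the inflectional groups of one analysis."""
    segments = analysis.split(DB)
    root, *first = segments[0].split("+")
    if not first:
        raise ValueError(f"no part-of-speech tag after root in {analysis!r}")
    groups = [InflectionalGroup(first[0], first[1:])]
    for seg in segments[1:]:
        # Each later segment starts with "+POS"; drop the leading "+".
        tags = seg.lstrip("+").split("+")
        groups.append(InflectionalGroup(tags[0], tags[1:]))
    return root, groups


if __name__ == "__main__":
    root, igs = split_igs(ANALYSIS)
    print(root, "derivations:", len(igs) - 1)
    for ig in igs:
        print(ig.pos, ig.features)
```

Running it prints the root uzak with four derivations, followed by one line per inflectional group. The groups are Adj, then Verb (+Become), Verb (+Caus), Verb (+Pass +Pos), and finally Adj (+FutPart +Pnon).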
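
The counting procedure in the xfst note above can be mimicked in miniature: restrict to a root and to a fixed number of derivational-morpheme markers, then count the words. The sketch below does not use xfst or transducer composition. As a stand-in, it counts paths through a small morphotactic automaton by dynamic programming over (state, remaining derivations). The automaton is made up for this sketch: its state names, tags, and arcs are hypothetical, chosen only so that the causative can repeat, as the note on iterative suffixes describes.

```python
from functools import lru_cache

# Hypothetical toy morphotactics (NOT the real Turkish lexicon):
# state -> list of (tag, next_state, is_derivational).
ARCS = {
    "Verb":  [("+Caus", "Verb", True),     # causative may repeat
              ("+Pass", "Pass", True),
              ("+Pos", "Pol", False), ("+Neg", "Pol", False)],
    "Pass":  [("+Pos", "Pol", False), ("+Neg", "Pol", False)],
    "Pol":   [("+Past", "Tense", False), ("+Fut", "Tense", False),
              ("+FutPart", "Part", True)],
    "Tense": [("+A1sg", "End", False), ("+A3sg", "End", False)],
    "Part":  [("+Pnon", "End", False)],
    "End":   [],
}
FINAL = {"End"}


@lru_cache(maxsize=None)
def count_words(state: str, derivations: int) -> int:
    """Count tag sequences from `state` to a final state that use exactly
    `derivations` derivational arcs."""
    if derivations < 0:
        return 0
    total = 1 if state in FINAL and derivations == 0 else 0
    for _tag, nxt, is_deriv in ARCS[state]:
        # Every cycle (the causative loop) consumes a derivation,
        # so the recursion always terminates.
        total += count_words(nxt, derivations - int(is_deriv))
    return total


if __name__ == "__main__":
    # Start right after a (hypothetical) verb root: the "restrict to root" filter.
    for k in range(4):
        print(k, count_words("Verb", k))
```

The causative arc loops, so the total number of words is unbounded. For each fixed number of derivational markers, however, the count is finite, which makes a per-count restriction a natural way to report such numbers. The sketch counts tag sequences rather than surface strings, so it would overcount if distinct sequences happened to share a surface form.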

References

Aksan, Y., Aksan, M., Koltuksuz, A., Sezer, T., Mersinli, Ü., Demirhan, U. U., et al. (2012). Construction of the Turkish National Corpus (TNC). In N. Calzolari, K. Choukri, T. Declerck, M. U. Doğan, B. Maegaard, J. Mariani, J. Odijk, & S. Piperidis (Eds.), Proceedings of the eighth international conference on language resources and evaluation (LREC’12). Istanbul, Turkey: European Language Resources Association (ELRA).
Arısoy, E. (2009). Statistical and discriminative language modeling for Turkish large vocabulary continuous speech recognition. Ph.D. thesis, Boğaziçi University.
Beesley, K. R., & Karttunen, L. (2003). Finite state morphology. Stanford: CSLI Publications.
Bilgin, O., Çetinoğlu, Ö., & Oflazer, K. (2004). Building a wordnet for Turkish. Romanian Journal of Information Science and Technology, 7(1–2), 163–172.
Buchholz, S., & Marsi, E. (2006). CoNLL-X shared task on multilingual dependency parsing. In Proceedings of CoNLL (pp. 149–164).
Butt, M., Dyvik, H., King, T. H., Masuichi, H., & Rohrer, C. (2002). The parallel grammar project. In Proceedings of the COLING-2002 workshop on grammar engineering and evaluation (pp. 1–7).
Çetinoğlu, Ö. (2009). A large scale LFG grammar for Turkish. Ph.D. thesis, Sabancı University.
Çetinoğlu, Ö., & Oflazer, K. (2009). Integrating derivational morphology into syntax. In N. Nicolov, G. Angelova, & R. Mitkov (Eds.), Recent Advances in Natural Language Processing. Amsterdam: John Benjamins.
Durgar-El-Kahlout, İ. (2009). A prototype English-Turkish statistical machine translation system. Ph.D. thesis, Sabancı University.
Durgar-El-Kahlout, İ., & Oflazer, K. (2010). Exploiting morphology and local word reordering in English to Turkish phrase-based statistical machine translation. IEEE Transactions on Audio, Speech, and Language Processing, 18(6), 1313–1322.
Eryiğit, G., & Oflazer, K. (2006). Statistical dependency parsing of Turkish. In Proceedings of the 11th EACL (pp. 89–96). Trento, Italy.
Eryiğit, G., Nivre, J., & Oflazer, K. (2008). Dependency parsing of Turkish. Computational Linguistics, 34(3), 357–389.
Hakkani-Tür, D., Oflazer, K., & Tür, G. (2002). Statistical morphological disambiguation for agglutinative languages. Computers and the Humanities, 36(4), 381–410.
Kerslake, C., & Göksel, A. (2005). Turkish: A comprehensive grammar (Comprehensive Grammars). New York: Routledge.
Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., et al. (2007). Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th annual meeting of the association for computational linguistics companion volume proceedings of the demo and poster sessions (pp. 177–180). Prague, Czech Republic: Association for Computational Linguistics.
Nivre, J., Hall, J., Nilsson, J., Chanev, A., Eryiğit, G., Kübler, S., et al. (2007). MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2), 99–135.
Oflazer, K. (1994). Two-level description of Turkish morphology. Literary and Linguistic Computing, 9(2), 137–148.
Oflazer, K. (1996). Error-tolerant finite-state recognition with applications to morphological analysis and spelling correction. Computational Linguistics, 22(1), 73–90.
Oflazer, K. (2008). Statistical machine translation into a morphologically complex language. In Proceedings of the conference on intelligent text processing and computational linguistics (CICLing) (pp. 376–387).
Oflazer, K., & Inkelas, S. (2006). The architecture and the implementation of a finite state pronunciation lexicon for Turkish. Computer Speech and Language, 20(1), 80–106.
Oflazer, K., & Kuruöz, İ. (1994). Tagging and morphological disambiguation of Turkish text. In Proceedings of the fourth conference on applied natural language processing (pp. 144–149). Stuttgart, Germany: Association for Computational Linguistics.
Oflazer, K., Say, B., Hakkani-Tür, D. Z., & Tür, G. (2003). Building a Turkish treebank. In A. Abeillé (Ed.), Treebanks: Building and using parsed corpora (pp. 261–277). London: Kluwer.
Oflazer, K., & Tür, G. (1996). Combining hand-crafted rules and unsupervised learning in constraint-based morphological disambiguation. In E. Brill & K. Church (Eds.), Proceedings of the ACL-SIGDAT conference on empirical methods in natural language processing.
Sak, H., Güngör, T., & Saraçlar, M. (2007). Morphological disambiguation of Turkish text with perceptron algorithm. In CICLing 2007, LNCS 4394 (pp. 107–118).
Sak, H., Güngör, T., & Saraçlar, M. (2011). Resources for Turkish morphological processing. Language Resources and Evaluation, 45(2), 249–261.
Stamou, S., Oflazer, K., Pala, K., Christodoulakis, D., Cristea, D., Tufis, D., et al. (2002). Balkanet: A multilingual semantic network for Balkan languages. In Proceedings of the 1st global wordnet conference. Mysore, India.
Wickwire, D. E. (1987). The “sevmek” thesis: A grammatical analysis of the Turkish verb system illustrated by the verb “sevmek” (to love). Master’s thesis, Pacific Western University.
Yeniterzi, R., & Oflazer, K. (2010). Syntax-to-morphology mapping in factored phrase-based statistical machine translation from English to Turkish. In Proceedings of the 48th annual meeting of the association for computational linguistics (pp. 454–464). Uppsala, Sweden: Association for Computational Linguistics.
Yuret, D., & Türe, F. (2006). Learning morphological disambiguation rules for Turkish. In Proceedings of HLT/NAACL-2006 (pp. 328–334). New York City, USA.
Zeyrek, D., Turan, U. D., Bozşahin, C., Çakıcı, R., Sevdik-Çallı, A. B., Demirşahin, I., et al. (2009). Annotating subordinators in the Turkish discourse bank. In Proceedings of the third linguistic annotation workshop (pp. 44–47). Suntec, Singapore: Association for Computational Linguistics.

Author information

Authors and Affiliations

Carnegie Mellon University in Qatar, Doha, Qatar
Kemal Oflazer

Corresponding author

Correspondence to Kemal Oflazer.

About this article

Cite this article

Oflazer, K. Turkish and its challenges for language processing. Lang Resources & Evaluation 48, 639–653 (2014). https://doi.org/10.1007/s10579-014-9267-2

Received: 14 March 2013
Accepted: 28 February 2014
Published: 04 April 2014
Issue Date: December 2014
DOI: https://doi.org/10.1007/s10579-014-9267-2
