
2018 | OriginalPaper | Chapter

13. The Turkish Treebank

Authors: Gülşen Eryiğit, Kemal Oflazer, Umut Sulubacak

Published in: Turkish Natural Language Processing

Publisher: Springer International Publishing


Abstract

In the last three decades, treebanks have become a crucial resource for building and evaluating natural language processing tools and applications. In this chapter, we review the essential aspects of the first treebank for Turkish, built in the early 2000s, and its evolution and extensions since then.


Footnotes

1. See Chap. 2 for conventions for morphophonological symbols.
2. Refer to Chap. 2 for a list of morphological features.
3. The Universal Dependencies project [universaldependencies.org (Accessed Sept. 14, 2017)] is an international collaborative project to make cross-linguistically consistent treebanks available for a wide variety of languages.
4. These, incidentally, do not follow Turkish noun phrase rules, so they have to be treated specially anyway.
5. ETOL encodes light verb constructions involving the Turkish verbs et- (do) and ol- (be).
6. Words in this context may also be lexicalized or non-lexicalized collocations.
7. In the initial version of the treebank, this field was left empty, with the expectation that it would be provided in future versions.
8. See Chap. 2.
9. All treebanks described in this section are available at the ITU Turkish Natural Language Processing Pipeline: http://tools.nlp.itu.edu.tr (Accessed Sept. 14, 2017).
10. Available at universaldependencies.org (Accessed Sept. 14, 2017).
 
Metadata

Title: The Turkish Treebank
Authors: Gülşen Eryiğit, Kemal Oflazer, Umut Sulubacak
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-90165-7_13