A dependency annotation scheme for Bangla treebank

  • Original Paper
  • Published in: Language Resources and Evaluation

Abstract

Dependency grammar is considered appropriate for many Indian languages. In this paper, we present a study of the dependency relations in the Bangla language. We categorize these relations into three levels: intrachunk relations, interchunk relations and interclause relations. Each level is further subcategorized, and an annotation scheme has been developed on that basis. Both syntactic and semantic features are taken into consideration in describing the relations. Our scheme contains 63 such syntactico-semantic relations. We have verified the scheme by tagging a corpus of 4167 Bangla sentences to create a treebank (KGPBenTreebank).

Notes

  1. The dependency grammar for the Bangla language and the Bangla treebank were created under the project “The Bangla Treebank”. This project is supported by the Linguistic Data Consortium for Indian Languages (LDC-IL), built by MHRD, Govt. of India, under the aegis of the Central Institute of Indian Languages, Mysore, India. For details, see http://www.cel.iitkgp.ernet.in/~oldtools/kgpbentreebank.html.

  2. The list with detailed description of the dependency relations can be seen at http://corpus.quran.com/documentation/syntaxrelation.jsp

  3. The annotation was done using the Sanchay annotation tool of Singh (2011).

References

  • Begum, R., Husain, S., Dhwaj, A., Misra, D., Bai, L., & Sangal, R. (2008). Dependency annotation scheme for Indian languages. In Proceedings of the third international joint conference on natural language processing (IJCNLP). Hyderabad, India.

  • Bharati, A., Chaitanya, V., & Sangal, R. (1999). Natural language processing: A Paninian perspective. New Delhi: Prentice-Hall of India.

  • Bharati, A., Sangal, R., Chaitanya, V., Kulkarni, A., Sharma, D. M., & Ramakrishnamacharyulu, K. V. (2002). Anncorra: Building tree-banks in Indian languages. In Proceedings of the 3rd workshop on Asian language resources and international standardization (Vol. 12, pp. 1–8), COLING ’02.

  • Bhatt, R., Narasimhan, B., Palmer, M., Rambow, O., Sharma, D. M., & Xia, F. (2009). A multi-representational and multi-layered treebank for Hindi/Urdu. In Proceedings of the third linguistic annotation workshop, ACL-IJCNLP ’09 (pp. 186–189). Stroudsburg, PA, USA: Association for Computational Linguistics. http://dl.acm.org/citation.cfm?id=1698381.1698417

  • Black, E., Eubank, S., Kashioka, H., Magerman, D., Garside, R., & Leech, G. (1996). Beyond skeleton parsing: Producing a comprehensive large-scale general-English treebank with full grammatical analysis. In Proceedings of the 17th international conference on computational linguistics (COLING-96) (pp. 107–112).

  • Black, E. W., Garside, R., & Leech, G. N. (Eds.) (1993). Statistically-driven computer grammars of English: The IBM/Lancaster approach. No. 8 in Language and Computers. Amsterdam. http://books.google.de/books?id=Hkzr-LYVz2wC&lpg=PR5&ots=QJhw16OVS4&dq=Statistically-driven%20computer%20grammars%20of%20English&lr&pg=PP1#v=onepage&q&f=false

  • Chakravarty, B. (2010). “Uchchatara Bangla vyakaran”, a complete textbook on higher Bengali grammar. Akshay Malancha.

  • Charniak, E., Blaheta, D., Ge, N., Hall, K., Hale, J., & Johnson, M. (2000). BLLIP 1987–89 WSJ corpus release 1. Linguistic Data Consortium.

  • Chatterji, S., Sarkar, T. M., Sarkar, S., & Chakraborty, J. (2009). Karak relations in Bengali. In Proceedings of the 31st All-India conference of linguists (AICL 2009) (pp. 33–36). Hyderabad, India.

  • Chatterji, S. K. (2003). Bhasha-prakash Bangala vyakaran [A grammar of the Bangla language]. Calcutta: Roopa and Company.

  • Chopde, A. (2000). ITRANS “Indian language transliteration package”, a package for printing text in Indian language scripts. http://www.aczone.com/itrans/.

  • Dandapat, S., Sarkar, S., & Basu, A. (2004). A hybrid model for part-of-speech tagging and its application to Bengali. In International conference on computational intelligence (pp. 169–172).

  • de Marneffe, M., & Manning, C. D. (2008). Stanford typed dependencies manual.

  • Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378–382.

  • Hajič, J., Böhmová, A., Hajičová, E., & Vidová-Hladká, B. (2000). The Prague dependency treebank: A three-level annotation scenario. In A. Abeillé (Ed.), Treebanks: Building and using parsed corpora (pp. 103–127). Amsterdam: Kluwer.

  • Hajič, J., Hajičová, E., & Rosen, A. (1996). Formal representation of language structures. TELRI Newsletter, 3, 12–19.

  • Hajič, J., Vidová-Hladká, B., & Pajas, P. (2001). The Prague dependency treebank: Annotation structure and support. In Proceedings of the IRCS workshop on linguistic databases (pp. 105–114). Philadelphia, USA: University of Pennsylvania.

  • Karlsson, F., Voutilainen, A., Heikkilä, J., & Anttila, A. (Eds.) (1995). Constraint Grammar: A language-independent system for parsing unrestricted text. Berlin: Mouton de Gruyter.

  • Marcus, M. P., Marcinkiewicz, M. A., & Santorini, B. (1993). Building a large annotated corpus of English: The Penn treebank. Computational Linguistics, 19, 313–330. http://dl.acm.org/citation.cfm?id=972470.972475

  • McCord, M. C. (1990). Slot grammar: A system for simpler construction of practical natural language grammars. In R. Studer (Ed.), Natural language and logic: Proceedings of the international scientific symposium, Hamburg, FRG (pp. 118–145). Berlin: Springer.

  • Palmer, M., Gildea, D., & Kingsbury, P. (2005). The proposition bank: An annotated corpus of semantic roles. Computational Linguistics, 31, 71–106. doi:10.1162/0891201053630264.

  • Santorini, B., & Marcinkiewicz, M. A. (1991). Bracketing guidelines for the Penn treebank project. Unpublished manuscript.

  • Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly, 19, 321–325.

  • Sharma, D. M., Sangal, R., Bai, L., Begam, R., & Ramakrishnamacharyulu, K. (2007). Anncorra: Treebanks for Indian languages, annotation guidelines (manuscript).

  • Singh, A. K. (2011). Part-of-speech annotation with Sanchay. In Proceedings of the national seminar on POS annotation for Indian languages: Issues & perspectives. Mysore, India.

  • Xue, N., Xia, F., Chiou, F. D., & Palmer, M. (2005). The Penn Chinese treebank: Phrase structure annotation of a large corpus. Natural Language Engineering.

Author information

Corresponding author

Correspondence to Sanjay Chatterji.

Appendices

Appendix 1: The Relation set of the Bangla Treebank

Tag | Relation (English/Bangla name) | Related with

Intrachunk relations

ppl | Postposition/Anusarga | Rel. with noun/pron.
stc | Spatio-temp. con./Sthan-samay. samp. | Rel. with space-time noun
vx | Auxiliary verb/Sahayak kriya | Related with verb
pof | Part of/Kriya antargata bisheshya |
redup | Reduplication/Shabda dbaita | Rel. with same rhym.
frag | Fragment/Bhagnamsha | Related with suffix

Karak relations

k1d | Doer subject/Kriya sampadak karta | Related with verb
k1e | Experiencer subject/Anubhab karta | Related with verb
k1p | Passive subject/Paroksha karta | Related with verb
k1s | Noun of proposition/Samanadhikaran | Related with verb
k1g | General subject/Sadharan karta | Related with verb
k2t | Transitive object/Sakarmak karma | Related with verb
k2m | Direct object/Mukhya karma | Related with verb
k2g | Indirect object/Gauna karma | Related with verb
k2u | Purposive object/Uddyeshya karma | Related with verb
k2s | Predicative object/Bidheya karma | Related with verb
k3 | Instrumental/Karan | Related with verb
k5p | Place rel. ablative/Sthanbachak apadan | Related with verb
k5s | State rel. ablative/Abasthabachak apadan | Related with verb
k5t | Time rel. ablative/Kalbachak apadan | Related with verb
k5d | Dist. rel. ablative/Duratbabachak apadan | Related with verb
k7p | Place rel. locative/Deshadhikaran | Related with verb
k7t | Time rel. locative/Kaladhikaran | Related with verb
k7d | Domain rel. locative/Bishayadhikaran | Related with verb
k7s | State rel. locative/Bhabadhikaran | Related with verb
rh | Reason/Hetu | Related with verb
ru | Purpose/Uddeshya | Related with verb
des | Destination/Gantabyasthal | Related with verb
r6v | Possession/Dakhal | Related with verb
compr | Comparison/Taratamya | Related with any
sim | Similarity/Sadrishya | Related with any

Modifier relations

r6 | Genitive/Sambandha | Related with noun
ras | Associative relation/Saharthak sambandha | Related with noun
rasneg | Non-associative relation/Namarthak sambandha | Related with noun
nnmod | Noun noun modifier/Sanyogmulak bisheshya | Related with noun
jnmod | Adj. noun mod./Bisheshyer bisheshan | Related with noun
dnmod | Dem. noun mod./Nirnay suchak sarbanam | Related with noun
pronmod | Pron. noun mod./Sarbanamjata bisheshan | Related with noun
pnmod | Participial noun mod./Kridanta bisheshan | Related with noun
anmod | App. noun mod./Tulyarupe sthapita bisheshan | Related with noun
adv | Adv. mod./Kriya bisheshan jatiya bisheshan | Related with verb
vmod | Verb-verb modifier/Kriya jatiya bisheshan | Related with verb
neg | Negation modifier/Namarthak abyay | Related with verb
acomp | Adjectival complement/Bidheya bisheshan | Related with verb

Few other interchunk relations

ccof | Conjunct/Samyojak abyay | Rel. with conjunct
pcc | Preconjunct/Abasthatmak abyay | Related with SC.
rad | Address word/Sambodhan sabda | Related with verb
par | Particle/Bakyalankar abyay | Related with verb
qs | Question mark/Prashnabodhak chihna | Related with verb
end | End/Samapti | Related with verb
sym | Symbol/Chihna | Related with verb

Interclause relation

ref | Referent/Nirdesak | Rel. with noun/pron.
clausal* | Clausal star/Bakyamsha samagra | Related with verb
clausalcomp | Clausal complement/Bakyamsha sampurak | Related with verb
comp | Complementizer/Sampurak | Related with verb

Abbreviations: rel., related; pron., pronoun; rhym., rhyming word; mod., modifier; adj., adjectival; dem., demonstrative; app., appositional; adv., adverbial; nom., nominal; bish., bisheshan; comp., comparison; sim., similarity; SC., subordinating conjunction; temp., temporal; con., connection; samay., samaygata; samp., samparka.
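
The relation set above lends itself to a simple lookup table. The following Python sketch is not part of the original paper; it only illustrates one way a few of the Appendix 1 rows might be stored and queried. The names `Relation`, `RELATIONS`, `relations_in_group` and `describe` are hypothetical, only a small subset of the rows is entered, and the group labels simply repeat the Appendix 1 section headings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One row of the Appendix 1 relation set (hypothetical container)."""
    tag: str           # relation tag, e.g. "k1d"
    english: str       # English name
    bangla: str        # Bangla name, romanized as in the appendix
    group: str         # Appendix 1 section heading the row appears under
    related_with: str  # what the dependent is related with


# A small subset of Appendix 1, copied by hand for illustration only.
_ROWS = [
    ("ppl", "Postposition", "Anusarga", "Intrachunk relations", "noun/pronoun"),
    ("vx", "Auxiliary verb", "Sahayak kriya", "Intrachunk relations", "verb"),
    ("k1d", "Doer subject", "Kriya sampadak karta", "Karak relations", "verb"),
    ("k3", "Instrumental", "Karan", "Karak relations", "verb"),
    ("r6", "Genitive", "Sambandha", "Modifier relations", "noun"),
    ("neg", "Negation modifier", "Namarthak abyay", "Modifier relations", "verb"),
    ("rad", "Address word", "Sambodhan sabda",
     "Few other interchunk relations", "verb"),
    ("ref", "Referent", "Nirdesak", "Interclause relation", "noun/pronoun"),
]

# Index the rows by tag for constant-time lookup.
RELATIONS = {row[0]: Relation(*row) for row in _ROWS}


def relations_in_group(group):
    """Return the sorted tags listed under one Appendix 1 heading."""
    return sorted(r.tag for r in RELATIONS.values() if r.group == group)


def describe(tag):
    """Return a one-line, human-readable description of a relation tag."""
    r = RELATIONS[tag]
    return f"{r.tag}: {r.english} ({r.bangla}), related with {r.related_with}"


if __name__ == "__main__":
    print(describe("k3"))
    print(relations_in_group("Karak relations"))
```

Keying the table by tag keeps lookups cheap when validating annotated data, and keeping the section heading on each row makes it easy to list or count the relations in one group.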

Appendix 2: Itrans to glyphs in Bangla and Hindi scripts mapping

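[Figure a: table mapping Itrans codes to glyphs in the Bangla and Hindi scripts; image not reproduced here.]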

Cite this article

Chatterji, S., Sarkar, T.M., Dhang, P. et al. A dependency annotation scheme for Bangla treebank. Lang Resources & Evaluation 48, 443–477 (2014). https://doi.org/10.1007/s10579-014-9266-3
