2014 | OriginalPaper | Buchkapitel
Discourse Tagging for Indian Languages
verfasst von : Sobha Lalitha Devi, S. Lakshmi, Sindhuja Gopalan
Erschienen in: Computational Linguistics and Intelligent Text Processing
Verlag: Springer Berlin Heidelberg
Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.
Wählen Sie Textabschnitte aus um mit Künstlicher Intelligenz passenden Patente zu finden. powered by
Markieren Sie Textabschnitte, um KI-gestützt weitere passende Inhalte zu finden. powered by
Indian Language Discourse Project is to develop large corpus annotated with various types of discourse relations which are explicit and implicit. As an initial step towards it we have annotated corpus in three languages, Hindi, Tamil and Malayalam belonging to the two major language families in India- Indo Aryan and Dravidian. In this paper we describe our initial experiments in annotating all the three language corpus and the domains of the corpus belongs to health. The initial experiment brought out various types of discourse connectives in the three languages and how they vary amongst the languages. The preliminary study itself revealed that there is cross linguistic variation among the three languages. We have shown the inter annotator agreement for all the three languages.