2010 | Book

Sanskrit Computational Linguistics

4th International Symposium, New Delhi, India, December 10-12, 2010. Proceedings

Table of Contents

Frontmatter
Rule Interaction, Blocking and Derivation in Pāṇini
Abstract
Pāṇini’s grammar is a class of rules formulated on the basis of generalizations abstracted from usage, so that the vast ocean of words can be properly understood. This class of rules consists of general rules (utsarga) and their related particulars (viśeṣa). Since a general rule is formulated with certain generalizations about its scope of application, it must yield to its related particulars, each of which requires the delineation of its own particular scope of application. A particular rule is thus formulated with particular properties relative to generalized properties. A general rule is supposed to pervade its scope of application in its entirety; it is in this sense that it is called vyāpaka (pervader). Since a particular rule is formulated with particular properties relative to the general, the scope of application of a particular must be extracted from within the scope of its general counterpart. A related particular is therefore called pervaded (vyāpya), since its scope of application is carved out from within the general scope of its corresponding utsarga, the pervader (vyāpaka). Rules whose application cannot be captured within the related class of general and particular are classed as residual (śeṣa). A residual thus falls outside the applicational scope of its general and particular counterparts, for it refers to a proposal that is yet to be made, close to its context (upayuktād anyaḥ śeṣaḥ).
Rama Nath Sharma
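The blocking relation between a general rule and its particular can be illustrated with a minimal Python sketch. It assumes a hypothetical rule representation in which each rule carries a scope predicate, an operation, and (for a particular) the name of the general rule it is carved out of. The toy rules `general` and `particular` and the affix strings `+A` and `+B` are invented for illustration; they are not Pāṇinian sūtras, and this is not the author's formalization.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Rule:
    name: str
    applies: Callable[[str], bool]   # scope of application
    operation: Callable[[str], str]  # what the rule does to the form
    general: Optional[str] = None    # for a particular (viśeṣa): the utsarga it is carved out of


def select_rule(rules: List[Rule], form: str) -> Optional[Rule]:
    """Pick the rule to apply; a particular (vyāpya) blocks its general (vyāpaka)."""
    candidates = [r for r in rules if r.applies(form)]
    # Any general rule whose particular also applies here yields to that particular.
    blocked = {r.general for r in candidates if r.general is not None}
    survivors = [r for r in candidates if r.name not in blocked]
    return survivors[0] if survivors else None


def apply_once(rules: List[Rule], form: str) -> str:
    """Apply the selected rule once, or return the form unchanged if none applies."""
    rule = select_rule(rules, form)
    return form if rule is None else rule.operation(form)


# Hypothetical toy rules: the general rule pervades every form, while the
# particular rule's scope (forms ending in "ā") lies inside the general scope.
GENERAL = Rule("general", applies=lambda f: True, operation=lambda f: f + "+A")
PARTICULAR = Rule(
    "particular",
    applies=lambda f: f.endswith("ā"),
    operation=lambda f: f + "+B",
    general="general",
)
RULES = [GENERAL, PARTICULAR]

print(apply_once(RULES, "dā"))   # dā+B: the particular blocks the general
print(apply_once(RULES, "bhū"))  # bhū+A: only the general applies
```

A residual (śeṣa) rule would, in this representation, simply be a rule whose scope predicate is disjoint from both the general and the particular.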
On the Generalizability of Pāṇini’s Pratyāhāra-Technique to Other Languages
Abstract
Pāṇini defines the sound classes involved in grammatical rules by pratyāhāras, i.e., two-letter codes based on the order of the sounds in the Śivasūtras. In the present paper we demonstrate that Pāṇini’s pratyāhāra method generalizes to the description of the phonological systems of other languages by applying it to the sound classes and phonological alternations of German. Furthermore, we compare Pāṇini’s pratyāhāra technique with the technique of describing phonological classes by phonological features, which is more common in Western phonology. It turns out that pratyāhāras perform better than features in describing our sample of German phonological processes if one considers the quality criterion for class-description devices proposed by [10], which is based on the ratio of describable to actual classes.
Wiebke Petersen, Silke Hamann
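To make the two-letter code concrete, here is a minimal Python sketch that expands a pratyāhāra against the Sanskrit Śivasūtras (not the German inventory studied in the paper). It assumes consonants are written without the inherent -a that is added only for pronunciation, and it takes the start sound and the marker (it) letter as separate arguments; `expand` and its `occurrence` parameter are hypothetical names. Because the marker Ṇ occurs twice in the list, which Ṇ is meant must be supplied explicitly.

```python
from typing import List, Tuple

# The fourteen Śivasūtras as (sounds, marker) pairs. Consonants are written
# without the inherent -a; the final letter of each sūtra is a marker (it)
# that is skipped when a class is expanded.
SHIVASUTRAS: List[Tuple[List[str], str]] = [
    (["a", "i", "u"], "ṇ"),
    (["ṛ", "ḷ"], "k"),
    (["e", "o"], "ṅ"),
    (["ai", "au"], "c"),
    (["h", "y", "v", "r"], "ṭ"),
    (["l"], "ṇ"),
    (["ñ", "m", "ṅ", "ṇ", "n"], "m"),
    (["jh", "bh"], "ñ"),
    (["gh", "ḍh", "dh"], "ṣ"),
    (["j", "b", "g", "ḍ", "d"], "ś"),
    (["kh", "ph", "ch", "ṭh", "th", "c", "ṭ", "t"], "v"),
    (["k", "p"], "y"),
    (["ś", "ṣ", "s"], "r"),
    (["h"], "l"),
]


def expand(start: str, marker: str, occurrence: int = 1) -> List[str]:
    """Return the sounds denoted by the pratyāhāra start + marker.

    Collects every sound from the first occurrence of `start` up to the
    `occurrence`-th matching marker after it; intervening markers are skipped.
    """
    result: List[str] = []
    collecting = False
    seen = 0
    for sounds, it in SHIVASUTRAS:
        for sound in sounds:
            if not collecting and sound == start:
                collecting = True
            if collecting:
                result.append(sound)
        # Only the marker slot is compared, so a sound such as ṇ in sūtra 7
        # is never mistaken for the marker Ṇ.
        if collecting and it == marker:
            seen += 1
            if seen == occurrence:
                return result
    raise ValueError(f"no pratyāhāra for start={start!r}, marker={marker!r}")


print(expand("a", "c"))            # aC, the vowels: ['a', 'i', 'u', 'ṛ', 'ḷ', 'e', 'o', 'ai', 'au']
print(expand("y", "ṇ"))            # yaṆ, the semivowels: ['y', 'v', 'r', 'l']
print(len(set(expand("h", "l"))))  # haL, the consonants: 33 distinct sounds
```

With the default `occurrence=1`, `expand("a", "ṇ")` stops at the first Ṇ and yields a i u; with `occurrence=2` it runs on to the Ṇ after l and also includes the remaining vowels, h and the semivowels.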
Building a Prototype Text to Speech for Sanskrit
Abstract
This paper describes the work done in building a prototype text-to-speech system for Sanskrit. A basic prototype is built using a simplified Sanskrit phone set and a unit selection technique, in which prerecorded sub-word units are concatenated to synthesize a sentence. We also discuss the issues involved in building a full-fledged text-to-speech system for Sanskrit.
Baiju Mahananda, C. M. S. Raju, Ramalinga Reddy Patil, Narayana Jha, Shrinivasa Varakhedi, Prahallad Kishore
Rule-Blocking and Forward-Looking Conditions in the Computational Modelling of Pāṇinian Derivation
Abstract
Attempting to model Pāṇinian procedure computationally forces one to clarify concepts explicitly and allows one to test various versions and interpretations of his grammar against each other and against bodies of extant Sanskrit texts. Modelling Pāṇinian procedure requires creating data structures and a framework that allow one to approximate the statement of Pāṇinian rules in an executable language. Scharf (2009: 117-125) provided a few examples of how rules would be formulated in a computational model of Pāṇinian grammar, as opposed to software that generates speech forms without regard to Pāṇinian procedure. Mishra (2009) described the extensive use of attributes to track classification, marking and other features of phonetic strings. Goyal, Kulkarni, and Behera (2009, especially sec. 3.5) implemented a model of the asiddhavat section of rules (6.4.22-129) in which the state of the data passed to the rules of the section is kept unchanged and is used by those rules as conditions, while the rules of the section are applied in parallel and the combined result of all applicable rules exits the section. The current paper describes Scharf and Hyman’s implementation of rule blocking and forward-looking conditions. The former deals with complex groups of rules concerned with domains included within the scope of a general rule. The latter concerns cases where a decision at an early stage in the derivation requires evaluating conditions that do not obtain until a later stage.
Peter M. Scharf
Sanskrit Compound Processor
Abstract
Sanskrit is very rich in compound formation. Typically, a compound does not code the relation between its components explicitly. To understand the meaning of a compound, it is necessary to identify its components, discover the relations between them and finally generate a paraphrase of the compound. In this paper, we discuss the automatic segmentation and type identification of a compound using simple statistics derived from manually annotated data.
Anil Kumar, Vipul Mittal, Amba Kulkarni
Designing a Constraint Based Parser for Sanskrit
Abstract
Verbal understanding (śābdabodha) of any utterance requires knowledge of how the words in that utterance are related to each other. Such knowledge is usually available in the form of cognition of grammatical relations. Generative grammars describe how a language codes these relations. Thus the knowledge of what information various grammatical relations convey is available from the point of view of generation, not analysis. In order to develop a parser based on any grammar, one should then know precisely the semantic content of the grammatical relations expressed in a language string, the clues for extracting these relations, and finally whether these relations are expressed explicitly or implicitly. Based on the design principles that emerge from this knowledge, we model the parser as finding a directed tree, given a graph with nodes representing the words and edges representing the possible relations between them. Further, we use the Mīmāṃsā constraint of ākāṅkṣā (expectancy) to rule out non-solutions and sannidhi (proximity) to prioritize the solutions. We have implemented a parser based on these principles, and its performance was found to be satisfactory, giving us confidence to extend its functionality to handle complex sentences.
Amba Kulkarni, Sheetal Pokar, Devanand Shukl
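The tree-finding formulation can be sketched in Python as a brute-force enumeration. This is an illustrative assumption, not the authors' implementation: candidate edges come from a hypothetical morphological analysis, ākāṅkṣā is approximated by the single constraint that a head may not take the same relation twice, and sannidhi is approximated by preferring trees with a smaller total distance between heads and dependents. The names `parse`, `is_tree` and `satisfies_expectancy` are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Edge = Tuple[int, int, str]  # (head, dependent, relation)


def is_tree(choice: Tuple[Edge, ...], root: int) -> bool:
    """True if following head links from every word reaches the root without a cycle."""
    head: Dict[int, int] = {d: h for h, d, _ in choice}
    for start in head:
        node, seen = start, set()
        while node != root:
            if node in seen:
                return False
            seen.add(node)
            node = head[node]
    return True


def satisfies_expectancy(choice: Tuple[Edge, ...]) -> bool:
    """Hypothetical ākāṅkṣā check: a head may not take the same relation twice."""
    used = set()
    for h, _, rel in choice:
        if (h, rel) in used:
            return False
        used.add((h, rel))
    return True


def parse(n_words: int, edges: List[Edge], root: int) -> List[Tuple[Edge, ...]]:
    """Enumerate directed trees over the candidate edges, most proximate first."""
    incoming = {d: [e for e in edges if e[1] == d] for d in range(n_words) if d != root}
    if any(not options for options in incoming.values()):
        return []  # some word has no possible head, so no tree exists
    solutions = []
    for choice in product(*incoming.values()):  # one head per non-root word
        if is_tree(choice, root) and satisfies_expectancy(choice):
            # Sannidhi approximated as total distance between heads and dependents.
            cost = sum(abs(h - d) for h, d, _ in choice)
            solutions.append((cost, choice))
    solutions.sort(key=lambda s: s[0])
    return [choice for _, choice in solutions]


# rāmaḥ (0) vanam (1) gacchati (2): "Rāma goes to the forest".
# vanam is ambiguous between nominative and accusative, so it gets two candidate edges.
EDGES = [(2, 0, "kartā"), (2, 1, "karma"), (2, 1, "kartā")]
for tree in parse(3, EDGES, root=2):
    print(tree)  # the single surviving tree: kartā for word 0, karma for word 1
```

Exhaustive enumeration is exponential in sentence length; it is used here only because it makes the constraint-then-rank structure easy to see.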
Generative Graph Grammar of Neo-Vaiśeṣika Formal Ontology (NVFO)
Abstract
NLP applications for Sanskrit have so far worked within the computational paradigm of string grammars. However, to compute ‘meanings’, as in traditional śābdabodha prakriyā-s, there is a need to develop suitable graph grammars. Ontological structures are fundamentally graphs. We work within the formal framework of Neo-Vaiśeṣika Formal Ontology (NVFO) to propose a generative graph grammar. The proposed formal grammar produces only well-formed graphs that can be readily interpreted in accordance with Vaiśeṣika ontology, and we show that graphs not permitted by Vaiśeṣika ontology are not generated by it. Further, we write an interpreter for these graph structures. This creates a computational environment that can be used for writing computational applications of Vaiśeṣika ontology. We illustrate how this environment can be used to create applications such as computing the śābdabodha of sentences.
Rajesh Tavva, Navjyoti Singh
Headedness and Modification in Nyāya Morpho-Syntactic Analysis: Towards a Bracket-Parsing Model
Abstract
The paper aims to develop a parsing model based on Nyāya morpho-syntactic analysis, using two terms, namely prakāratā and viśeṣyatā. The idea is that prakāratā and viśeṣyatā are to be seen as modification (modifiedness) and headedness, respectively. Several representative sentences are used to exemplify the method developed. Prakāratā and viśeṣyatā not only yield a thorough analysis at the word level but, as this paper shows, can also be extended to yield a thorough analysis at the syntactic and discourse levels.
Malhar Kulkarni, Anuja Ajotikar, Tanuja Ajotikar, Dipesh Katira, Chinmay Dharurkar, Chaitali Dangarikar
Citation Matching in Sanskrit Corpora Using Local Alignment
Abstract
Citation matching is the problem of finding which citation occurs in a given textual corpus. Most existing citation matching work is done on scientific literature. The goal of this paper is to present methods for performing citation matching on Sanskrit texts. Exact matching and approximate matching are the two methods for performing citation matching. The exact matching method checks for exact occurrence of the citation with respect to the textual corpus. Approximate matching is a fuzzy string-matching method which computes a similarity score between an individual line of the textual corpus and the citation. The Smith-Waterman-Gotoh algorithm for local alignment, which is generally used in bioinformatics, is used here for calculating the similarity score. This similarity score is a measure of the closeness between the text and the citation. The exact- and approximate-matching methods are evaluated and compared. The methods presented can be easily applied to corpora in other Indic languages like Kannada, Tamil, etc. The approximate-matching method can in particular be used in the compilation of critical editions and plagiarism detection in a literary work.
Abhinandan S. Prasad, Shrisha Rao
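The approximate-matching step can be sketched as follows: a Python implementation of one common formulation of Smith-Waterman local alignment with Gotoh's affine gap penalties, scoring the citation against each corpus line and returning the best-scoring line. The scoring parameters (match 2, mismatch -1, gap opening -2, gap extension -1) and the names `swg_score` and `best_matching_line` are assumptions; the abstract does not specify the paper's own scoring scheme or any normalization of the similarity score.

```python
from typing import List, Tuple

NEG_INF = float("-inf")


def swg_score(a: str, b: str, match: int = 2, mismatch: int = -1,
              gap_open: int = -2, gap_extend: int = -1) -> float:
    """Best local-alignment score of a and b (Smith-Waterman with Gotoh affine gaps).

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    Only two rows are kept, so memory is O(len(b)).
    """
    m = len(b)
    prev_h = [0.0] * (m + 1)       # H, previous row: best alignment ending at (i-1, j)
    prev_f = [NEG_INF] * (m + 1)   # F, previous row: alignments ending with a gap in b
    best = 0.0
    for ca in a:
        h = [0.0] * (m + 1)
        f = [NEG_INF] * (m + 1)
        e = NEG_INF                # E, current row: alignments ending with a gap in a
        for j in range(1, m + 1):
            e = max(h[j - 1] + gap_open, e + gap_extend)
            f[j] = max(prev_h[j] + gap_open, prev_f[j] + gap_extend)
            diag = prev_h[j - 1] + (match if ca == b[j - 1] else mismatch)
            h[j] = max(0.0, diag, e, f[j])  # 0 lets a local alignment start anywhere
            best = max(best, h[j])
        prev_h, prev_f = h, f
    return best


def best_matching_line(citation: str, lines: List[str]) -> Tuple[int, float]:
    """Return (index, score) of the corpus line most similar to the citation.

    `lines` must be non-empty; ties go to the earliest line.
    """
    scores = [swg_score(citation, line) for line in lines]
    best_index = max(range(len(lines)), key=scores.__getitem__)
    return best_index, scores[best_index]


corpus = [
    "dharmakṣetre kurukṣetre samavetā yuyutsavaḥ",
    "māmakāḥ pāṇḍavāś caiva kim akurvata saṃjaya",
]
print(best_matching_line("dharmaksetre kuruksetre", corpus))
```

Because the alignment is local, a citation that occurs as a fragment of a longer line still scores highly, and the affine gap model penalizes one long omission less than many scattered ones.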
RDBMS Based Lexical Resource for Indian Heritage: The Case of Mahābhārata
Abstract
The paper describes a lexical resource in the form of a relational-database-based indexing system for Sanskrit documents, taking the Mahābhārata (MBh) as an example. The system is available online at http://sanskrit.jnu.ac.in/mb with input and output in Devanāgarī Unicode, using technologies such as RDBMS and Java Servlet. It works as an interactive, multi-dimensional indexing system with a search facility for the MBh and has the potential to serve as a generic system for all Sanskrit texts of similar structure. Currently, the system offers three search facilities: ‘Direct Search’, ‘Alphabetical Search’ and ‘Search by Classes’. The input triggers an indexing process that creates a temporary index for the search string; clicking on any indexed word then displays the details for that word, along with a facility to search for that word in some other online lexical resources.
Diwakar Mani
Evaluating Tagsets for Sanskrit
Abstract
In this paper we present an evaluation of the available Part-of-Speech (POS) tagsets developed in India for tagging Sanskrit and Indian languages. The tagsets evaluated are the JNU-Sanskrit tagset (JPOS), the Sanskrit consortium tagset (CPOS), the MSRI-Sanskrit tagset (IL-POST), the IIIT Hyderabad tagset (ILMT POS) and the CIIL Mysore tagset for the Linguistic Data Consortium for Indian Languages (LDCIL) project (LDCPOS). The main goal of this enterprise is to check the suitability of existing tagsets for Sanskrit from various Natural Language Processing (NLP) points of view.
Madhav Gopal, Diwakar Mishra, Devi Priyanka Singh
Performance of a Lexical and POS Tagger for Sanskrit
Abstract
Due to the phonetic, morphological and lexical complexity of Sanskrit, the automatic analysis of this language is a real challenge in natural language processing. The paper describes a series of tests performed to assess the accuracy of the tagging program SanskritTagger. To our knowledge, it offers the first reliable benchmark data for evaluating the quality of Sanskrit taggers using an unrestricted dictionary and texts from different domains. Based on a detailed analysis of the test results, the paper points out possible directions for future improvements of statistical tagging procedures for Sanskrit.
Oliver Hellwig
The Knowledge Structure in Amarakośa
Abstract
Amarakośa is the most celebrated and authoritative ancient thesaurus of Sanskrit. It is one of the books that a child learning through the traditional Indian educational system memorizes as early as the first year of formal learning. Though it appears to be a linear list of words, close inspection reveals a rich organisation of words expressing the various relations a word bears to other words. Thus, when a child studies Amarakośa further, the linear list of words unfolds into a knowledge web. In this paper we describe our effort to make the implicit knowledge in Amarakośa explicit. A model for storing this structure is discussed, and a web tool is described that answers queries by dynamically reconstructing the links among words from the structured tables.
Sivaja S. Nair, Amba Kulkarni
Gloss in Sanskrit Wordnet
Abstract
Glosses and examples are essential components of computational lexical databases such as Wordnet. These two components can be used in building domain ontologies, semantic relations, phrase structure rules etc., and can help in automatic or manual word sense disambiguation (WSD). The present paper highlights the importance of the gloss in WSD, based on experience gained in building the Sanskrit Wordnet. It presents a survey of Sanskrit synonymy lexica, the use of Navya-Nyāya terminology in developing a gloss, and the patterns that have evolved which are useful for the computational purpose of WSD, with special reference to Sanskrit.
Malhar Kulkarni, Irawati Kulkarni, Chaitali Dangarikar, Pushpak Bhattacharyya
Vibhakti Divergence between Sanskrit and Hindi
Abstract
Translation divergence between languages arises at various levels because different languages follow different conventions for coding the information of grammatical relations. Though Sanskrit and Hindi belong to the same Indo-Aryan family and Hindi inherits a lot from Sanskrit both structurally and lexically, divergences are observed at the level of function words such as vibhaktis. In his Aṣṭādhyāyī, Pāṇini assigns a default vibhakti to each kāraka, with many provisions for exceptions. He handles these exceptions either by imposing a new kāraka role or by assigning a special vibhakti. However, these methods are not acceptable in Hindi in toto. Based on the nature of the deviation, we propose seven cases of divergence in this paper.
Preeti Shukla, Devanand Shukl, Amba Kulkarni
Anaphora Resolution Algorithm for Sanskrit
Abstract
This paper presents an algorithm that identifies different types of pronominals and their antecedents in Sanskrit, an Indo-European language. The computational grammar implemented here uses familiar concepts such as clause, subject and object, which are identified with the help of morphological information, and concepts such as precede and follow. It is well known that natural languages contain anaphoric expressions, gaps and elliptical constructions of various kinds, and that understanding natural language involves assigning interpretations to these elements. It is therefore to be expected that natural language understanding systems must have the necessary mechanisms to resolve them. Our method for resolving anaphors exploits the morphological richness of the language. The system gives encouraging results when tested on a small corpus.
Pravin Pralayankar, Sobha Lalitha Devi
Linguistic Investigations into Ellipsis in Classical Sanskrit
Abstract
Ellipsis is a common phenomenon in Classical Sanskrit prose, yet no inventory of its forms in Classical Sanskrit has been made. This paper presents such an inventory, based both on a systematic investigation of one text and on examples gathered from sundry reading.
Brendan S. Gillon
Asiddhatva Principle in Computational Model of Aṣṭādhyāyī
Abstract
Pāṇini’s Aṣṭādhyāyī can be thought of as an automaton that generates Sanskrit words and sentences. The Aṣṭādhyāyī consists of sūtras organized in a systematic manner, and words are derived from roots and affixes by applying these sūtras according to a well-defined procedure. The Aṣṭādhyāyī is therefore best suited to computational modeling. We discussed a computational model with conflict-resolution techniques earlier (Sridhar et al., 2009)[12]. Continuing that work, this paper presents an improved computational model of the Aṣṭādhyāyī, developed further on the basis of the principle of asiddhatva. A new mathematical technique called ‘filter’ is introduced to account comprehensively for all usages of asiddhatva in the Aṣṭādhyāyī.
Sridhar Subbanna, Shrinivasa Varakhedi
Modelling Aṣṭādhyāyī: An Approach Based on the Methodology of Ancillary Disciplines (Vedāṅga)
Abstract
This article proposes a general model based on the common methodological approach of the ancillary disciplines (Vedāṅga) associated with the Vedas, taking examples from Śikṣā, Chandas, Vyākaraṇa and Prātiśākhya texts. It develops and elaborates this model further to represent the contents and processes of the Aṣṭādhyāyī. Certain key features are added to my earlier modelling of the Pāṇinian system of Sanskrit grammar. These include broader coverage of the Pāṇinian meta-language, a mechanism for the automatic application of rules, and the positioning of the grammatical system within the procedural complexes of the ancillary disciplines.
Anand Mishra
Backmatter
Metadata
Title
Sanskrit Computational Linguistics
Edited by
Girish Nath Jha
Copyright Year
2010
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-17528-2
Print ISBN
978-3-642-17527-5
DOI
https://doi.org/10.1007/978-3-642-17528-2