Skip to main content

2010 | Buch

Finite-State Methods and Natural Language Processing

8th International Workshop, FSMNLP 2009, Pretoria, South Africa, July 21-24, 2009, Revised Selected Papers

herausgegeben von: Anssi Yli-Jyrä, András Kornai, Jacques Sakarovitch, Bruce Watson

Verlag: Springer Berlin Heidelberg

Buchreihe : Lecture Notes in Computer Science

insite
SUCHEN

Inhaltsverzeichnis

Frontmatter

Tutorials

Learning Finite State Machines
Abstract
The terms grammatical inference and grammar induction both seem to indicate that techniques aiming at building grammatical formalisms when given some information about a language are not concerned with automata or other finite state machines. This is far from true, and many of the more important results in grammatical inference rely heavily on automata formalisms, and particularly on the specific use of determinism that is made. We survey here some of the main ideas and results in the field.
Colin de la Higuera

Special Theme Tutorials

Developing Computational Morphology for Low- and Middle-Density Languages
Abstract
This tutorial will present an overview of the techiques for developing morphological processors for low- and middle-density languages. The developed morphological analyzers and generators can be used in many language processing applications.
Kemal Oflazer

Invited Papers

fsm2 – A Scripting Language Interpreter for Manipulating Weighted Finite-State Automata
Abstract
The present article describes fsm2, a software program which can be used interactively or as a script interpreter to manipulate weighted finite-state automata with around 100 different commands. fsm2 is based on FSM<2.0> – an efficient C++ template library to create and algebraically manipulate weighted automata.
Thomas Hanneforth
Selected Operations and Applications of n-Tape Weighted Finite-State Machines
Abstract
A weighted finite-state machine with n tapes (n-WFSM) defines a rational relation on n strings. The paper recalls important operations on these relations, and an algorithm for their auto-intersection. Through a series of practical applications, it investigates the augmented descriptive power of n-WFSMs, w.r.t. classical 1- and 2-WFSMs (weighted acceptors and transducers). Some of the presented applications are not feasible with the latter.
André Kempe
OpenFst
Abstract
In this invited talk, we describe OpenFST, an open-source library for weighted finite-state transducers (WFSTs). OpenFst consists of a C++ template library with efficient WFST representations and over twenty-five operations for constructing, combining, optimizing, and searching them. At the shell-command level, there are corresponding transducer file representations and programs that operate on them. OpenFst is designed to be both very efficient in time and space and to scale to very large problems. This library has key applications speech, image, and natural language processing, pattern and string matching, and machine learning. We give an overview of the library, examples of its use, details of its design that allow customizing the labels, states, and weights and the lazy evaluation of many of its operations. Further information and a download of OpenFst can be obtained from http://www.openfst.org .
The accompanying tutorial, “OpenFst in Depth”, gives an overview of algorithms, OpenFst code design and applications.
Johan Schalkwyk

Special Theme Invited Talks

Morphological Analysis of Tone Marked Kinyarwanda Text
Abstract
Tones are a significant feature in most Bantu languages. However, previous work on morphological analysis of Bantu languages has always missed out the tones. In this paper we describe an approach to carry out morphological analysis of tone marked Kinyarwanda text. Kinyarwanda is an agglutinating tonal Bantu language with a complex morphological structure. Results obtained from these experiments were compared to results obtained with analysis of tone unmarked text. We found that analysis results from tone marked text were less ambiguous compared to similar analysis on tone unmarked text.
Jackson Muhirwe

Regular Papers

Minimizing Weighted Tree Grammars Using Simulation
Abstract
Weighted tree grammars (for short: WTG) are an extension of weighted context-free grammars that generate trees instead of strings. They can be used in natural language parsing to directly generate the parse tree of a sentence or to encode the set of all parse trees of a sentence. Two types of simulations for WTG over idempotent, commutative semirings are introduced. They generalize the existing notions of simulation and bisimulation for WTG. Both simulations can be used to reduce the size of WTG while preserving the semantics, and are thus an important tool in toolkits. Since the new notions are more general than the existing ones, they yield the best reduction rates achievable by all minimization procedures that rely on simulation or bisimulation. However, the existing notions might allow faster minimization.
Andreas Maletti
Compositions of Top-Down Tree Transducers with ε-Rules
Abstract
Top-down tree transducers with ε-rules (εtdtts) are a restricted version of extended top-down tree transducers. They are implemented in the framework Tiburon and fulfill some criteria desirable in a machine translation model. However, they compute a class of transformations that is not closed under composition (not even for linear and nondeleting εtdtts). A composition construction that composes two εtdtts M and N is presented, and it is shown that the construction is correct, whenever (i) N is linear, (ii) M is total or N is nondeleting, and (iii) M has at most one output symbol in each rule.
Andreas Maletti, Heiko Vogler
Reducing Nondeterministic Finite Automata with SAT Solvers
Abstract
We consider the problem of reducing the number of states of nondeterministic finite automata, and show how to encode the reduction as a Boolean satisfiability problem. This approach improves on previous work by reducing a more general class of automata. Experimental results show that it produces a minimal automaton in almost all cases and that the running time compares favourably to the Kameda-Weiner algorithm.
Jaco Geldenhuys, Brink van der Merwe, Lynette van Zijl
Joining Composition and Trimming of Finite-State Transducers
Abstract
The composition of two (weighted) finite-state transducers is usually carried out in two steps: In the first step, all accessible states of the result are constructed regardless of their co-accessibility. Non-co-accessible states are removed afterwards in the second step. This approach can lead to huge intermediate automata with only a fraction of their states being useful in the end. We present a novel composition algorithm which avoids the construction of non useful states by using a single depth-first traversal while having the same asymptotic complexity as the existing approaches.
Johannes Bubenzer, Kay-Michael Würzner

Special Theme Extended Abstracts

Porting Basque Morphological Grammars to foma, an Open-Source Tool
Abstract
Basque is a morphologically rich language, of which several finite-state morphological descriptions have been constructed, primarily using the Xerox/PARC finite-state tools. In this paper we describe the process of porting a previous description of Basque morphology to foma, an open-source finite-state toolkit compatible with Xerox tools, provide a comparison of the two tools, and contrast the development of a two-level grammar with parallel alternation rules and a sequential grammar developed by composing individual replacement rules.
Iñaki Alegria, Izaskun Etxeberria, Mans Hulden, Montserrat Maritxalar
Describing Georgian Morphology with a Finite-State System
Abstract
In the paper, application of the Finite State Tools to one of the Southern Caucasian languages, Georgian, is discussed. In Georgian, as in many non-Indo-European agglutinative languages, concatenative morphotactics is impressively productive due to its rich morphology. The presented Georgian Language Morphological Transducer is capable to parse all theoretically possible forms for the lemmata of Georgian nouns, pronouns, adjectives, adverbs, numerals, functional words and for most of the lemmata from 72 verb sets.
Oleg Kapanadze
Finite State Morphology of the Nguni Language Cluster: Modelling and Implementation Issues
Abstract
The paper provides an overview of a project on computational morphological analysers for the Nguni cluster of languages namely Zulu, Xhosa, Swati and Ndebele. These languages are agglutinative and lesser-resourced. The project adopted a finite approach, which is well-suited to modelling both regular morphophonological phenomena and linguistic idiosyncrasies. The paper includes a brief overview of the morphology of this cluster of languages, then focuses on how the various morphophonological phenomena of Zulu are modelled and implemented using the Xerox finite-state toolkit. The bootstrapping of the Zulu morphological analyser prototype, ZulMorph, to obtain analyser prototypes for Xhosa, Swati and Ndebele, is outlined and experimental results given.
Laurette Pretorius, Sonja Bosch
A Finite State Approach to Setswana Verb Morphology
Abstract
Setswana is characterised by a disjunctive orthography according to which verbal prefixal morphemes are usually written disjunctively, while suffixal morphemes to the verb root follow a conjunctive writing style. This article specifically focusses on a finite-state approach to Setswana verb morphology and the challenges of the disjunctive orthography used for prefixes.
Laurette Pretorius, Biffie Viljoen, Rigardt Pretorius, Ansu Berg

Competition Announcements

Zulu: An Interactive Learning Competition
Abstract
Active language learning is an interesting task for which theoretical results are known and several applications exist. In order to better understand what the better strategies may be, a new competition called Zulu ( http://labh-curien.univ-st-etienne.fr/zulu/ ) is launched: participants are invited to learn deterministic finite automata from membership queries. The goal is to obtain the best classification rate from a fixed number of queries.
David Combe, Colin de la Higuera, Jean-Christophe Janodet
Backmatter
Metadaten
Titel
Finite-State Methods and Natural Language Processing
herausgegeben von
Anssi Yli-Jyrä
András Kornai
Jacques Sakarovitch
Bruce Watson
Copyright-Jahr
2010
Verlag
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-14684-8
Print ISBN
978-3-642-14683-1
DOI
https://doi.org/10.1007/978-3-642-14684-8

Premium Partner