nach oben

2003 | Buch

Kapitel lesen Erstes Kapitel lesen

Implementation and Application of Automata

7th International Conference, CIAA 2002 Tours, France, July 3–5, 2002 Revised Papers

herausgegeben von: Jean-Marc Champarnaud, Denis Maurel

Verlag: Springer Berlin Heidelberg

Buchreihe : Lecture Notes in Computer Science

Enthalten in: Professional Book Archive

Einloggen, um Zugang zu erhalten

Inhaltsverzeichnis

Frontmatter

Invited Lecture

Edit-Distance of Weighted Automata

The edit-distance of two strings is the minimal cost of a sequence of symbol insertions, deletions, or substitutions transforming one string into the other. The definition is used in various contexts to give a measure of the difference or similarity between two strings. This definition can be extended to measure the similarity between two sets of strings. In particular, when these sets are represented by automata, their edit-distance can be computed using the general algorithm of composition of weighted transducers combined with a single-source shortest-paths algorithm. More generally, in some applications such as speech recognition and computational biology, the strings may represent a range of alternative hypotheses with associated probabilities. Thus, we introduce the definition of the edit-distance of two distributions of strings given by two weighted automata. We show that general weighted automata algorithms over the appropriate semirings can be used to compute the edit-distance of two weighted automata exactly. The algorithm for computing exactly the edit-distance of weighted automata can be used to improve the word accuracy of automatic speech recognition systems. More generally, the algorithm can be extended to provide an edit-distance automaton useful for rescoring and other post-processing purposes in the context of large-vocabulary speech recognition. In the course of the presentation of our algorithm, we also introduce a new and general synchronization algorithm for weighted transducers which, combined with ∈-removal, can be used to normalize weighted transducers with bounded delays.

Mehryar Mohri

Technical Contributions

p-Subsequentiable Transducers

p-subsequential transducers are efficient finite-state transducers with p final outputs used in a variety of applications. Not all transducers admit equivalent p-subsequential transducers however. We briefly describe an existing generalized determinization algorithm for psubsequential transducers and give the first characterization of p-subsequentiable transducers, transducers that admit equivalent p-subsequential transducers. Our characterization shows the existence of an efficient algorithm for testing p-subsequentiability. We have fully implemented the generalized determinization algorithm and the algorithm for testing p- subsequentiability. We report experimental results showing that these algorithms are practical in large-vocabulary speech recognition applications. The theoretical formulation of our results is the equivalence of the following three properties for finite-state transducers: determinizability in the sense of the generalized algorithm, p-subsequentiability, and the twins property.

Cyril Allauzen, Mehryar Mohri

Bidirectional Push Down Automata

We define a new model of automata for the description of bidirectional parsing strategies for context-free grammars and a tabulation mechanism that allow them to be executed in polynomial time. This new model of automata provides a modular way of defining bidirectional parsers, separating the description of a strategy from its execution.

Miguel A. Alonso, Víctor J. Díaz, Manuel Vilares

Finite Automata and Non-self-Embedding Grammars

We consider non-self-embedding (NSE) context-free grammars as a representation of regular sets.We point out its advantages with respect to more classical representations by finite automata, in particular when considering the efficient realization of the rational operations. We give a characterization in terms of composition of regular grammars and state relationships between NSE grammars and push-down automata. Finally we show a polynomial algorithm to decide whether a context-free grammars is self-embedding or not.

Marcella Anselmo, Dora Giammarresi, Stefano Varricchio

Simulation of Gate Circuits in the Algebra of Transients

We study simulation of gate circuits in algebra C recently introduced by Brzozowski and Ésik.A transient is a word consisting of alternating 0s and 1s; it represents a changing signal. In C, gates process transients instead of 0s and 1s. Simulation in C is capable of counting signal changes, and detecting hazards.We study two simulation algorithms: a general one, A, that works with any state, and Ã, that applies if the initial state is stable.We show that the two algorithms agree in the stable case. We prove the sufficiency of the simulation: all signal changes occurring in binary analysis are also predicted by Algorithm A.

Janusz Brzozowski, Mihaela Gheorghiu

The Number of Similarity Relations and the Number of Minimal Deterministic Finite Cover Automata

Finite Deterministic Cover Automata (DFCA) can be obtained from Deterministic Finite Automata (DFA) using the similarity relation. Since the similarity relation is not an equivalence relation, the minimal DFCA for a finite language is usually not unique. We count the number of minimal DFCA that can be obtained from a given minimal DFA with n states by merging the similar states in the given DFA. We compute an upper bound for this number and prove that in the worst case (for a non-unary alphabet) it is [4n-9+√8n+1!/8]/(2[4n-9+√8n+1/8] - n + 1)!We prove that this upper bound is reached, i.e. for any given positive integer n we find a minimal DFA with n states, which has the number of minimal DFCA obtained by merging similar states equal to this maximum.

Cezar Cămpeanu, Andrei Păun

Regex and Extended Regex

Regex are used in many programs such as Perl, Awk, Python, egrep, vi, emacs etc. It is known that regex are different from regular expressions. In this paper, we give regex a formal treatment. We make a distinction between regex and extended regex; while regex present regular languages, extended regex present a family of languages larger than regular languages. We prove a pumping lemma for the languages expressed by extended regex. We show that the languages represented by extended regex are incomparable with context-free languages and a proper subset of context-sensitive languages.

Cezar Câmpeanu, Kai Salomaa, Sheng Yu

Prime Decompositions of Regular Prefix Codes

One of the new approaches to data classification uses finite state automata for representation of prefix codes. An important task driven by the need for the efficient storage of such automata in memory is the decomposition of prefix codes into prime factors. We investigate properties of such prefix code decompositions. A linear time algorithm is designed which finds the prime decomposition F₁F₂ . . . F_k of a regular prefix code F given by its minimal deterministic automaton.

Jurek Czyzowicz, Wojciech Fraczak, Andrzej Pelc, Wojciech Rytter

Implementation of Dictionaries via Automata and Decision Trees

Finite-state transducers can be used to map a language onto a set of values. This paper proposes an alternate representation method for such a mapping, consisting of associating a finite-state automaton accepting the input language with a decision tree representing the output values. The advantages of this approach are that it leads to more compact representations than transducers, and that decision trees can easily be synthesized by machine learning techniques.

Abolfazl Fatholahzadeh

Feedback-Free Circuits in the Algebra of Transients

An efficient simulation algorithm using an algebra of transients for gate circuits was proposed by Brzozowski and Ésik. This algorithm seems capable of predicting all the signal changes that can occur in a circuit under worst-case delay conditions. We verify this claim by comparing simulation with binary analysis. For any feedback-free circuit consisting of 1- and 2-input gates and started in a stable state, we prove that all signal changes predicted by simulation occur in binary analysis, provided that wire delays are taken into account. Two types of finite automata play an important role in our proof.

Mihaela Gheorghiu, Janusz Brzozowski

On Minimizing Cover Automata for Finite Languages in O(n log n) Time

A deterministic finite automaton (DFA) A is called a cover automaton (DFCA) for a finite language L over some alphabet σ if L = L(A) ∩ σ≤l, with l being the length of some longest word in L. Thus a word w ∈ σ* is in L if and only if ∣w∣ ≤ l and w ∈ L(A). The DFCA A is minimal if no DFCA for L has fewer states.In this paper, we present an algorithm which converts an n-state DFA for some finite language L into a corresponding minimal DFCA, using only O(n log n) time and O(n) space. The best previously known algorithm [2] requires O(n2) time and space. Furthermore, the new algorithm can also be used to minimize any DFCA, where the best previous method [1] takes O(n4) time and space.

Heiko Körner

Compilation of Constraint-Based Contextual Rules for Part-of-Speech Tagging into Finite State Transducers

With the aim of removing the residuary errors made by pure stochastic disambiguation models, we put forward a hybrid system in which linguist users introduce high level contextual rules to be applied in combination with a tagger based on a Hidden Markov Model. The design of these rules is inspired in the Constraint Grammars formalism. In the present work, we review this formalism in order to propose a more intuitive syntax and semantics for rules, and we develop a strategy to compile the rules under the form of Finite State Transducers, thus guaranteeing an efficient execution framework.

Jorge Graña, Gloria Andrade, Jesús Vilares

Finite State Lazy Operations in NLP

Finite state networks can represent dictionaries and lexical relations. Traditional finite-state operations like composition can produce huge networks with prohibitive computation space and time. For a subset of finite state operations, these drawbacks can be avoided by using virtual networks, which rely on structures that are partially built on demand. This paper addresses the implementation of virtual network operations in xfst (XEROX Finite State Technology software). The example of “priority union” which is particularly useful in NLP, is developed.

Franck Guingne, Florent Nicart

State Complexity of Basic Operations on Nondeterministic Finite Automata

The state complexities of basic operations on nondeterministic finite automata (NFA) are investigated. In particular, we consider Boolean operations, catenation operations - concatenation, iteration, λ- free iteration - and the reversal on NFAs that accept finite and infinite languages over arbitrary alphabets. Most of the shown bounds are tight in the exact number of states, i.e. the number is sufficient and necessary in the worst case. For the complementation tight bounds in the order of magnitude are proved. It turns out that the state complexities of operations on NFAs and deterministic .nite automata (DFA) are quite different. For example, the reversal and concatenation have exponential state complexity on DFAs but linear complexity on NFAs. Conversely, the complementation can be done with linear complexity on DFAs but needs exponentially many states on NFAs.

Markus Holzer, Martin Kutrib

Adaptive Automata - A Revisited Proposal

This paper impose further discipline to the use of adaptive automata [Jos94], [Iwa00] by restricting some of their features, in order to obtain devices that are easier to create and more readable, without loosing computational power. An improved notation is proposed as a first try towards a language for adaptive paradigm programming.

João José Neto, César Bravo

Efficient Automaton-Based Recognition for Linear Conjunctive Languages

This paper studies practical algorithms for dealing with a particular family of cellular automata, which has recently been proved computationally equivalent to linear conjunctive grammars. The relation between these grammars and these automata resembles that between regular expressions and finite automata: while the former are better suited for human use, the latter are considerably easier to implement. In this paper, an algorithm for converting an arbitrary linear conjunctive grammar to an equivalent automaton is proposed, and different techniques of reducing the size of existing automata are studied.

Alexander Okhotin

Syntactic Semiring and Language Equations

A classical construction assigns to any language its (ordered) syntactic monoid. Recently the author defined the so-called syntactic semiring of a language. We show here that elements of the syntactic semiring of L can be identified with transformations of a certain modification of the minimal automaton for L.The main issue here are the inequalities r(x₁, . . . , x_m) ⊆ L and equations r(x₁, . . . , x_m) = L where L is a given regular language over a finite alphabet A and r is a given regular expression over A in variables x₁, . . . , x_m. We show that the search for maximal solutions can be translated into the (finite) syntactic semiring of the language L. In such a way we are able to decide the solvability and to find all maximal solutions effectively. In fact, the last questions were already solved by Conway using his factors. The first advantage of our method is the complexity and the second one is that we calculate in a transparent algebraic structure.

Libor Polák

Reduced Power Automata

We describe a class of transitive semiautomata whose power automata are reduced: any two reachable sets of states have distinct behavior. These automata appear naturally in the study of one-dimensional cellular automata.

Klaus Sutner

A Polynomial Time Algorithm for Left [Right] Local Testability

A right [left] locally testable language S is a language with the property that for some nonnegative integer k two words u and v in alphabet S are equal in the semigroup if (1) the prefix and sufix of the words of length k -1 coincide, (2) the set of segments of length k of the words as well as 3) the order of the first appearance of these segments in prefixes [sufixes] coincide.We present necessary and sufficient condition for graph [semigroup] to be transition graph [semigroup] of the deterministic finite automaton that accepts right [left] locally testable language and necessary and sufficient condition for transition graph of the deterministic finite automaton with locally idempotent semigroup. We introduced polynomial time algorithms for the right [left] local testability problem for transition semigroup and transition graph of the deterministic finite automaton based on these conditions. Polynomial time algorithm verifies transition graph of automaton with locally idempotent transition semigroup.

A.N. Trahtman

Whale Calf, a Parser Generator for Conjunctive Grammars

Whale Calf is a parser generator that uses conjunctive grammars, a generalization of context-free grammars with an explicit intersection operation, as the formalism of specifying the language. All existing parsing algorithms for conjunctive grammars are implemented — namely, the tabular algorithm for grammars in the binary normal form, the tabular algorithm for grammars in the linear normal form, the tabular algorithm for arbitrary grammars, the conjunctive LL, the conjunctive LR and the algorithm based on simulation of the automata equivalent to linear conjunctive grammars. The generated C++ programs can determine the membership of strings in the language and, if needed, create parse trees of these strings.

Alexander Okhotin

automata, a Hybrid System for Computational Automata Theory

We present a system that performs computations on finite state machines, syntactic semigroups, and one-dimensional cellular automata.

Klaus Sutner

A Package TESTAS for Checking Some Kinds of Testability

We implement a set of procedures for deciding whether or not a language given by its minimal automaton or by its syntactic semigroup is locally testable, right or left locally testable, threshold locally testable, strictly locally testable, or piecewise testable. The bounds on order of local testability of transition graph and order of local testability of transition semigroup are also found. For given k, the k-testability of transition graph is verified. Some new effective polynomial time algorithms are used. These algorithms have been implemented as a C/C++ package.

A. N. Trahtman

DAWG versus Suffix Array

This paper shows a comparison of two data structures used for indexing of input texts. The first structure is the Suffix Array and the second is the Directed Acyclic Word Graph (DAWG). We present an eficient DAWG implementation. This implementation is compared with other structures used for text indexing. The construction time and speed of searching of a set of substrings are shown for the DAWG and the Suffix Array.

Miroslav Balík

On Predictive Parsing and Extended Context-Free Grammars

Extended context-free grammars are context-free grammars in which the right-hand sides of productions are allowed to be any regular language rather than being restricted to only finite languages. We present a novel view on top-down predictive parser construction for extended context-free grammars that is based on the rewriting of partial syntax trees. This work is motivated by our development of ecfg, a Java toolkit for the manipulation of extended context-free grammars, and by our continuing investigation of XML.

Anne Brüggemann-Klein, Derick Wood

Star Normal Form, Rational Expressions, and Glushkov WFAs Properties

In this paper, we extend the characterisation of Glushkov automata to multiplicities. We consider automata obtained from rational expressions in star normal form. We show that for this class of automata, the graphical Boolean properties are preserved. We prove that this new characterization only depends on conditions on coefficients and we explicit these conditions.

Pascal Caron, Marianne Flouret

Comparison of Construction Algorithms for Minimal, Acyclic, Deterministic, Finite-State Automata from Sets of Strings

This paper compares various methods for constructing minimal, deterministic, acyclic, finite-state automata (recognizers) from sets of words. Incremental, semi-incremental, and non-incremental methods have been implemented and evaluated.

Jan Daciuk

Term Validation of Distributed Hard Real-Time Applications

To validate real-time systems, one must especially validate on the one hand its functional behaviours (by proving that it does what it must do), and on the other hand its operational behaviours (by proving that it respects its time specifications). Here, we deal with the operational aspects. In previous works, we presented a technique, based on finite automata, to validate real-time systems designed to run on a centralised architecture. Here, we extend this approach to distributed systems. The main contribution of this work is to show that, when the modeled physical process is closed, finite automata and product operators are suficient to valid distributed systems on an operational way.

Gaëlle Largeteau, Dominique Geniet

Common Subsequence Automaton

Given a set of strings, a common subsequence of this set is a string that is a subsequence of each string in this set. We describe an on-line algorithm building the finite automaton that accepts all common subsequences of the given set of strings.

Zdenêk Troniĉek

Searching for Asymptotic Error Repair

We work in the domain of a regional least-cost strategy with dynamic validation in order to avoid cascaded errors [3], extending the theoretical model to illustrate its asymptotic equivalence with global repair algorithms. This is an objective criterion to measure the quality of an error repair algorithm, since the point of reference is a technique that guarantees the best quality for a given error metric when all contextual information is available. To the best of our knowledge, it is the first time that such a discussion takes place. We also reformulate the parsing framework using parsing schemata [1], simplifying the description.

Manuel Vilares, Victor M. Darriba, Miguel A. Alonso

Abstracts

Automata-Based Representations for Arithmetic Constraints in Automated Verification

In this paper we discuss efficient symbolic representations for infinite-state systems specified using linear arithmetic constraints. We give new algorithms for constructing finite automata which represent integer sets that satisfy linear constraints. These automata can represent either signed or unsigned integers and have a lower number of states compared to other similar approaches. We experimentally compare different symbolic representations by using them to verify non-trivial specification examples. In many cases symbolic representations based on our construction algorithms outperform the polyhedral representation used in Omega Library, or the automata representation used in LASH.

Constantinos Bartzis, Tev.k Bultan

On the Implementation of Compact DAWG’s

There are several data structures that allow searching for a pattern P in a preprocessed text T in time dependent just on the length of P. In this paper we present an implementation of CDAWG’s— Compact Direct Acyclic Word Graphs. While the previous implementations of CDAWG’s required from 7n to 23n bytes of memory space, ours achieves 1.7n to 5n for a text T of length n. The implementation is suitable for large data files, since it minimizes the number of disk accesses. If disk accesses are not to be optimized, space requirements can be further decreased.

Jan Holub, Maxime Crochemore

Dynamic Programming — NFA Simulation

Nondeterministic finite automaton (NFA) cannot be directly used because of its nondeterminism. One way how to use NFA is to determinize NFA and the second way is to use one of simulation methods. This paper deals with one of the simulation method called dynamic programming. We present the method on one of the pattern matching problems as well as modifications for several other tasks.

Jan Holub

Deterministic Parsing of Cyclic Strings

A technique of parsing of cyclic strings using an adapted strong LR parser is described. The adapted parser is able to find the starting point of the normal form of cyclic string. The resulting algorithm of parsing has linear time complexity.

Bořivoj Melichar

Backmatter

Titel: Implementation and Application of Automata
herausgegeben von: Jean-Marc Champarnaud
Denis Maurel
Verlag: Springer Berlin Heidelberg
Electronic ISBN: 978-3-540-44977-5
Print ISBN: 978-3-540-40391-3
DOI: https://doi.org/10.1007/3-540-44977-9

Springer Professional