Skip to main content

2017 | Buch

Grammatical Inference

Algorithms, Routines and Applications

insite
SUCHEN

Über dieses Buch

This book focuses on grammatical inference, presenting classic and modern methods of grammatical inference from the perspective of practitioners. To do so, it employs the Python programming language to present all of the methods discussed.
Grammatical inference is a field that lies at the intersection of multiple disciplines, with contributions from computational linguistics, pattern recognition, machine learning, computational biology, formal learning theory and many others.
Though the book is largely practical, it also includes elements of learning theory, combinatorics on words, the theory of automata and formal languages, plus references to real-world problems. The listings presented here can be directly copied and pasted into other programs, thus making the book a valuable source of ready recipes for students, academic researchers, and programmers alike, as well as an inspiration for their further development.>

Inhaltsverzeichnis

Frontmatter
Chapter 1. Introduction
Abstract
The introductory chapter contains the different formulations of grammatical inference and indicates such formulations that will be considered in this book. Basically, the problem of grammatical inference within the present book is to be studied from machine learning and combinatorial optimization perspectives. In addition to this, the different representations of languages are assumed: deterministic and non-deterministic finite-state automata, regular expressions, and context-free grammars. As regards the machine learning approach, the design and analysis of learning experiments for comparing GI algorithms to each other or with machine learning methods will be discussed. Typical applications of the GI field are presented in this chapter as well.
Wojciech Wieczorek
Chapter 2. State Merging Algorithms
Abstract
This chapter is devoted to the most popular algorithms for the induction of automata, namely state merging algorithms. The following three of them are presented: evidence driven state merging, Gold’s algorithm, and an algorithm based on the minimum description length principle. Their common denominator is the merging of two states. All differences result from various particular reasons for choosing the pair of states to do this operation.
Wojciech Wieczorek
Chapter 3. Partition-Based Algorithms
Abstract
This chapter describes the most popular algorithms for the induction of automata and grammars, in which the partition of states (or non-terminals in case of grammars) plays the central role. A partition of a set is a grouping of the set’s elements into non-empty subsets, in such a way that every element is included in one and only one of the subsets. The following three approaches are presented: the k-tails method, GI by genetic search, and learning context-free grammars using tabular representations. Their common denominator is the merging of the groups of states (or non-terminals). All differences result from various ideas for partitioning.
Wojciech Wieczorek
Chapter 4. Substring-Based Algorithms
Abstract
This chapter discusses the selected algorithms for induction of automata and grammars in which words comparison for searching their fragments that are changed or fixed, plays the crucial role. The following two approaches are presented: Error-Correcting Grammatical Inference (EGCI) and Alignment-Based Learning (ABL). The former approach incrementally grows an inferred automaton by explicitly minimizing (with the help of dynamic programming) the number of states added when each new word is presented. The latter approach, ABL, learns structure from plain sequences by comparing them. Based on the parts of words that are the same and the parts that are not the same in two words, the structure is inserted into a hypothesis space.
Wojciech Wieczorek
Chapter 5. Identification Using Mathematical Modeling
Abstract
This chapter is devoted to three algorithms for the induction of the smallest automaton and grammar that match given data. Deterministic finite automaton identification is transformed to a graph coloring problem, non-deterministic finite automaton identification is transformed to the Boolean satisfiability problem, while context-free grammar identification is transformed to a zero-one nonlinear programming problem. These transformations will be done through the formulation of the induction problem as a set of constraints and (optionally) an objective function. The constraints have the form of nonlinear equations and inequalities or just of Boolean expressions.
Wojciech Wieczorek
Chapter 6. A Decomposition-Based Algorithm
Abstract
A language is said to possess a decomposition if it can be written as a catenation of two languages neither of which is the singleton language consisting of the empty word. Languages which are not such products are called primes, like irreducible algebraic integers in number theory. On a prime language we can perform decomposition for the subset of it. By repeating this manner, the sequence of products is to be achieved. We call it a multi-decomposition. This chapter is devoted to an algorithm for the induction of a context-free grammar that matches given data. The main idea of this algorithm relies on a multi-decomposition obtained from the positive part of a sample.
Wojciech Wieczorek
Chapter 7. An Algorithm Based on a Directed Acyclic Word Graph
Abstract
In this chapter an algorithm for the induction of a Directed Acyclic Word Graph (DAWG) is proposed. A DAWG can serve as a very efficient data structure for lexicon representation and fast string matching, and it may have a variety of applications as well. Since a DAWG is acyclic the proposed method is suited for GI problems where the target language does not necessarily have to be infinite. The results of experiments showed in the next chapter for a dataset from the domain of bioinformatics reveal that this approach may compete with the current state-of-the-art methods in heuristic DFA induction.
Wojciech Wieczorek
Chapter 8. Applications of GI Methods in Selected Fields
Abstract
The present chapter is a novel contribution to the field of discrete mathematics and bio-informatics by using grammatical inference in the analysis of data. We show how to apply: (1) mathematical modeling in discovery of ordinary generating functions, (2) a decomposition-based approach: (a) to minimize boolean functions (b) to improve combinatorial game performance, and (c) in classification of biological sequences. The overall thrust of the first and third sections included in this chapter is from the author’s research publications. The detailed references are given in the bibliographical section.
Wojciech Wieczorek
Backmatter
Metadaten
Titel
Grammatical Inference
verfasst von
Wojciech Wieczorek
Copyright-Jahr
2017
Electronic ISBN
978-3-319-46801-3
Print ISBN
978-3-319-46800-6
DOI
https://doi.org/10.1007/978-3-319-46801-3