Skip to main content

2010 | Buch

Parsing Beyond Context-Free Grammars

insite
SUCHEN

Über dieses Buch

Given that context-free grammars (CFG) cannot adequately describe natural languages, grammar formalisms beyond CFG that are still computationally tractable are of central interest for computational linguists. This book provides an extensive overview of the formal language landscape between CFG and PTIME, moving from Tree Adjoining Grammars to Multiple Context-Free Grammars and then to Range Concatenation Grammars while explaining available parsing techniques for these formalisms. Although familiarity with the basic notions of parsing and formal languages is helpful when reading this book, it is not a strict requirement. The presentation is supported with many illustrations and examples relating to the different formalisms and algorithms, and chapter summaries, problems and solutions. The book will be useful for students and researchers in computational linguistics and in formal language theory.

Inhaltsverzeichnis

Frontmatter
1. Introduction
Abstract
Since the 1980s it has been known that context-free grammars (CFGs) are not powerful enough to describe all phenomena we encounter in natural languages. Examples that show the limitation of CFGs are cross-serial dependencies in Dutch, as in (1), and in Swiss German (Shieber, 1985; Bresnan et al., 1982) and so-called unbounded scrambling phenomena (Becker, Rambow, and Niv, 1992; Rambow, 1994) in, for instance, German and Korean. A German scrambling example is given in (2).
Laura Kallmeyer
2. Grammar Formalisms for Natural Languages
Abstract
For a long time there has been a debate about whether CFGs are sufficiently powerful to describe natural languages. Several approaches have used CFGs, oftentimes enriched with some additional mechanism of transformation (Chomsky, 1956) or with features (Gazdar et al., 1985) for natural languages. These approaches were able to treat a large range of linguistic phenomena.
However, in the 1980s Stuart Shieber was able to prove in (1985) that there are natural languages that cannot be generated by a CFG. Before that, Bresnan et al. (1982) made a similar argument but their proof is based on the tree structures obtained with CFGs while Shieber argues on the basis of weak generative capacity, i.e., of the string languages.
Laura Kallmeyer
3. Parsing: Preliminaries
Abstract
There are different means of specifying a parsing algorithm. The most frequently used are a pseudo-code description of the algorithm and deduction rules.
The pseudo-code description has the advantage of being relatively close to the proper implementation. Consequently, implementing an algorithm given in pseudo-code is more or less immediate. However, the pseudo-code specification makes a lot of choices that actually do not belong to the parsing strategy of the algorithm. It introduces data structures and control structures.
Laura Kallmeyer
4. Tree Adjoining Grammars
Abstract
In this section we introduce TAG. Besides giving the definition of the formalism, we briefly mention the linguistic principles underlying TAG in order to give an idea of the way TAG is used for natural language processing. The formal definitions are taken from (Kallmeyer, 2009).
Laura Kallmeyer
5. Parsing Tree Adjoining Grammars
Abstract
This chapter treats different parsing techniques for TAG. We will extend the standard algorithms for CFG, i.e., present a CYK parser, different types of Earley algorithms and LR parsing for TAG.
Laura Kallmeyer
6. Multiple Context-Free Grammars and Linear Context-Free Rewriting Systems
Abstract
Multiple Context-Free Grammars (MCFGs) have been introduced by Seki et al. (1991) while the equivalent Linear Context-Free Rewriting Systems (LCFRSs) were independently proposed by Vijay-Shanker, Weir, and Joshi (1987). The central idea is to extend CFGs such that non-terminal symbols can span a tuple of strings that need not be adjacent in the input string. In other words, the yield of a non-terminal symbol can be discontinuous. The grammar contains productions of the form \(A_0 \to f[A_1, \dots, A_q]\) where \(A_0, \dots, A_q\) are non-terminals and f is a function describing how to compute the yield of \(A_{0}\) (a string tuple) from the yields of \(A_1, \dots, A_q\).
Laura Kallmeyer
7. Parsing MCFG, LCFRS and Simple RCG
Abstract
In this section, we present different non-directional bottom-up parsing algorithms for MCFG. The algorithms are described in (Seki et al., 1991; Ljunglöf, 2004; Burden and Ljunglöf, 2005).
Laura Kallmeyer
8. Range Concatenation Grammars
Abstract
In this chapter, we turn to Range Concatenation Grammars (RCGs) (Boullier, 1998a; Boullier, 1999b; Boullier, 2000b), the most powerful formalism treated in this book.
In the previous chapter, we have already seen simple RCG, a syntactic variant of LCFRS. As already mentioned, in a simple RCG, we can understand non-terminals as predicates that are satisfied by all the string tuples that are in their yields. A rewriting rule (clause) with left-hand side predicate A is then the specification of a sufficient condition for a string tuple to satisfy the predicate A.
Laura Kallmeyer
9. Parsing Range Concatenation Grammars
Abstract
In the following, we will see different parsing algorithms for RCGs. Let us consider a variant of the RCG for the MIX language from Fig. 8.4 as a running example. This variant is given in Fig. 9.1.
Laura Kallmeyer
10. Automata
Abstract
We have seen the definitions of Finite State Automata (FSA) and of Push-Down Automata (PDA) in Chapter 1. The former accept all regular languages while the latter accept all context-free languages. In this chapter, we present automaton models for different extensions of CFG, in particular for TAG and LCFRS.
Laura Kallmeyer
Backmatter
Metadaten
Titel
Parsing Beyond Context-Free Grammars
verfasst von
Laura Kallmeyer
Copyright-Jahr
2010
Verlag
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-14846-0
Print ISBN
978-3-642-14845-3
DOI
https://doi.org/10.1007/978-3-642-14846-0

Premium Partner