nach oben

2010 | Buch

Kapitel lesen Erstes Kapitel lesen

Parsing Beyond Context-Free Grammars

verfasst von: Laura Kallmeyer

Verlag: Springer Berlin Heidelberg

Buchreihe : Cognitive Technologies

Enthalten in: Springer Professional "Wirtschaft+Technik" , Springer Professional "Technik" , Springer Professional "Wirtschaft"

Einloggen, um Zugang zu erhalten

Über dieses Buch

Given that context-free grammars (CFG) cannot adequately describe natural languages, grammar formalisms beyond CFG that are still computationally tractable are of central interest for computational linguists. This book provides an extensive overview of the formal language landscape between CFG and PTIME, moving from Tree Adjoining Grammars to Multiple Context-Free Grammars and then to Range Concatenation Grammars while explaining available parsing techniques for these formalisms. Although familiarity with the basic notions of parsing and formal languages is helpful when reading this book, it is not a strict requirement. The presentation is supported with many illustrations and examples relating to the different formalisms and algorithms, and chapter summaries, problems and solutions. The book will be useful for students and researchers in computational linguistics and in formal language theory.

Inhaltsverzeichnis

Frontmatter

1. Introduction

Abstract

Since the 1980s it has been known that context-free grammars (CFGs) are not powerful enough to describe all phenomena we encounter in natural languages. Examples that show the limitation of CFGs are cross-serial dependencies in Dutch, as in (1), and in Swiss German (Shieber, 1985; Bresnan et al., 1982) and so-called unbounded scrambling phenomena (Becker, Rambow, and Niv, 1992; Rambow, 1994) in, for instance, German and Korean. A German scrambling example is given in (2).

Laura Kallmeyer

2. Grammar Formalisms for Natural Languages

Abstract

For a long time there has been a debate about whether CFGs are sufficiently powerful to describe natural languages. Several approaches have used CFGs, oftentimes enriched with some additional mechanism of transformation (Chomsky, 1956) or with features (Gazdar et al., 1985) for natural languages. These approaches were able to treat a large range of linguistic phenomena.

However, in the 1980s Stuart Shieber was able to prove in (1985) that there are natural languages that cannot be generated by a CFG. Before that, Bresnan et al. (1982) made a similar argument but their proof is based on the tree structures obtained with CFGs while Shieber argues on the basis of weak generative capacity, i.e., of the string languages.

Laura Kallmeyer

3. Parsing: Preliminaries

Abstract

There are different means of specifying a parsing algorithm. The most frequently used are a pseudo-code description of the algorithm and deduction rules.

The pseudo-code description has the advantage of being relatively close to the proper implementation. Consequently, implementing an algorithm given in pseudo-code is more or less immediate. However, the pseudo-code specification makes a lot of choices that actually do not belong to the parsing strategy of the algorithm. It introduces data structures and control structures.

Laura Kallmeyer

4. Tree Adjoining Grammars

Abstract

In this section we introduce TAG. Besides giving the definition of the formalism, we briefly mention the linguistic principles underlying TAG in order to give an idea of the way TAG is used for natural language processing. The formal definitions are taken from (Kallmeyer, 2009).

Laura Kallmeyer

5. Parsing Tree Adjoining Grammars

Abstract

This chapter treats different parsing techniques for TAG. We will extend the standard algorithms for CFG, i.e., present a CYK parser, different types of Earley algorithms and LR parsing for TAG.

Laura Kallmeyer

6. Multiple Context-Free Grammars and Linear Context-Free Rewriting Systems

Abstract

Multiple Context-Free Grammars (MCFGs) have been introduced by Seki et al. (1991) while the equivalent Linear Context-Free Rewriting Systems (LCFRSs) were independently proposed by Vijay-Shanker, Weir, and Joshi (1987). The central idea is to extend CFGs such that non-terminal symbols can span a tuple of strings that need not be adjacent in the input string. In other words, the yield of a non-terminal symbol can be discontinuous. The grammar contains productions of the form \(A_0 \to f[A_1, \dots, A_q]\) where \(A_0, \dots, A_q\) are non-terminals and f is a function describing how to compute the yield of \(A_{0}\) (a string tuple) from the yields of \(A_1, \dots, A_q\).

Laura Kallmeyer

7. Parsing MCFG, LCFRS and Simple RCG

Abstract

In this section, we present different non-directional bottom-up parsing algorithms for MCFG. The algorithms are described in (Seki et al., 1991; Ljunglöf, 2004; Burden and Ljunglöf, 2005).

Laura Kallmeyer

8. Range Concatenation Grammars

Abstract

In this chapter, we turn to Range Concatenation Grammars (RCGs) (Boullier, 1998a; Boullier, 1999b; Boullier, 2000b), the most powerful formalism treated in this book.

In the previous chapter, we have already seen simple RCG, a syntactic variant of LCFRS. As already mentioned, in a simple RCG, we can understand non-terminals as predicates that are satisfied by all the string tuples that are in their yields. A rewriting rule (clause) with left-hand side predicate A is then the specification of a sufficient condition for a string tuple to satisfy the predicate A.

Laura Kallmeyer

9. Parsing Range Concatenation Grammars

Abstract

In the following, we will see different parsing algorithms for RCGs. Let us consider a variant of the RCG for the MIX language from Fig. 8.4 as a running example. This variant is given in Fig. 9.1.

Laura Kallmeyer

10. Automata

Abstract

We have seen the definitions of Finite State Automata (FSA) and of Push-Down Automata (PDA) in Chapter 1. The former accept all regular languages while the latter accept all context-free languages. In this chapter, we present automaton models for different extensions of CFG, in particular for TAG and LCFRS.

Laura Kallmeyer

Backmatter

Titel: Parsing Beyond Context-Free Grammars
verfasst von: Laura Kallmeyer
Verlag: Springer Berlin Heidelberg
Electronic ISBN: 978-3-642-14846-0
Print ISBN: 978-3-642-14845-3
DOI: https://doi.org/10.1007/978-3-642-14846-0

Springer Professional

Über dieses Buch

Inhaltsverzeichnis

Frontmatter

1. Introduction

2. Grammar Formalisms for Natural Languages

3. Parsing: Preliminaries

4. Tree Adjoining Grammars

5. Parsing Tree Adjoining Grammars

6. Multiple Context-Free Grammars and Linear Context-Free Rewriting Systems

7. Parsing MCFG, LCFRS and Simple RCG

8. Range Concatenation Grammars

9. Parsing Range Concatenation Grammars

10. Automata

Backmatter

Premium Partner