Skip to main content
Top

2019 | Book

Formal Languages and Compilation

Authors: Prof. Stefano Crespi Reghizzi, Prof. Luca Breveglieri, Prof. Angelo Morzenti

Publisher: Springer International Publishing

Book Series : Texts in Computer Science

insite
SEARCH

About this book

This classroom-tested and clearly-written textbook presents a focused guide to the conceptual foundations of compilation, explaining the fundamental principles and algorithms used for defining the syntax of languages, and for implementing simple translators.

This significantly updated and expanded third edition has been enhanced with additional coverage of regular expressions, visibly pushdown languages, bottom-up and top-down deterministic parsing algorithms, and new grammar models.

Topics and features: describes the principles and methods used in designing syntax-directed applications such as parsing and regular expression matching; covers translations, semantic functions (attribute grammars), and static program analysis by data flow equations; introduces an efficient method for string matching and parsing suitable for ambiguous regular expressions (NEW); presents a focus on extended BNF grammars with their general parser and with LR(1) and LL(1) parsers (NEW); introduces a parallel parsing algorithm that exploits multiple processing threads to speed up syntax analysis of large files; discusses recent formal models of input-driven automata and languages (NEW); includes extensive use of theoretical models of automata, transducers and formal grammars, and describes all algorithms in pseudocode; contains numerous illustrative examples, and supplies a large set of exercises with solutions at an associated website.

Advanced undergraduate and graduate students of computer science will find this reader-friendly textbook to be an invaluable guide to the essential concepts of syntax-directed compilation. The fundamental paradigms of language structures are elegantly explained in terms of the underlying theory, without requiring the use of software tools or knowledge of implementation, and through algorithms simple enough to be practiced by paper and pencil.

Table of Contents

Frontmatter
Chapter 1. Introduction
Abstract
We describe the motivations, objectives and structure of the book, and we provide information useful to both instructors and general readers. We explain why it should be useful for the readers who wish to acquire a solid understanding of the foundations of formal languages, including grammars, regular expressions, abstract machines, algorithms for language analysis and translation, syntax to semantic mapping, static program flow analysis and many simple examples.
Stefano Crespi Reghizzi, Luca Breveglieri, Angelo Morzenti
Chapter 2. Syntax
Abstract
We present the fundamental concepts regarding the syntax of formal languages, and we develop two practically relevant language families: regular languages and context-free languages. After a preliminary discussion about the categories of natural, artificial, and formal languages, we introduce the basic elements of formal language theory: alphabet, language operations, and linguistic abstractions. Then, we present the notation of regular expressions, the notion of derivation, and the regular language family with its closure properties. Next, we introduce and thoroughly discuss the context-free (or BNF) grammars, and the related concepts of derivation, recursivity and self-nesting, syntax tree, structural adequacy, and parenthesis languages with examples from JSON. We classify the forms of ambiguity and discuss how to avoid them when possible. We review the grammar normalized forms that are used in later chapters: operator form, invertible form, and nonleft-recursive form, and the corresponding grammar transformations. We compare the expressive power of the families of regular and context-free languages, and we briefly refer to the formal properties that characterize each family. We show the advantages of extending context-free grammars with regular expressions (EBNF). Finally, for completeness we mention the more general phrase-structure and context-sensitive grammar families, at the top of the Chomsky classification of grammars. At the end, we hint to two recent more powerful types, the conjunctive and Boolean grammars.
Stefano Crespi Reghizzi, Luca Breveglieri, Angelo Morzenti
Chapter 3. Finite Automata as Regular Language Recognizers
Abstract
We introduce the finite automata, we discuss their properties, and we present their role as recognizers of regular languages, in particular at lexical compilation level. After overviewing the abstract model of Turing machine, we focus on the finite-state devices. We define the notions of state accessibility, determinism, spontaneous move and ambiguity. We analyze the relations between finite automata and grammars. We present the basic constructions for cleaning, determinizing and minimizing automata. Then we describe the BMC method for deriving regular expressions from automata. Conversely, we present the methods by Thompson and by Berry–Sethi for obtaining finite recognizers from regular expressions. The latter method is based on the concept of locally testable language and is also used for eliminating nondeterminism. For ambiguous regular expressions, we present a new algorithm by extending Berry–Sethi, to perform parsing and string matching. Finally, by exploiting their relation to finite automata, we introduce the regular expressions extended with the operations of complement and intersection. A synopsis of relations between regular expressions, grammars and automata ends the chapter.
Stefano Crespi Reghizzi, Luca Breveglieri, Angelo Morzenti
Chapter 4. Pushdown Automata and Parsing
Abstract
This large chapter covers pushdown automata and parsing algorithms with emphasis on their application to syntax analysis. We start by introducing general and deterministic pushdown automata as the recognizers of context-free and deterministic context-free languages defined by grammars. As an interesting special case, the input-driven languages, also known as visibly pushdown, are described. Then we present the classical deterministic bottom-up \(LR \, (1)\) and top-down \(LL \, (1)\) parsing methods, by adopting a novel approach, called Extended \(LR \, (1)\) (\(ELR \, (1)\)), that allows for the analysis of the languages defined by Extended BNF (EBNF) grammars, i.e., grammars admitting regular expressions in the rule right parts. The EBNF grammars are here represented by a network of recursive finite automata. The Extended \(LL \, (1)\) (\(ELL \, (1)\)) parsers are presented as a special case of the \(ELR \, (1)\) algorithms, such that a simpler and more compact parser implementation by means of recursive procedures is possible. The chapter continues with the definition of operator-precedence grammars and their very efficient parsing algorithm, and an innovative parallel parsing technique suitable for parsing large files on multi-core computers. The various deterministic parsing methods explained are compared. To cover nondeterministic CF languages, the classical Earley method for parsing general context-free grammars is presented in a form generalized to EBNF. At the end, we outline simple methods for parser error recovery and we mention incremental parsing.
Stefano Crespi Reghizzi, Luca Breveglieri, Angelo Morzenti
Chapter 5. Translation Semantics and Static Analysis
Abstract
We present and discuss the translation techniques and the static analysis of programs. After abstractly defining a translation as a relation between strings, we present the operational models of transliteration and purely syntactic translation. These models are defined by means of regular translation expressions and by transduction grammars, also known as syntactic translation schemes; they are, respectively, computed by finite or pushdown transducers. Then we move to the more practical approach known as semantic translation, which consists of functions that operate on the syntactic tree of the source text, and compute the values of semantic attributes. Semantic translations are formalized by means of attribute grammars. We show some typical applications of attribute grammars for checking variable declaration, code generation and semantics-directed parsing. We examine the dependencies between semantic attributes and how they condition the evaluation order, and the number and direction of compiler passes. Finally, we consider the static analysis of programs: the program is viewed as a finite control-flow graph, and a few important properties concerning its correctness and efficiency are established, through an approach based on writing flow equations and iteratively solving them. The approach is illustrated by the problems of liveness intervals and reaching definitions, and their use for memory allocation and constant propagation.
Stefano Crespi Reghizzi, Luca Breveglieri, Angelo Morzenti
Backmatter
Metadata
Title
Formal Languages and Compilation
Authors
Prof. Stefano Crespi Reghizzi
Prof. Luca Breveglieri
Prof. Angelo Morzenti
Copyright Year
2019
Electronic ISBN
978-3-030-04879-2
Print ISBN
978-3-030-04878-5
DOI
https://doi.org/10.1007/978-3-030-04879-2

Premium Partner