
1979 | Book

Understanding and Writing Compilers

A do-it-yourself guide

Author: Richard Bornat

Publisher: Macmillan Education UK

Book Series: Computer Science Series


Table of Contents

Frontmatter

Introduction

Introduction
Abstract
In the past compiler writers and designers seemed to form an elite group within computing science, set apart by their esoteric knowledge and their ability to produce large, important system programs which really worked. The admiration of the computing public, whether it was once deserved or not, is no longer merited now that the principles of programming-language implementation are so well understood. Compiler-writing is no longer a mystery.
Richard Bornat

Modular Organisation of Compilers

Frontmatter
1. Phases and Passes
Abstract
The most obvious overall task of a compiler is to read a program in one language — the ‘source’ program in the ‘source’ language — and to translate it to produce an equivalent program in another language — the ‘object’ program in the ‘object’ language. The object language is usually the machine language of a computer, or something close to it, and the source program is usually in a ‘high-level’ language such as FORTRAN, PASCAL, ALGOL 68, SIMULA 67 or BCPL, because translation from high-level language to machine language is the practical problem which compilers exist to solve. Compilers can be written to translate from any kind of programming language into any other, however, with varying degrees of efficiency of the resulting object program.
Richard Bornat
2. Introduction to Translation
Abstract
The most important task that a compiler performs is to translate a program from one language into another — from source language to object language. Simple translation is a mechanism which takes a representation of a fragment of the source program and produces an equivalent fragment in the object language — a code fragment which, when executed by the object machine, will perform the operations specified by the original source fragment.
Richard Bornat
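The abstract's notion of 'simple translation' can be illustrated with a short sketch: each source fragment (a parse-tree node) is mapped to an equivalent object-code fragment. The tuple-based node shapes and the instruction names are invented for illustration; they are not the book's.

```python
def translate(node):
    """Return a list of object-code instructions for one source fragment."""
    kind = node[0]
    if kind == "num":                      # leaf: a literal value
        return [("LOADn", node[1])]
    if kind == "name":                     # leaf: a variable
        return [("LOAD", node[1])]
    if kind == "+":                        # binary node: translate the subtrees,
        left, right = node[1], node[2]     # then combine their values
        return translate(left) + translate(right) + [("ADD",)]
    raise ValueError(f"unknown node kind {kind!r}")

# the fragment for 'a + 2'
code = translate(("+", ("name", "a"), ("num", 2)))
```

Executing `code` on the object machine would perform exactly the operations the source fragment `a + 2` specifies, which is the equivalence the abstract describes.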
3. Introduction to Syntax Analysis
Abstract
If the task of translation is to go from the tree to the object code, then the task of analysis must be to go from source program to tree. Lexical analysis, as chapter 4 shows, discovers how the input characters are grouped into items and the task of syntax analysis is to discover the way in which the items link together into phrases, the phrases link into larger phrases and so on. The output of the syntax analyser must be a tree or some equivalent representation such as a sequence of ‘triples’ or ‘quadruples’ or a postfix string. It’s convenient to divide up the description of syntax analysis into two parts: first to describe how to recognise a phrase and second how to output a tree node which describes that phrase. It turns out that the recognition technique used, of which there are many, doesn’t affect and is not affected by considerations of how to produce analysis output.
Richard Bornat
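The equivalence between a tree and a postfix string, mentioned in the abstract, can be seen in a few lines: a walk of the tree that visits operands before their operator emits the postfix form. The tuple-based tree shape is an invented illustration.

```python
def to_postfix(node):
    """Flatten a parse tree into an equivalent postfix item sequence."""
    if isinstance(node, tuple):            # operator node: (op, left, right)
        op, left, right = node
        return to_postfix(left) + to_postfix(right) + [op]
    return [node]                          # leaf: an item from the lexer

# 'a * (b + c)' as a tree, then as a postfix string
tree = ("*", "a", ("+", "b", "c"))
postfix = to_postfix(tree)
```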
4. Lexical Analysis and Loading
Abstract
It’s worthwhile to dispose of the simplest and least interesting phases of compiling as early as possible, so as to be able to concentrate in the rest of the book on the more difficult and interesting topics of syntax analysis, translation and code optimisation. The processes of input to and output from a compiler are fairly straightforward. There are simple techniques used to implement these activities and it is on these that I concentrate in this chapter.
Richard Bornat
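A minimal sketch of the straightforward input process the abstract refers to: group the input characters into items such as names, numbers and punctuation marks. The item categories and their spellings are illustrative, not the book's.

```python
def lex(text):
    """Group input characters into lexical items."""
    items, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():                   # spaces separate items
            i += 1
        elif ch.isalpha():                 # a name: a letter, then letters/digits
            j = i
            while j < len(text) and text[j].isalnum():
                j += 1
            items.append(("name", text[i:j]))
            i = j
        elif ch.isdigit():                 # a number: a run of digits
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            items.append(("num", int(text[i:j])))
            i = j
        else:                              # punctuation: one character at a time
            items.append(("punct", ch))
            i += 1
    return items
```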

Translation and Crucial Code Fragments

5. Translating Arithmetic Expressions
Abstract
Figure 5.1 shows an example arithmetic expression which might appear anywhere in a typical program — say on the right-hand side of an assignment statement — together with the parse tree which describes its structure. It also shows the object code which is the best translation of this expression — ‘best’, that is, if the expression is considered in isolation. The example shows the point from which a compiler-writer’s discussion will start — what translation algorithm will generate this ‘best’ code in practice?
Richard Bornat
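One classic tree-walking answer to the abstract's question can be sketched briefly: count the registers each subtree needs (a Sethi–Ullman-style count) and translate the heavier subtree first, which often yields the 'best' code for an expression in isolation. The register-machine instruction names are invented, and the operand swap below assumes commutative operators; a real translator must be more careful.

```python
OPS = {"+": "ADD", "*": "MULT"}            # commutative operators only

def need(node):
    """Registers needed to evaluate a node."""
    if isinstance(node, str):
        return 1
    _, left, right = node
    nl, nr = need(left), need(right)
    return nl + 1 if nl == nr else max(nl, nr)

def gen(node, reg=0, out=None):
    """Emit code leaving the node's value in register reg."""
    if out is None:
        out = []
    if isinstance(node, str):
        out.append(f"LOAD r{reg}, {node}")
        return out
    op, left, right = node
    if need(right) > need(left):           # heavier subtree first —
        left, right = right, left          # safe only because OPS are commutative
    gen(left, reg, out)
    gen(right, reg + 1, out)
    out.append(f"{OPS[op]} r{reg}, r{reg + 1}")
    return out
```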
6. Translating Boolean Expressions
Abstract
It’s possible to treat Boolean expressions in exactly the same way as arithmetic expressions — generate instructions to evaluate the subnodes, generate an instruction (or a sequence of instructions) to combine the Boolean values. Thus the Boolean ‘and’ and ‘or’ operations can be translated using the TranBinOp procedure of chapter 5 with the machine instructions AND and OR. However Boolean expressions more often than not appear as the conditional test in an if or a while statement and, as a result, are used more as sections of program which select between alternative paths of computation than as algebraic expressions which compute a truth value. Most good compilers therefore try to generate ‘jumping’ code for Boolean expressions in these contexts. First of all, however, it is necessary to demonstrate the code fragments which are required when Boolean expressions are regarded as value-calculating mechanisms.
Richard Bornat
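The 'jumping' code idea can be sketched in a few lines: a condition compiles not to a truth value but to conditional jumps towards a true-label or a false-label, and 'and'/'or' become control flow. The label scheme and instruction spellings are invented for illustration.

```python
import itertools

_labels = itertools.count()

def new_label():
    return f"L{next(_labels)}"

def jump_code(node, true_lab, false_lab):
    """Emit code that jumps to true_lab or false_lab, computing no value."""
    if isinstance(node, str):              # primitive test, e.g. 'a<b'
        return [f"JUMPIF {node} {true_lab}", f"JUMP {false_lab}"]
    op, left, right = node
    mid = new_label()                      # where evaluation of right begins
    if op == "and":                        # left false => whole test false
        return (jump_code(left, mid, false_lab) + [f"{mid}:"]
                + jump_code(right, true_lab, false_lab))
    if op == "or":                         # left true => whole test true
        return (jump_code(left, true_lab, mid) + [f"{mid}:"]
                + jump_code(right, true_lab, false_lab))
    raise ValueError(f"unknown operator {op!r}")

code = jump_code(("and", "a<b", "c<d"), "TRUE", "FALSE")
```

Note that the right operand is only reached when the left one has not already decided the outcome, so the jumping form short-circuits for free.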
7. Translating Statements and Declarations
Abstract
Chapters 5 and 6 show the tree-walking mechanism to its best advantage, working on the translation of source program constructs whose efficient implementation is crucial to the efficiency of the object program. Code fragments for statements are rarely so crucial, particularly when the expressions which they contain are efficiently translated. This chapter therefore shows example procedures which translate statements by linking together the code fragments discussed in chapters 5, 6 and 9 and in section III.
Richard Bornat
8. Creating and Using the Symbol Table
Abstract
Chapters 1 and 4 justify the existence of a table which is used by the lexical analyser to correlate multiple occurrences of a single name and is used by the translation phases to convert occurrences of source program names into references to run-time objects. This table is called the symbol table in this book: it is sometimes called a ‘cross-reference table’, a ‘name table’ or an ‘identifier table’. It is used by all the phases from lexical analysis to translation and, as chapter 20 shows, it may be used at run-time to help debug the program. The word ‘table’ is misleading since the information may be stored in a list-structure, a tree, or in any other kind of data structure.
Richard Bornat
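As the abstract notes, the 'table' need not be a flat array. A hedged sketch, using a chain of scope dictionaries: the translator declares each name with a descriptor of its run-time object and looks names up innermost-scope-first. The descriptor contents here are invented placeholders.

```python
class SymbolTable:
    """Map name occurrences to run-time object descriptors, scope by scope."""

    def __init__(self):
        self.scopes = [{}]                 # innermost scope is last

    def enter_scope(self):
        self.scopes.append({})

    def leave_scope(self):
        self.scopes.pop()

    def declare(self, name, descriptor):
        self.scopes[-1][name] = descriptor

    def lookup(self, name):
        for scope in reversed(self.scopes):  # innermost declaration wins
            if name in scope:
                return scope[name]
        raise KeyError(f"undeclared name: {name}")
```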
9. Accessing an Element of a Data Structure
Abstract
Compiled languages usually have one or more of the kinds of data structures discussed in this chapter. Records are like the nodes of the parse tree used in the examples of this book — multi-element objects which contain a collection of values, some of which may be pointers to other such objects. Accessing an element of a record, via the name of that element, is relatively efficient and may be compile-time checked for validity.
Richard Bornat
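The efficiency the abstract mentions comes from the fact that each element's offset within a record is fixed at compile time, so an access compiles to a load at base address plus constant. A tiny sketch with an invented word-offset layout; an unknown element name fails at 'compile time' rather than at run-time.

```python
# invented layout of a three-element record, element name -> word offset
LAYOUT = {"name": 0, "age": 1, "next": 2}

def element_address(record_base, element):
    """Base address plus the element's fixed, compile-time-known offset."""
    if element not in LAYOUT:              # compile-time validity check
        raise KeyError(f"record has no element {element!r}")
    return record_base + LAYOUT[element]
```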
10. Code Optimisation
Abstract
In this book I argue that for a modern, well-designed programming language a compiler which includes a tree-walking simple translation phase can produce acceptable object code, particularly when careful attention is paid to the ‘crucial code fragments’ discussed in chapters 5, 6, 7 and 9 and in section III. No matter how carefully the individual code fragments are designed, though, the code produced by such a compiler can rarely be ‘optimal’ and it is always easy to spot redundant instructions in the object program or to think of ways in which a particular source construct could be better translated in a particular setting. Code optimisation techniques attempt systematically to reduce the disparity between the code produced by a compiler and the code which might be generated by a very careful hand-translation of the source program. The simpler techniques, discussed in this chapter, merely re-order and merge code fragments which might be produced by simple translation, but more advanced techniques can go so far as to replace the source program’s algorithm with a more efficient algorithm of equivalent effect.
Richard Bornat
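The simpler optimisations the abstract describes — re-ordering and merging the fragments produced by simple translation — include the classic peephole pass sketched below: a store immediately followed by a load of the same name is redundant because the value is still in the register. The two-element instruction form is invented.

```python
def peephole(code):
    """Merge adjacent fragments: drop a LOAD that follows a STORE of the same name."""
    out = []
    for instr in code:
        if (out and instr[0] == "LOAD" and out[-1][0] == "STORE"
                and out[-1][1] == instr[1]):
            continue                       # value is already in the register
        out.append(instr)
    return out
```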

Run-time Support

Frontmatter
11. Procedure Call and Return
Abstract
This chapter discusses the design of code fragments which support procedure call and return in both recursive and non-recursive programming languages. To aid the discussion in later chapters it includes a model of procedure call and return based on the notion of a procedure activation record. A procedure activation is a special case of a ‘micro-process’, and the same model can explain the operation of languages such as SIMULA 67 whose control structures are not restricted to simple procedure call and return. It also helps to explain how restrictions on the use of data and procedures in the source language can allow restricted (and therefore less costly) implementations of the general procedure call and return mechanism. The mechanism of procedure call and return in FORTRAN is briefly discussed: then I concentrate on the mechanisms of stack handling which are required to implement procedure call and return in a recursive language on a conventional object machine.
Richard Bornat
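The activation-record model can be shown in miniature: each call pushes a data frame holding the return address and that activation's local data, and each return pops one. Because every activation owns its own frame, recursion works. The frame field names are invented.

```python
class ActivationStack:
    """A stack of procedure activation records (data frames)."""

    def __init__(self):
        self.frames = []                   # one entry per live activation

    def call(self, proc, return_address):
        frame = {"proc": proc, "ra": return_address, "locals": {}}
        self.frames.append(frame)
        return frame

    def ret(self):
        """Discard the topmost activation; resume at its return address."""
        return self.frames.pop()["ra"]
```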
12. Arguments and Parameters
Abstract
The tasks set out in figure 11.1 of chapter 11 include the preparation of argument information — for example the value of an argument expression, a pointer to an argument variable or a pointer to an argument vector — and the placing of this information in a location within the new procedure activation’s data frame. This chapter concentrates on this apparently insignificant aspect of run-time support in part because it is so often inefficiently implemented in practice and in part because compilers do not always perform all possible syntactic checks at compile-time.
Richard Bornat
13. Environments and Closures
Abstract
In a block-structured language such as PASCAL, ALGOL 60, ALGOL 68, CORAL 66, PL/1 or SIMULA 67 the text of a procedure may refer to run-time objects or values declared in the current procedure or in textually-enclosing procedures, but the addresses of the data frames which contain these objects aren’t known at compile-time or even at load-time. Thus it is difficult to provide single-instruction access to all data objects which may be accessed by a procedure, since it may be impossible to provide enough index registers to point at all relevant data frames on the stack. The problem of environment addressing is to provide a mechanism which allows efficient access to non-local data objects and which doesn’t impose too great an overhead on procedure entry and exit, when the data frame pointers which define the environment must be set up.
Richard Bornat
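One classic environment-addressing mechanism can be sketched here: give each data frame a pointer to the frame of its textually enclosing procedure (a 'static chain'), so a non-local object is reached by following a compile-time-known number of links and then indexing at a known offset. This sketch names slots rather than numbering them, purely for readability.

```python
class Frame:
    """A data frame with a static link to its textual enclosure's frame."""

    def __init__(self, static_link=None):
        self.static_link = static_link
        self.slots = {}                    # this activation's data objects

def access(frame, hops, name):
    """Follow `hops` static links (known at compile time), then fetch."""
    for _ in range(hops):
        frame = frame.static_link
    return frame.slots[name]
```

Each access costs one memory reference per link followed, which is the entry/exit-versus-access trade-off the chapter examines.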
14. Efficiency, Heaps and Lifetimes
Abstract
In the discussion in chapters 11, 12 and 13 I emphasise that a recursive programming language such as PASCAL, ALGOL 60 or ALGOL 68 can be implemented almost as efficiently as FORTRAN. In the case of system programming languages such as BCPL, BLISS or C ‘almost as efficiently’ isn’t good enough: these languages must be implemented extremely efficiently, and certainly more efficiently than FORTRAN, in order to satisfy the demands of their users. In this chapter I discuss two ways in which efficiency can be improved. The first improvement involves the use of special stack-handling instructions which reduce the number of instructions required in procedure call and return. The second improvement relies on source language restrictions which enable the object program to address the stack with only a single register, thus reducing the register manipulations required in procedure call and return.
Richard Bornat

Parsing Algorithms

Frontmatter
15. Notation and Formal Language Theory
Abstract
The theory of formal languages, so far as it is applicable to compiler writing, covers only issues of syntax: it describes those arrangements of symbols which constitute executable (runnable) programs in a programming language. It isn’t concerned at all with the semantics of those programs: what they ‘mean’ or what will be the effect when you run them. Sections II and III above assume an intuitive understanding of the semantics of languages such as PASCAL, FORTRAN, ALGOL 60 or ALGOL 68 and that’s as far as this book goes in the discussion of programming language semantics.
Richard Bornat
16. Top-down Syntax Analysis
Abstract
The most difficult task when writing a top-down syntax analyser is that of preparing a grammar which is suitable for top-down analysis. Once you have manipulated the grammar so that it possesses certain simple properties, which I describe below, it is trivially easy to write a syntax analyser as a collection of mutually recursive procedures, one for each non-terminal symbol in the grammar. Each procedure has the task of recognising and analysing a section of the input which represents a phrase described by its particular non-terminal symbol. It does so by checking the output of the lexical analyser, and by calling procedures associated with the non-terminals of the grammar, in the sequence defined by the production on which it is based.
Richard Bornat
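The scheme in the abstract — one mutually recursive procedure per non-terminal, each consuming lexical items in the sequence its production defines — can be sketched for a tiny invented grammar: E -> T { '+' T } and T -> name | '(' E ')'. Each procedure returns the tree for the phrase it recognised together with the remaining input.

```python
def parse_expr(items):
    """E -> T { '+' T }"""
    tree, items = parse_term(items)
    while items and items[0] == "+":       # one-symbol look-ahead
        right, items = parse_term(items[1:])
        tree = ("+", tree, right)
    return tree, items

def parse_term(items):
    """T -> name | '(' E ')'"""
    if items[0] == "(":
        tree, rest = parse_expr(items[1:])
        assert rest and rest[0] == ")", "missing ')'"
        return tree, rest[1:]
    return items[0], items[1:]             # a name item from the lexer
```

Note that the grammar is already in the shape top-down analysis needs: each alternative is distinguished by its first symbol, so one item of look-ahead always decides which production to follow.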
17. Operator Precedence Analysis of Expressions
Abstract
The bottom-up syntax analysis algorithm of chapter 3 handles a subset of programming language arithmetic expressions using a system of numerical priorities. This is the basis of the operator-precedence mechanism. It is one of the oldest forms of syntax analysis, now less popular because it is not as flexible or as good at error detection as the one-track technique nor applicable to as many grammars as the LR(1) technique described in chapter 18, but still very useful in practice in the analysis of expressions of a programming language. It is useful in part because of its efficiency and in part because an operator-precedence analyser can usually be generated direct from the unmodified ‘reference’ grammar for expressions. Its error-processing performance is deficient in theory but with care and some simple pragmatic adjustments it can be made acceptable, given the simple syntactic structure of programming language expressions.
Richard Bornat
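The system of numerical priorities can be shown in a short sketch: when the next operator arrives, reduce (pop to the output) every stacked operator whose priority is at least as great, then shift the new one. The priority values are illustrative.

```python
PRIORITY = {"+": 1, "-": 1, "*": 2, "/": 2}

def parse(items):
    """Turn an infix item sequence into a postfix one using priorities."""
    out, ops = [], []
    for item in items:
        if item in PRIORITY:
            # reduce while the stacked operator binds at least as tightly
            while ops and PRIORITY[ops[-1]] >= PRIORITY[item]:
                out.append(ops.pop())
            ops.append(item)               # then shift the new operator
        else:
            out.append(item)               # an operand goes straight out
    while ops:                             # reduce whatever remains
        out.append(ops.pop())
    return out
```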
18. LR(1) Syntax Analysis
Abstract
Theoretical studies of the properties of programming language grammars and of algorithms for syntax analysis have always been partly motivated by the search for a truly automatic means of constructing a syntax analyser. In the early 1960s so-called ‘compiler-compilers’ were popular. One of the earliest was developed at the University of Manchester (Brooker et al., 1963): it included a parser-transcriber which took a syntax description and without alteration transcribed it into a top-down syntax analyser. Foster’s SID (Foster, 1968) was the first of many parser-generator programs which went further so far as syntax analysis was concerned: its input was a type 2 grammar, on which it performed most of the operations discussed in chapter 16 to produce a one-symbol-look-ahead grammar, which it finally transcribed into an ALGOL 60 program. The transformations required to make a grammar one-track or one-symbol-look-ahead aren’t always simply mechanical, however, and in practice a top-down parser-generator like SID often fails to complete its task. Parser-generators for top-down analysers were little used, therefore, and most syntax analysers were written by hand using the techniques discussed in earlier chapters.
Richard Bornat

Interpreting and Debugging

Frontmatter
19. Interpreters and Interpretation
Abstract
An interpreter is simply a device which takes some representation of a program and carries out the operations which the program specifies — i.e. it mimics or simulates the operations which a machine would carry out if it were directly capable of processing programs written in that language. A compiler takes a similar representation of a program and produces instructions which, when processed by a machine, will carry out the operations which the program specifies. The difference between an interpreter and a machine under this definition is not very great: the microprogram of a computer is an interpreter which reads a machine code program and imitates the behaviour that a ‘real’ machine would express given that program.
Richard Bornat
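The abstract's definition — a device that takes some representation of a program and carries out the operations it specifies — fits in a few lines. This sketch interprets an invented postfix representation directly, fetching variables from a store rather than emitting any instructions.

```python
def interpret(program, store):
    """Carry out the operations a postfix program specifies."""
    stack = []
    for item in program:
        if item == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif item == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif isinstance(item, str):        # a variable: fetch its value
            stack.append(store[item])
        else:                              # a literal value
            stack.append(item)
    return stack.pop()
```

A compiler given the same representation would instead emit machine instructions for each item; the interpreter simply performs the operations itself, which is exactly the contrast the chapter draws.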
20. Run-time Debugging Aids
Abstract
A program presented to a compiler is not written for the purpose of being compiled. Compilation is a means to an end and the compiler is just an aid to programming in the original source language. Once a program has been compiled it would be wrong to expect that the user should debug the actual instructions of the object program, because that would view the compiler as an ‘auto-coder’ which merely helps in the business of programming in machine code. The source program prescribes a sequence of actions which the machine must carry out: when the object program behaves unexpectedly the cause is a source program error. Run-time debugging aids must therefore help to find source program errors: only the compiler-writer should be concerned with the possibility that there are errors in the object program which make it other than a faithful representation of the user’s expressed intentions.
Richard Bornat
Backmatter
Metadata
Title
Understanding and Writing Compilers
Author
Richard Bornat
Copyright Year
1979
Publisher
Macmillan Education UK
Electronic ISBN
978-1-349-16178-2
Print ISBN
978-0-333-21732-0
DOI
https://doi.org/10.1007/978-1-349-16178-2