
2000 | Book

Loop Tiling for Parallelism


About this book

Loop tiling, as one of the most important compiler optimizations, is beneficial for both parallel machines and uniprocessors with a memory hierarchy. This book explores the use of loop tiling for reducing communication cost and improving parallelism for distributed memory machines. The author provides mathematical foundations, investigates loop permutability in the framework of nonsingular loop transformations, discusses the necessary machineries required, and presents state-of-the-art results for finding communication- and time-minimal tiling choices. Throughout the book, theorems and algorithms are illustrated with numerous examples and diagrams. The techniques presented in Loop Tiling for Parallelism can be adapted to work for a cluster of workstations, and are also directly applicable to shared-memory machines once the machines are modeled as BSP (Bulk Synchronous Parallel) machines.
Features and key topics:

- Detailed review of the mathematical foundations, including convex polyhedra and cones
- Self-contained treatment of nonsingular loop transformations, code generation, and full loop permutability
- Tiling loop nests by rectangles and parallelepipeds, including their mathematical definition, dependence analysis, legality test, and code generation
- A complete suite of techniques for generating SPMD code for a tiled loop nest
- Up-to-date results on tile size and shape selection for reducing communication and improving parallelism
- End-of-chapter references for further reading

Researchers and practitioners involved in optimizing compilers and students in advanced computer architecture studies will find this a lucid and well-presented reference work with numerous citations to original sources.

Table of Contents

Frontmatter

Mathematical Background and Loop Transformation

Frontmatter
Chapter 1. Mathematical Background
Abstract
This chapter discusses some preliminaries on linear algebra, matrix theory and polyhedral theory with an emphasis on convex cones. Several important concepts are illustrated with examples wherever appropriate. The notation and terminology used throughout the book are also introduced.
Jingling Xue
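
As a taste of the material reviewed (a standard definition stated here only for orientation; the book's own notation may differ), a polyhedral convex cone can be described in two equivalent ways:

    C \;=\; \{\, x \in \mathbb{R}^n : Ax \le 0 \,\} \;=\; \{\, y_1 d_1 + \cdots + y_k d_k : y_1, \dots, y_k \ge 0 \,\},

that is, either by finitely many homogeneous linear inequalities or as the set of non-negative combinations of finitely many generating rays d_1, ..., d_k (the Minkowski-Weyl theorem).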
Chapter 2. Nonsingular Transformations and Permutability
Abstract
Tiling focuses on very structured computations expressed in perfectly nested loops to expose parallelism or otherwise optimise data locality. In this chapter, the class of perfectly nested loops we consider is defined. Like other iteration-reordering transformations, tiling does not change what is inside an iteration but simply transforms the loop nest into a new one with the same iterations but a new execution order. Since the iterations of a loop nest are considered to be atomic units, it is sufficient to define the data dependences in the loop nest at the level of its iterations. We use a dependence vector abstraction, which encompasses both distance and direction vectors, to capture the vector difference of two dependent iterations. We will also use a dependence polyhedron abstraction to represent a dependence vector equivalently.
Jingling Xue
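
To make the dependence-vector abstraction concrete, here is a minimal sketch (illustrative code, not taken from the book; the array names and the size N are assumptions) of a perfectly nested loop whose single flow dependence is summarised by the distance vector (1, 1):

    #define N 8   /* illustrative problem size */

    /* Iteration (i, j) reads the value written by iteration (i-1, j-1),
     * so the dependence between the two iterations is captured by the
     * distance vector (1, 1). */
    void example(double A[N][N], const double B[N][N])
    {
        for (int i = 1; i < N; i++)
            for (int j = 1; j < N; j++)
                A[i][j] = A[i-1][j-1] + B[i][j];
    }

Because both components of the distance vector are non-negative, the two loops are fully permutable, the property around which tiling legality is usually stated.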

Tiling as a Loop Transformation

Frontmatter
Chapter 3. Rectangular Tiling
Abstract
Tiling starts with an iteration space and partitions it into uniform tiles of a given size and shape. The tiles can be of any shape, such as triangles, squares, rectangles, parallelograms, hexagons or their higher-dimensional equivalents. In practice, however, squares, rectangles and parallelograms are common. As a result, two types of tiling techniques are distinguished in the literature: rectangular tiling and parallelepiped tiling. In the former case, all tiles are squares, rectangles or their higher-dimensional equivalents. In the latter case, the tiles can also be parallelepipeds (known as parallelograms in the 2-D space).
Jingling Xue
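
As a minimal illustration (code sketched here, not taken from the book; N, B and the loop body are assumptions), rectangular tiling strip-mines both loops of a doubly nested loop and interchanges the resulting strip loops, so that the outer pair enumerates B x B tiles and the inner pair enumerates the points of one tile:

    #define N 1024   /* iteration space extent, illustrative only */
    #define B 32     /* tile size, illustrative only */

    /* Original loop nest:
     *   for (i = 0; i < N; i++)
     *     for (j = 0; j < N; j++)
     *       C[i][j] = A[i][j] + A[j][i];
     * The extra bound checks in the point loops handle the case where
     * B does not divide N. */
    void tiled(double C[N][N], const double A[N][N])
    {
        for (int ii = 0; ii < N; ii += B)                       /* tile loops */
            for (int jj = 0; jj < N; jj += B)
                for (int i = ii; i < ii + B && i < N; i++)      /* point loops */
                    for (int j = jj; j < jj + B && j < N; j++)
                        C[i][j] = A[i][j] + A[j][i];
    }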
Chapter 4. Parallelepiped Tiling
Abstract
Parallelepiped tiling uses parallelepipeds of a given size and shape to partition the iteration space. By using parallelepipeds rather than merely rectangles or squares, parallelepiped tiling offers more opportunities for exposing parallelism, improving locality and reducing communication overhead.
Jingling Xue
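
The following sketch (an illustration, not the book's code; the stencil, the sizes T and N, and the tile sizes BT and BJ are assumptions) shows the usual way parallelepiped tiles arise in code: the iteration space of a 1-D stencil has dependence vectors (1,-1), (1,0) and (1,1), so rectangular tiles in (t, i) would create cyclic inter-tile dependences, but after the skew j = i + t every dependence component is non-negative and rectangular tiles in the skewed (t, j) space, which are parallelograms in the original (t, i) space, can legally be executed in lexicographic order:

    #define T  256   /* time steps, illustrative only */
    #define N  256   /* spatial points, illustrative only */
    #define BT 16    /* tile size along t */
    #define BJ 64    /* tile size along the skewed dimension j = i + t */

    #define MIN(a, b) ((a) < (b) ? (a) : (b))
    #define MAX(a, b) ((a) > (b) ? (a) : (b))

    /* Row 0 of A is assumed to hold the initial condition. */
    void skewed_tiled(double A[T][N])
    {
        for (int tt = 1; tt < T; tt += BT)                  /* tile loops */
            for (int jj = 2; jj < T + N - 2; jj += BJ)
                for (int t = tt; t < MIN(tt + BT, T); t++)  /* point loops */
                    for (int j = MAX(jj, t + 1); j < MIN(jj + BJ, t + N - 1); j++) {
                        int i = j - t;                      /* undo the skew */
                        A[t][i] = A[t-1][i-1] + A[t-1][i] + A[t-1][i+1];
                    }
    }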

Tiling for Distributed-Memory Machines

Frontmatter
Chapter 5. SPMD Code Generation
Abstract
When compiling a loop nest for execution in the SPMD (Single Program, Multiple Data) mode on a distributed memory machine, the data distribution is typically specified by the programmer and the computation distribution is inferred from the data distribution using the owner-computes rule, by which a processor is responsible for computing the data allocated to it. The data distribution specifies which data are owned by which processors. The computation distribution determines which iterations are executed at which processors.
Jingling Xue
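
A minimal SPMD sketch of the owner-computes rule (an illustration, not the book's code generation scheme; N, the block distribution and the names are assumptions): every processor runs the same program, but the loop bounds are shrunk so that each processor executes only the iterations that write the data it owns:

    #define N 1024   /* global problem size, illustrative only */

    /* a and b are block-distributed: processor `me` owns global
     * elements [me*blk, min((me+1)*blk, N)) and stores them locally
     * starting at index 0. */
    void spmd_body(double *a_local, const double *b_local, int me, int nprocs)
    {
        int blk = (N + nprocs - 1) / nprocs;   /* block size */
        int lo  = me * blk;                    /* first global index owned */
        int hi  = (lo + blk < N) ? lo + blk : N;

        /* Global loop: for (i = 0; i < N; i++) a[i] = 2.0 * b[i];
         * Owner-computes: run only the owned slice, with local indices. */
        for (int i = lo; i < hi; i++)
            a_local[i - lo] = 2.0 * b_local[i - lo];
    }

In this trivially parallel example the right-hand side data is owned by the same processor, so no communication is needed; the chapter is concerned with generating the guards, local bounds and messages required in the general case.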
Chapter 6. Communication-Minimal Tiling
Abstract
Tiling is a good paradigm for parallel computers with distributed memory. In these multicomputers, the relatively high communication startup cost makes frequent communication very expensive. Tiling can be used to reduce the communication overhead between processors by grouping loop iterations into tiles so that communication takes place once per tile instead of once per iteration.
Jingling Xue
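
The effect can be sketched as follows (an illustration only; the stencil, the sizes and the helpers send_right/recv_left are hypothetical placeholders, not a real communication API, and are assumed to block until the data is available): under a column-block distribution the dependence from A[i][j-1] to A[i][j] would need one message per iteration at the block boundary, whereas grouping BT rows into a tile turns this into one message of BT elements per tile:

    #define NROWS 1024   /* rows, illustrative only */
    #define NLOC  128    /* columns owned locally, illustrative only */
    #define BT    64     /* tile height: rows computed between messages */

    /* Hypothetical blocking point-to-point helpers -- placeholders only. */
    void send_right(const double *buf, int count, int me);
    void recv_left(double *buf, int count, int me);

    /* Column 0 of the local array holds the global boundary (processor 0)
     * or the halo received from the left neighbour; columns 1..NLOC are
     * the locally owned data.  Row 0 is assumed to be initialised. */
    void compute_block(double A[NROWS][NLOC + 1], int me, int nprocs)
    {
        double halo[BT], out[BT];

        for (int it = 1; it < NROWS; it += BT) {
            int end = (it + BT < NROWS) ? it + BT : NROWS;

            if (me > 0)                       /* one receive per tile, not per row */
                recv_left(halo, end - it, me);

            for (int i = it; i < end; i++) {
                if (me > 0)
                    A[i][0] = halo[i - it];
                for (int j = 1; j <= NLOC; j++)
                    A[i][j] = A[i][j-1] + A[i-1][j];
            }

            if (me < nprocs - 1) {            /* ship the tile's last column once */
                for (int i = it; i < end; i++)
                    out[i - it] = A[i][NLOC];
                send_right(out, end - it, me);
            }
        }
    }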
Chapter 7. Time-Minimal Tiling
Abstract
The problem of choosing the best tile size and shape to minimise the execution time of a loop nest, called the time-minimal problem, is a difficult discrete non-linear optimisation problem. Fortunately, the problem can be solved analytically when the iteration space is two-dimensional.
Jingling Xue
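
To give a flavour of the objective involved (a generic cost model of the kind found in the tiling literature, not necessarily the book's exact formulation; every symbol below is an assumption made for illustration): if an N_1 x N_2 iteration space is cut into s_1 x s_2 tiles whose columns are distributed over P processors and executed in a pipelined fashion, the execution time typically has the shape

    T(s_1, s_2) \;\approx\; \Big(\frac{N_1 N_2}{s_1 s_2 P} + P - 1\Big)\Big(c\, s_1 s_2 + \alpha + \beta s_1\Big),

where c is the per-iteration compute cost and \alpha, \beta are the message start-up and per-element transfer costs. Minimising such an expression over the integer tile sizes, and more generally over the tile shape, is an instance of the discrete non-linear optimisation referred to above.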
Backmatter
Metadata
Title
Loop Tiling for Parallelism
Author
Jingling Xue
Copyright Year
2000
Publisher
Springer US
Electronic ISBN
978-1-4615-4337-4
Print ISBN
978-1-4613-6948-6
DOI
https://doi.org/10.1007/978-1-4615-4337-4