
About this Book

Ralf Karrenberg presents Whole-Function Vectorization (WFV), an approach that allows a compiler to automatically create code that exploits data-parallelism using SIMD instructions. Data-parallel applications such as particle simulations, stock option price estimation or video decoding require the same computations to be performed on huge amounts of data. Without WFV, one processor core executes a single instance of a data-parallel function. WFV transforms the function to execute multiple instances at once using SIMD instructions. The author describes an advanced WFV algorithm that includes a variety of analyses and code generation techniques. He shows that this approach improves the performance of the generated code in a variety of use cases.
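As a rough illustration of the transformation described above (the kernel and names are invented for this sketch, not taken from the book), the following C code contrasts a scalar data-parallel function with the kind of 4-wide function WFV conceptually produces. The lane loop stands in for real SIMD instructions, which a WFV compiler would emit instead:

```c
#include <assert.h>

#define W 4  /* assumed SIMD width, e.g. 4 floats for SSE */

/* Scalar source function: one call computes one instance. */
float scale_add(float x, float a, float b) {
    return a * x + b;
}

/* What WFV conceptually generates: one call computes W instances
 * at once. Here the lanes are written as a plain loop for
 * portability; real code generation would use SIMD instructions
 * (e.g. a packed multiply and add). */
void scale_add_wfv(const float *x, float a, float b, float *out) {
    for (int lane = 0; lane < W; ++lane)
        out[lane] = a * x[lane] + b;
}
```

With W instances per call, a data-parallel workload needs only 1/W as many function invocations, which is where the speedup comes from.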

Table of Contents

Frontmatter

1. Introduction

Abstract
Data-parallel applications play an important role in today's computing landscape, especially in the High-Performance Computing (HPC) area. Particle simulations, stock option prediction, medical imaging, and video encoding and decoding are just a few problems that can be formulated in a data-parallel way: they each require similar computations to be performed on large amounts of input data, with no or only limited dependencies between the computations of different inputs.
Ralf Karrenberg

2. Foundations & Terminology

Abstract
This thesis builds on a few key developments in the field of compiler construction that we briefly introduce in this chapter. More detailed explanations are given, for example, by Aho et al. [2006] and Kennedy & Allen [2002].
Ralf Karrenberg

3. Overview

Abstract
In this chapter, we present an overview of the basic Whole-Function Vectorization algorithm, the challenges faced when vectorizing real-world code, and the most important performance implications of the transformation. In Chapter 5, we give a detailed description of the analyses involved, followed by the actual presentation of the WFV algorithm in Chapter 6.
Ralf Karrenberg

4. Related Work

Abstract
In this chapter, we discuss other work that is related to this thesis. We give an overview of the different kinds of approaches for vectorization and an overview of languages that offer automatic vectorization. Furthermore, we summarize work on static and dynamic analyses for SIMD execution and work on dynamic code variants for data-parallel programs.
Ralf Karrenberg

5. SIMD Property Analyses

Abstract
In this chapter, we describe the heart of the WFV algorithm: a set of analyses that determine properties of a function for a SIMD execution model.
The analyses presented in this chapter determine a variety of properties for the instructions, basic blocks, and loops of a scalar source function. The properties related to values are listed in Table 5.1, those related to blocks and loops in Table 5.2. They describe the behavior of the function in data-parallel execution.
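To give the flavor of such per-value properties (the kernel is an invented example; the terms *uniform* and *varying* follow common SIMD-analysis terminology, where a value is uniform if every SIMD instance computes the same result and varying otherwise):

```c
#include <assert.h>

/* Hypothetical data-parallel kernel, annotated with the kind of
 * value properties a SIMD property analysis might derive. */
float kernel(int tid, const float *in, float scale) {
    float s = scale * 2.0f;   /* uniform: same in every instance    */
    float v = in[tid];        /* varying: depends on the instance id */
    if (v > 0.0f)             /* divergent branch: condition varies  */
        v = v * s;
    return v;                 /* varying */
}
```

Operations on uniform values need only be computed once for all instances, while varying values occupy a full SIMD register; divergent branches are what force masked, linearized control flow.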
Ralf Karrenberg

6. Whole-Function Vectorization

Abstract
In this chapter, we present the main transformation phases of the Whole-Function Vectorization algorithm: Mask Generation, Select Generation, Partial CFG Linearization, and Instruction Vectorization.
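A minimal sketch of what mask generation, select generation, and control-flow linearization do to a diverging if-then-else (the example function is invented; the lane loop stands in for real SIMD compare and blend instructions):

```c
#include <assert.h>
#include <stdbool.h>

#define W 4  /* assumed SIMD width */

/* Scalar source: control flow depends on the input value. */
float clamp0(float x) {
    if (x < 0.0f) x = 0.0f;
    return x;
}

/* After WFV the branch is linearized: all lanes execute both
 * sides. A mask records which lanes took the 'then' side, and a
 * select (blend) merges the per-lane results. */
void clamp0_wfv(const float *x, float *out) {
    bool mask[W];
    for (int i = 0; i < W; ++i)
        mask[i] = (x[i] < 0.0f);          /* mask generation   */
    for (int i = 0; i < W; ++i)
        out[i] = mask[i] ? 0.0f : x[i];   /* select generation */
}
```

Turning control flow into data flow this way is what allows instances with different branch outcomes to share one SIMD instruction stream.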
Ralf Karrenberg

7. Dynamic Code Variants

Abstract
During or after vectorization of a function, additional optimizations that exploit dynamic properties of the function can be applied. Such an optimization duplicates an existing code path, improves the new code by assuming certain properties, and guards the new path by a runtime check that ensures that the assumed properties hold. We call such a guarded path a dynamic variant.
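One common shape such a variant can take (all names here are invented for illustration) is a mask-free fast path guarded by a runtime check that all SIMD lanes agree, falling back to the general blended path otherwise:

```c
#include <assert.h>
#include <stdbool.h>

#define W 4  /* assumed SIMD width */

/* General path: correct for any mask, but pays for blending. */
static void path_masked(const bool *mask, const float *x, float *out) {
    for (int i = 0; i < W; ++i)
        out[i] = mask[i] ? x[i] * 2.0f : x[i];
}

/* Dynamic variants: runtime checks guard duplicated, specialized
 * copies of the code path that assume a fully uniform mask. */
void run(const bool *mask, const float *x, float *out) {
    bool all = true, none = true;
    for (int i = 0; i < W; ++i) {
        all  = all  && mask[i];
        none = none && !mask[i];
    }
    if (all) {                         /* guard: all lanes active  */
        for (int i = 0; i < W; ++i) out[i] = x[i] * 2.0f;
    } else if (none) {                 /* guard: no lane active    */
        for (int i = 0; i < W; ++i) out[i] = x[i];
    } else {
        path_masked(mask, x, out);     /* general fallback         */
    }
}
```

The check itself is cheap, so whenever the masks are uniform in practice the specialized path wins without sacrificing correctness in the divergent case.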
Ralf Karrenberg

8. Evaluation

Abstract
In this chapter, we present a thorough evaluation of our Whole-Function Vectorization implementation in various scenarios: First, in WFVOpenCL, a prototypical OpenCL CPU driver that was implemented to showcase the benefits of WFV for a data-parallel language [Karrenberg & Hack 2012]. Second, in AnySL, a shading system aimed at highly efficient code without loss of portability and flexibility [Karrenberg et al. 2010]. Third, in Noise, a compiler that allows the user to specify which optimizations should be run on what parts of the code. In addition, we quantify the impact of the key techniques that exploit the results of the analyses described in Chapter 5.
Ralf Karrenberg

Backmatter
