2021 | Book

# Principles of Parallel Scientific Computing

## A First Guide to Numerical Concepts and Programming Methods

Author: Prof. Dr. Tobias Weinzierl

Publisher:

Book Series : Undergraduate Topics in Computer Science

Part of:

insite
SEARCH

New insight in many scientific and engineering fields is unthinkable without the use of numerical simulations running efficiently on modern computers. The faster we get new results, the bigger and accurate are the problems that we can solve. It is the combination of mathematical ideas plus efficient programming that drives the progress in many disciplines. Future champions in the area thus will have to be qualified in their application domain, they will need a profound understanding of some mathematical ideas, and they need the skills to deliver fast code.

The present textbook targets students which have programming skills already and do not shy away from mathematics, though they might be educated in computer science or an application domain. It introduces the basic concepts and ideas behind applied mathematics and parallel programming that we need to write numerical simulations for today’s multicore workstations. Our intention is not to dive into one particular application domain or to introduce a new programming language – we lay the generic foundations for future courses and projects in the area.

The text is written in an accessible style which is easy to digest for students without years and years of mathematics education. It values clarity and intuition over formalism, and uses a simple N-body simulation setup to illustrate basic ideas that are of relevance in various different subdomains of scientific computing. Its primary goal is to make theoretical and paradigmatic ideas accessible to undergraduate students and to bring the fascination of the field across.

#### Introduction: Why to Study the Subject

##### 1. The Pillars of Science
Abstract
We discuss the two classic approaches (pillars) to acquire new insight in science: experiment and theory. As computer simulations supersede the classic experiment and blackboard theory, we obtain a third pillar. This notion of a third pillar can be challenged, but it motivates us to define new research areas assembled under the umbrella of the catchphrase computational science and engineering. After a quick sketch of the simulation pipeline—typical steps required in any computational discipline—we introduce scientific computing as area conducting research into the tools enabling insight through computing.
Tobias Weinzierl
##### 2. Moore Myths
Abstract
We introduce Moore’s Law and Dennard scaling. They illustrate that further increases of compute capabilities will result from an increase in parallelism over the upcoming years; which implies that our codes have to be prepared to exploit this parallelism. We identify three levels of parallelism within a CPU (inter-node, intra-node and vector parallelism) and eventually characterise GPUs within these levels.
Tobias Weinzierl
##### 3. Our Model Problem (Our First Encounter with the Explicit Euler)
Abstract
A simple N-body problem serves as demonstrator for the topics discussed in upcoming sections. We introduce an explicit Euler for this showcase problem informally, and thus obtain a baseline toy code to play with. The introduction allows us to discriminate numerical approximations from analytical solutions.
Tobias Weinzierl

#### How the Machine Works

##### 4. Floating Point Numbers
Abstract
Numbers from $$\mathbb {R}$$ cannot be handled directly by computers, as they can have infinitely many digits after the decimal point. We have to truncate their representation. Storage schemes with a finite number of digits before and after the decimal point struggle: There are always regimes of values where they work prefectly fine and regimes where they cannot represent values accurately. We therefore introduce the concept of floating point numbers and discuss the IEEE standard. Its concept of normalisation allows us to establish the notion of machine precision, as we realise that the truncation into a finite number of digits equals rounding and introduces a bounded relative error. Since we work with approximations to real numbers, we have to be careful when we compare floating point numbers in our codes.
Tobias Weinzierl
##### 5. A Simplistic Machine Model
Abstract
The von Neumann architecture equips us with an understanding how a computer processes our program code step by step. Our brief review of the architecture highlights the role of registers for our numerical computations. We will augment the straightforward definition of a register later to realise vector processing, i.e. the first flavour of hardware parallelism, while the architecture review provides clues why many numerical codes struggle with performance flaws and bottlenecks on our CPUs.
Tobias Weinzierl

#### Floating Point Number Crunching

##### 6. Round-Off Error Propagation
Abstract
Round-off errors creep into our calculations with every non-trivial floating point instruction. We introduce a formalism how to analyse round-off errors in implementations, and discuss when and why numerical calculations suffer from a lack of associativity, cancellation or accumulation stagnation.
Tobias Weinzierl
##### 7. SIMD Vector Crunching
Abstract
Vector computing can be read as equipping a processing unit with replicated ALUs. To benefit from this hardware concurrency, we have to phrase our calculations as operation sequences over small vectors. We discuss what well-suited loops that can exploit vector units have to look like, we discuss the relation between loops with vectorisation and partial loop unrolling, and we discuss the difference between horizontal and vertical vectorisation. These insights allow us to study realisation flavours of vectorisation on the chip: Are wide vector registers used, or do we introduce hardware threads with lockstepping, how do masking and blending facilitate slight thread divergence, and why does non-continuous data access require us to gather and scatter register content?
Tobias Weinzierl
##### 8. Arithmetic Stability of an Implementation
Abstract
Round-off errors are tied to the implementation, i.e. two different implementations of the same algorithm might exhibit different error propagation patterns. We introduce the term arithmetic stability and formalise how to find out if an implementation is stable under round-off errors. The formalism allows us to show that our previously studied N-body problem is indeed single-step stable and also stable over longer simulation runs. For implementations of extreme-scale scalar products, e.g., stability however does not come for free, and we have to carefully arrange all computational steps.
Tobias Weinzierl
##### 9. Vectorisation of the Model Problem
Abstract
With an understanding how computers handle floating-point calculations, we vectorise our model problem. For this, we first categorise different parallelisation programming techniques, before we commit to a pragma-based approach, i.e. rely on auto-vectorisation plus vectorisation hints to the compiler. Compilers can give us feedback where they succeed to vectorise, which helps us when we introduce OpenMP pragmas to our code. We wrap up the vectorisation topic with a discussion of showstoppers to vectorisation, such as aliasing, reductions, or function calls.
Tobias Weinzierl

#### Basic Numerical Techniques and Terms

##### 10. Conditioning and Well-Posedness
Abstract
Hadamard’s definition of well-posedness makes us start to think if a particular problem is solvable on a computer at all. We introduce the terms well-conditioned and ill-conditioned, and finally define the condition number to quantify how sensitive an algorithm is to slight changes of its input. After a brief excursus what the condition number means for linear equation systems—it turns out to be a special case of our more generic definition—we discuss the interplay of the conditioning with forward stability and introduce backward stability as evaluation condition number.
Tobias Weinzierl
##### 11. Taylor Expansion
Abstract
We revise the Taylor series and Taylor expansion. The text’s intention is to remind us of this important tool, and to refresh some important technical skills such as Taylor for functions with multiple arguments. We reiterate the smoothness assumptions we implicitly employ. This chapter revisits Taylor expansion in a very hands-on fashion. Feel free to skip the whole chapter if you don’t have to refresh your knowledge.
Tobias Weinzierl
##### 12. Ordinary Differential Equations
Abstract
There is a lot of definitions, terminology and theory around ordinary differential equations (ODEs). We introduce the terminology and properties that we need for our programming as well as the knowledge that we can exploit when we debug and assess codes. With the skills to rewrite any ODE into a system of first-order ODEs and the Taylor expansion, we finally have the mathematical tools to derive our explicit Euler formally. This chapter revisits some very basic properties of ordinary differential equations (ODEs). Feel free to skip the majority of this chapter and jump straight into Sect. 12.3, where we bring our ODE knowledge and the ideas behind Taylor together.
Tobias Weinzierl
##### 13. Accuracy and Appropriateness of Numerical Schemes
Abstract
For the accuracy and stability of numerical codes, we have to ensure that a discretisation is consistent and is stable. Both depend on the truncation error, while we distinguish zero- and A-stability. We end up with the notion of convergence according to the Lax Equivalence Theorem, and finally discuss how we can compute the convergence order experimentally.
Tobias Weinzierl

#### Using a Multicore Computer

##### 14. Writing Parallel Code
Abstract
We distinguish three different strategies (competition, data decomposition, functional decomposition) when we design parallelisation schemes. SIMD’s vectorisation is a special case of data parallelism which we now generalise to MIMD machines where we distinguish (logically) shared and distributed memory. Our text discusses some vendor jargon regarding hardware, before we introduce BSP and SPMD as programming techniques.
Tobias Weinzierl
##### 15. Upscaling Models and Scaling Measurements
Abstract
Two different upscaling models dominate our work: weak and strong scaling. They are introduced as Amdahl’s and Gustafson’s law, and allow us to predict and understand how well a BSP (or other) parallelisation is expected to upscale. We close the brief speedup discussion with some remarks how to present acquired data.
Tobias Weinzierl
##### 16. OpenMP Primer: BSP on Multicores
Abstract
We introduce OpenMP for multicore programming through the eyes of the BSP programming model. An overview over some terminology for the programming of multicore CPUs and some syntax discussions allow us to define well-suited loops for OpenMP BSP. Following a sketch of the data transfer throughout the termination of the bulk, we abandon the academic notion of BSP and introduce shared memory data exchange via critical sections and atomics. Remarks on variable visibility policies and some scheduling options close the chapter.
Tobias Weinzierl
Abstract
Starting from a brief revision of task terminology, we introduce task-based programming in OpenMP and the required technical terms. After a summary of variable visibilities for tasks—all concepts are well-known from the BSP world though some defaults change—we conclude the task discussion with the introduction of explicit task dependencies.
Tobias Weinzierl
##### 18. GPGPUs with OpenMP
Abstract
GPGPUs can be read as multicore chips within the compute node which host their own, separate memory and multiple cores which are slow yet nevertheless extremely powerful. This power results from SIMD on a massive scale. We give a brief, very high level overview of the GPU architecture paradigm, before we introduce OpenMP’s target blocks. The data movement between host and accelerator requires dedicated attention, while a brief discussion of collectives, function offloading and structs (classes) for GPUs close the discussion.
Tobias Weinzierl

#### Faster and More Accurate Numerical Codes

##### 19. Higher Order Methods
Abstract
To construct higher-order time stepping methods, we discuss two paradigms: On the one hand, we can write down an integral equation for the time stepping and construct more accurate integrators for the right-hand side. On the other hand, we can shoot multiple times into the future to obtain a guess for the additional terms from the Taylor expansion. The first approach leads to multistep methods, the second approach introduces the principles behind Runge-Kutta methods. After a brief discussion how to program arbitrary order Runge-Kutta once the Butcher tableau is known, we briefly sketch leapfrog-type methods which yield particularly elegant, second order methods for our N-body problem.
Tobias Weinzierl
Abstract
The discussion around A-stability makes it clear that ODE integrators are very sensitive w.r.t. time step choices. Some simple thought experiments illustrate that a fixed time step size for a whole simulation often is inappropriate. We sketch one simple multiscale approach to guide h-adaptivity (adaptive time step choices) and one simple p-adaptivity strategy.
Tobias Weinzierl
##### Backmatter
Title
Principles of Parallel Scientific Computing
Author
Prof. Dr. Tobias Weinzierl