
This second edition of the popular textbook contains a comprehensive course in modern probability theory. Probabilistic concepts play an increasingly important role in mathematics, physics, biology, financial engineering and computer science. They help us understand magnetism, amorphous media, genetic diversity and the perils of random developments in financial markets, and they guide us in constructing more efficient algorithms.

To address these concepts, the book covers a wide variety of topics, many of which are not usually found in introductory textbooks, such as:

• limit theorems for sums of random variables
• martingales
• percolation
• Markov chains and electrical networks
• construction of stochastic processes
• Poisson point process and infinite divisibility
• large deviation principles and statistical physics
• Brownian motion
• stochastic integral and stochastic differential equations.

The theory is developed rigorously and in a self-contained way, with the chapters on measure theory interlaced with the probabilistic chapters in order to display the power of the abstract concepts in probability theory. This second edition has been carefully extended and includes many new features: it contains updated figures (over 50) and computer simulations, and some difficult proofs have been made more accessible. A wealth of examples, more than 270 exercises and biographic details of key mathematicians support and enliven the presentation. It will be of use to students of mathematics and statistics as well as to researchers in physics, computer science, economics and biology.

### Chapter 1. Basic Measure Theory

Abstract
In this chapter, we lay the measure theoretic foundations of probability theory. We introduce the classes of sets (semirings, rings, algebras, σ-algebras) that allow for a systematic treatment of events and random observations. Using the measure extension theorem, we construct measures, in particular probability measures on σ-algebras. Finally, we define random variables as measurable maps and study the σ-algebras generated by certain maps.
Achim Klenke

### Chapter 2. Independence

Abstract
Measure theory is a linear theory and cannot describe the dependence structure of events or random variables. We enter the realm of probability theory exactly at this point, where we define independence of events and random variables. Independence is a pivotal notion of probability theory, and the computation of dependencies is one of the theory’s major tasks. We close this chapter with the investigation of a random graph model (bond percolation on the integer lattice): using independent coin flips, the individual bonds of the lattice are either kept or deleted. The question of interest is whether there is an infinite connected component or not.
Achim Klenke
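The bond percolation model described above lends itself to simulation; a minimal sketch in Python (the box size, retention probabilities and union–find bookkeeping are illustrative choices, not from the text):

```python
import random

def percolation_largest_cluster(n, p, seed=0):
    """Bond percolation on an n x n box of the square lattice:
    each nearest-neighbor bond is kept independently with probability p.
    Returns the size of the largest connected component."""
    rng = random.Random(seed)
    parent = {(x, y): (x, y) for x in range(n) for y in range(n)}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for x in range(n):
        for y in range(n):
            for nb in ((x + 1, y), (x, y + 1)):  # bonds to the right and above
                if nb in parent and rng.random() < p:
                    union((x, y), nb)

    sizes = {}
    for v in parent:
        r = find(v)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values())

# Below the critical probability p_c = 1/2 the clusters stay small;
# well above it a cluster comparable to the whole box emerges.
small = percolation_largest_cluster(40, 0.25)
large = percolation_largest_cluster(40, 0.75)
```

In an infinite lattice the corresponding dichotomy is between no infinite cluster (subcritical) and a unique infinite cluster (supercritical).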

### Chapter 3. Generating Functions

Abstract
It is a fundamental principle of mathematics to map a class of objects that are of interest into a class of objects where computations are easier. This map can be one to one, as with linear maps and matrices, or it may map only some properties uniquely, as with matrices and determinants.
In probability theory, in the second category fall quantities such as the median, mean and variance of random variables. In the first category, we have characteristic functions, Laplace transforms and probability generating functions. These are useful mostly because addition of independent random variables leads to multiplication of the transforms. Before we introduce characteristic functions (and Laplace transforms) later in the book, we want to illustrate the basic idea with probability generating functions that are designed for $$\mathbb{N}_{0}$$-valued random variables.
In the first section, we give the basic definitions and derive simple properties. The next two sections are devoted to two applications: The Poisson approximation theorem and a simple investigation of Galton–Watson branching processes.
Achim Klenke
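The extinction probability of a Galton–Watson branching process is the smallest fixed point of the offspring generating function on [0, 1] and can be found by iteration; a minimal sketch in Python (the concrete offspring distribution is an illustrative choice, not from the text):

```python
def extinction_probability(pgf, tol=1e-12, max_iter=10_000):
    """Smallest fixed point of the offspring probability generating
    function on [0, 1], obtained by iterating s <- pgf(s) from s = 0."""
    s = 0.0
    for _ in range(max_iter):
        s_next = pgf(s)
        if abs(s_next - s) < tol:
            break
        s = s_next
    return s

# Offspring distribution: 0, 1 or 2 children with probabilities 1/4, 1/4, 1/2.
# The mean offspring number is 5/4 > 1, so extinction is not certain.
q = extinction_probability(lambda s: 0.25 + 0.25 * s + 0.5 * s * s)
# The fixed points of phi(s) = 1/4 + s/4 + s^2/2 are s = 1/2 and s = 1,
# so the extinction probability is q = 1/2.
```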

### Chapter 4. The Integral

Abstract
Based on the notions of measure spaces and measurable maps, we introduce the integral of a measurable map with respect to a general measure. This generalizes the Lebesgue integral that can be found in textbooks on calculus. Furthermore, the integral is a cornerstone in a systematic theory of probability that allows for the definition and investigation of expected values and higher moments of random variables.
In this chapter, we define the integral by an approximation scheme with simple functions. Then we deduce basic statements such as Fatou’s lemma. Other important convergence theorems for integrals follow in Chapters 6 and 7.
Achim Klenke

### Chapter 5. Moments and Laws of Large Numbers

Abstract
The most important characteristic quantities of random variables are the median, expectation and variance. For large n, the expectation describes the typical approximate value of the arithmetic mean $$(X_{1}+\ldots+X_{n})/n$$ of independent and identically distributed random variables (law of large numbers).
In this chapter, we first show the weak law of large numbers and then present Etemadi’s strong law of large numbers. As an example we study the average length of a random message (source coding theorem). Under additional moment conditions, we investigate the speed of convergence in the law of large numbers. Finally, as an important example, we consider the Poisson process.
Achim Klenke
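The strong law of large numbers is easy to observe empirically; a minimal sketch in Python (the uniform distribution and the sample size are illustrative choices):

```python
import random

def running_means(n, rng):
    """Arithmetic means (X_1 + ... + X_k)/k of i.i.d. uniform(0,1) samples."""
    total, means = 0.0, []
    for k in range(1, n + 1):
        total += rng.random()
        means.append(total / k)
    return means

rng = random.Random(42)
m = running_means(100_000, rng)
# By the strong law of large numbers, the running means approach
# the expectation E[X_1] = 1/2 almost surely.
```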

### Chapter 6. Convergence Theorems

Abstract
In the strong and the weak laws of large numbers, we implicitly introduced the notions of almost sure convergence and convergence in probability of random variables. We saw that almost sure convergence implies convergence in measure/probability. This chapter is devoted to a systematic treatment of almost sure convergence, convergence in measure and convergence of integrals.
The key role for connecting convergence in measure and convergence of integrals is played by the concept of uniform integrability.
In particular, we study Lebesgue’s dominated convergence theorem and conditions for interchangeability of derivative and integral.
Achim Klenke

### Chapter 7. $$L^{p}$$-Spaces and the Radon–Nikodym Theorem

Abstract
In this chapter, we study the spaces of functions whose pth power is integrable. In Section 7.2, we first derive some of the important inequalities (Hölder, Minkowski, Jensen) and then in Section 7.3 investigate the case p=2 in more detail.
Apart from the inequalities, the important results for probability theory are Lebesgue’s decomposition theorem and the Radon–Nikodym theorem in Section 7.4. At first reading, some readers might wish to skip some of the more analytic parts of this chapter.
Achim Klenke

### Chapter 8. Conditional Expectations

Abstract
If there is partial information on the outcome of a random experiment, the probabilities for the possible events may change. The concept of conditional probabilities and conditional expectations formalizes the corresponding calculus.
We start by defining conditional probabilities given events and then generalize to conditional probabilities and expectations given (the information of) a σ-algebra. Finally, we consider the concept of the regular version of the distribution of a random variable given a σ-algebra.
Achim Klenke

### Chapter 9. Martingales

Abstract
One of the most important concepts of modern probability theory is the martingale, which formalizes the notion of a fair game. In this chapter, we first lay the foundations for the treatment of general stochastic processes (filtrations, adapted processes, stopping times). We then introduce martingales and the discrete stochastic integral as well as the martingale representation theorem and the stability theorem for discrete martingales.
We close with an application to a model from mathematical finance.
Achim Klenke

### Chapter 10. Optional Sampling Theorems

Abstract
In Chapter 9 we saw that martingales are transformed into martingales if we apply certain admissible gambling strategies. In this chapter, we establish a similar stability property for martingales that are stopped at a random time (optional sampling and optional stopping). In order also to obtain these results for submartingales and supermartingales, in the first section, we start with a decomposition theorem for adapted processes. We show the optional sampling and optional stopping theorems in the second section. The chapter finishes with the investigation of random stopping times with an infinite time horizon.
Achim Klenke

### Chapter 11. Martingale Convergence Theorems and Their Applications

Abstract
We became familiar with martingales $$X=(X_{n})_{n\in\mathbb{N}_{0}}$$ as fair games and found that under certain transformations (optional stopping, discrete stochastic integral) martingales turn into martingales. In this chapter, we will see that under weak conditions (non-negativity or uniform integrability) martingales converge almost surely. Furthermore, the martingale structure implies $$L^{p}$$-convergence under assumptions that are (formally) weaker than those of Chapter 7. The basic ideas of this chapter are Doob’s inequality (Theorem 11.4) and the upcrossing inequality (Lemma 11.3).
Achim Klenke

### Chapter 12. Backwards Martingales and Exchangeability

Abstract
In many data acquisitions, such as telephone surveys, the order in which the data arrive does not matter. Mathematically, we say that a family of random variables is exchangeable if the joint distribution does not change under finite permutations. De Finetti’s structural theorem says that an infinite family of E-valued exchangeable random variables can be described by a two-stage experiment: at the first stage, a probability distribution Ξ on E is drawn at random; at the second stage, independent and identically distributed random variables with distribution Ξ are realized.
We first define the notion of exchangeability. Then we consider backwards martingales and prove the convergence theorem for them. This is the cornerstone for the proof of de Finetti’s theorem.
Achim Klenke

### Chapter 13. Convergence of Measures

Abstract
One focus of probability theory is distributions that are the result of an interplay of a large number of random impacts. Often a useful approximation can be obtained by taking a limit of such distributions, for example, a limit where the number of impacts goes to infinity. With the Poisson distribution, we have encountered such a limit distribution: it arises as the distribution of the number of rare events when the number of possibilities goes to infinity (see Theorem 3.7). In many cases, it is necessary to rescale the original distributions in order to capture the behavior of the essential fluctuations, e.g., in the central limit theorem. While these theorems work with real random variables, we will also see limit theorems where the random variables take values in more general spaces such as the space of continuous functions when we model the path of the random motion of a particle.
In this chapter, we provide the abstract framework for the investigation of convergence of measures. We introduce the notion of weak convergence of probability measures on general (mostly Polish) spaces and derive the fundamental properties. The reader will profit from a solid knowledge of point set topology. Thus we start with a short overview of some topological definitions and theorems.
Achim Klenke

### Chapter 14. Probability Measures on Product Spaces

Abstract
In order to model a random time evolution, the canonical procedure is to construct probability measures on product spaces. Roughly speaking, the first step is to take a probability measure that models the initial distribution. In the second step, on a different probability space, the distribution after one time step is modeled. Then in each subsequent step, on a further probability space, the random state in the next time step given the full history is modeled. On a formal level, we consider products of probability spaces and Markov kernels between such spaces. Finally, the Ionescu-Tulcea theorem shows that the whole procedure can be realized on a single infinite product space. Furthermore, Kolmogorov’s extension theorem shows that a similar construction can be performed even if the time set is not discrete.
Achim Klenke

### Chapter 15. Characteristic Functions and the Central Limit Theorem

Abstract
The main goal of this chapter is the central limit theorem (CLT) for sums of independent random variables (Theorem 15.37) and for independent arrays of random variables (Lindeberg–Feller theorem, Theorem 15.43). For the latter, we prove only the one of the two implications (Lindeberg’s theorem) that is of interest in applications.
The ideal tools for the treatment of central limit theorems are so-called characteristic functions; that is, Fourier transforms of probability measures. We start with a more general treatment of classes of test functions that are suitable to characterize weak convergence and then study Fourier transforms in greater detail. The subsequent section proves the CLT for real-valued random variables by means of characteristic functions. In the fifth section, we prove a multidimensional version of the CLT.
Achim Klenke

### Chapter 16. Infinitely Divisible Distributions

Abstract
For every n, the normal distribution with expectation μ and variance σ 2 is the nth convolution power of a probability measure (namely of the normal distribution with expectation μ/n and variance σ 2/n). This property is called infinite divisibility and is shared by other probability distributions such as the Poisson distribution and the Gamma distribution. In the first section, we study which probability measures on the real line are infinitely divisible and give an exhaustive description of this class of distributions by means of the Lévy–Khinchin formula.
Unlike the Poisson distribution, the normal distribution is the limit of rescaled sums of independent and identically distributed random variables (central limit theorem). In the second section, we investigate briefly which subclass of the infinitely divisible measures on the real line shares this property.
Achim Klenke
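The divisibility of the normal distribution and the Lévy–Khinchin representation described above can be written out; a sketch in one common normalization (the triplet $$b,\sigma^{2},\nu$$ and the truncation convention follow a standard formulation, which may differ in detail from the book's):

```latex
% n-fold convolution identity for the normal distribution:
\mathcal{N}(\mu,\sigma^{2})
  = \underbrace{\mathcal{N}(\mu/n,\sigma^{2}/n) * \cdots *
                \mathcal{N}(\mu/n,\sigma^{2}/n)}_{n\ \text{factors}}
  \quad\text{for every } n \in \mathbb{N}.

% Lévy–Khinchin formula: a probability measure on the real line is
% infinitely divisible iff its characteristic function has the form
\varphi(t) = \exp\!\Big( i b t - \frac{\sigma^{2} t^{2}}{2}
  + \int \big( e^{itx} - 1 - itx\,\mathbf{1}_{\{|x|<1\}} \big)\,\nu(dx) \Big),
% with b real, \sigma^{2} \ge 0 and \nu a Lévy measure.
```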

### Chapter 17. Markov Chains

Abstract
In spite of their simplicity, Markov processes with countable state space (and discrete time) are interesting mathematical objects with which a variety of real-world phenomena can be modeled. We give an introduction to the basic concepts (Markov property, transition matrix, recurrence, transience, invariant distribution) and then study certain examples in more detail. For example, we show how to compute, numerically and very precisely, the expected number of returns to the origin of simple random walk on multidimensional integer lattices.
The connection with discrete potential theory will be investigated later, in Chapter 19. Some readers might prefer to skip the somewhat technical construction of general Markov processes in Section 17.1.
Achim Klenke

### Chapter 18. Convergence of Markov Chains

Abstract
We consider a Markov chain X with invariant distribution π and investigate conditions under which the distribution of X n converges to π as n→∞. Essentially it is necessary and sufficient that the state space of the chain cannot be decomposed into subspaces
• that the chain does not leave,
• or that are visited by the chain periodically; e.g., only for odd n or only for even n.
In the first case, the chain would be called reducible, and in the second case, it would be periodic.
We study periodicity of Markov chains in the first section. In the second section, we prove the convergence theorem. The third section is devoted to applications of the convergence theorem to computer simulations with the so-called Monte Carlo method. In the last section, we describe the speed of convergence to the equilibrium by means of the spectrum of the transition matrix.
Achim Klenke
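The convergence theorem can be observed numerically by iterating the transition matrix of an irreducible, aperiodic chain; a minimal sketch in Python (the three-state matrix is an illustrative choice):

```python
def step(dist, P):
    """One step of the chain: the new distribution is dist * P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# An irreducible, aperiodic transition matrix on three states (rows sum to 1).
P = [[0.5, 0.25, 0.25],
     [0.2, 0.6,  0.2],
     [0.3, 0.3,  0.4]]

dist = [1.0, 0.0, 0.0]  # start deterministically in state 0
for _ in range(200):
    dist = step(dist, P)
# After many steps the distribution is numerically invariant: dist * P = dist,
# i.e. dist approximates the invariant distribution pi, independently of
# the initial state.
```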

### Chapter 19. Markov Chains and Electrical Networks

Abstract
There is a natural connection between electrical networks and so-called reversible Markov chains. An example of such a chain is the symmetric graph random walk which, in each step, jumps to a randomly chosen graph neighbor with equal probability. This connection is studied here in some detail. As an application, we prove the statement that if such a graph random walk is recurrent, then it is also recurrent on each subgraph. (Although this statement is rather plausible, it is hard to show by different means.) In particular, the graph random walk on a percolation cluster of the planar integer lattice is recurrent.
Achim Klenke

### Chapter 20. Ergodic Theory

Abstract
Laws of large numbers, e.g., for independent and identically distributed random variables $$X_{1},X_{2},\ldots$$, state that $$\frac{1}{n} \sum_{i=1}^{n} X_{i} \longrightarrow \mathrm{E}[X_{1}]$$ almost surely as $$n\rightarrow\infty$$. Hence averaging over one realization of many random variables is equivalent to averaging over all possible realizations of one random variable. In the terminology of statistical physics this means that the time average, or path (Greek: odos) average, equals the space average. The space in space average is the probability space in mathematical terminology, and in physics it is considered the space of admissible states with a certain energy (Greek: ergon). Combining the Greek words gives rise to the name ergodic theory, which studies laws of large numbers for possibly dependent, but stationary, random variables.
Achim Klenke

### Chapter 21. Brownian Motion

Abstract
Brownian motion is a central object of probability theory. Roughly speaking, it arises from a space-time rescaling of a symmetric nearest-neighbor random walk on the integer lattice such that the limiting process has normally distributed increments and continuous paths.
We provide different principles of construction: Firstly, via the Kolmogorov–Chentsov theorem on continuous modifications and secondly via a Fourier series with random coefficients in the fashion of Paley–Wiener or of Lévy. Finally, we show the functional central limit theorem (invariance principle) which states that the properly rescaled partial sum process of centered square integrable i.i.d. random variables converges in path space to Brownian motion.
Achim Klenke
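The functional central limit theorem suggests a simple simulation: rescale a simple random walk in space and time; a minimal sketch in Python (the walk length and seed are illustrative choices):

```python
import random

def rescaled_walk(n, rng):
    """The path t -> S_{floor(nt)} / sqrt(n) of a simple random walk,
    evaluated at the grid points k/n for k = 0, ..., n. By the invariance
    principle this converges in distribution to a Brownian path on [0, 1]."""
    s, path = 0.0, [0.0]
    scale = n ** 0.5
    for _ in range(n):
        s += rng.choice((-1.0, 1.0))  # symmetric +-1 increment
        path.append(s / scale)
    return path

rng = random.Random(7)
w = rescaled_walk(10_000, rng)
# The endpoint w[-1] approximates W_1, a standard normal random variable.
```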

### Chapter 22. Law of the Iterated Logarithm

Abstract
For sums of independent random variables we already know two limit theorems: the law of large numbers and the central limit theorem. The law of large numbers describes for large $$n\in \mathbb{N}$$ the typical behavior, or average value behavior, of sums of n random variables. On the other hand, the central limit theorem quantifies the typical fluctuations about this average value.
In Chapter 23, we will study atypically large deviations from the average value in greater detail. The aim of this chapter is to quantify the typical fluctuations of the whole process as n→∞. The main message is: While for fixed time n the partial sum $$S_{n}$$ deviates by approximately $$\sqrt{n}$$ from its expected value (central limit theorem), the maximal fluctuation up to time n is of order $$\sqrt{n \log \log n}$$ (Hartman–Wintner theorem, Theorem 22.11).
Achim Klenke
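The Hartman–Wintner theorem admits a compact statement; a sketch in the standard formulation for centered i.i.d. random variables with finite variance:

```latex
% Hartman–Wintner law of the iterated logarithm: for i.i.d. X_1, X_2, ...
% with E[X_1] = 0 and Var[X_1] = \sigma^2 < \infty, and S_n = X_1 + ... + X_n,
\limsup_{n\to\infty} \frac{S_{n}}{\sqrt{2 n \log\log n}} = \sigma
\quad\text{and}\quad
\liminf_{n\to\infty} \frac{S_{n}}{\sqrt{2 n \log\log n}} = -\sigma
\quad\text{almost surely.}
```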

### Chapter 23. Large Deviations

Abstract
The central limit theorem measures the typical fluctuations of sums of (square integrable) random variables. On the other hand, laws of large numbers only show that atypically large deviations have vanishing probabilities (which is just another way of saying that they are atypical).
The aim of this chapter is to give more precise quantitative statements about the small probabilities of atypically large fluctuations (large deviations). We present Cramér’s theorem on large deviations for sums of real random variables as well as Sanov’s theorem on large deviations for empirical distributions in discrete models. Finally, we use Varadhan’s lemma to provide a connection to statistical physics and, in particular, the Weiss ferromagnet.
Achim Klenke
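Cramér's theorem can be written out compactly; a sketch in the standard formulation (assuming finite exponential moments and a level x above the mean):

```latex
% Cramér's theorem: for i.i.d. X_1, X_2, ... with
% \Lambda(t) := \log \mathrm{E}[e^{t X_1}] < \infty for all t,
% S_n = X_1 + ... + X_n and x > \mathrm{E}[X_1],
\lim_{n\to\infty} \frac{1}{n} \log \mathrm{P}\big( S_{n} \ge n x \big) = -I(x),
\qquad
I(x) = \sup_{t \in \mathbb{R}} \big( t x - \Lambda(t) \big).
% I is the Legendre transform of the log-moment generating function.
```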

### Chapter 24. The Poisson Point Process

Abstract
Poisson point processes can be used as a cornerstone in the construction of very different stochastic objects such as, for example, infinitely divisible distributions, Markov processes with complex dynamics, objects of stochastic geometry and so forth.
In this chapter, we briefly develop the general framework of random measures, construct the Poisson point process and characterize it in terms of its Laplace transform. As an application, we construct a certain subordinator and show that the Poisson point process is the invariant measure of systems of independent random walks. Via the connection with subordinators, in the third section, we construct two distributions that play prominent roles in population genetics: the Poisson–Dirichlet distribution and the GEM distribution.
Achim Klenke

### Chapter 25. The Itô Integral

Abstract
The Itô integral allows us to integrate stochastic processes with respect to the increments of a Brownian motion or a somewhat more general stochastic process. We develop the Itô integral first for Brownian motion and then for generalized diffusion processes (so called Itô processes). In the third section, we derive the celebrated Itô formula. This is the chain rule for the Itô integral that enables us to do explicit calculations with the Itô integral. In the fourth section, we use the Itô formula to obtain a stochastic solution of the classical Dirichlet problem. This in turn is used in the fifth section in order to show that like symmetric simple random walk, Brownian motion is recurrent in low dimensions and transient in high dimensions.
Achim Klenke
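The Itô formula mentioned above has a compact form in the special case of Brownian motion; a sketch of the standard statement for a twice continuously differentiable function:

```latex
% Itô formula for a Brownian motion B and f \in C^2: the second-order
% term distinguishes it from the classical chain rule.
f(B_{t}) = f(B_{0}) + \int_{0}^{t} f'(B_{s})\, dB_{s}
  + \frac{1}{2} \int_{0}^{t} f''(B_{s})\, ds .
```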

### Chapter 26. Stochastic Differential Equations

Abstract
Stochastic differential equations describe the time evolution of certain continuous n-dimensional Markov processes. In contrast with classical differential equations, in addition to the derivative of the function, there is a term that describes the random fluctuations that are coded as an Itô integral with respect to a Brownian motion. Depending on how seriously we take the concrete Brownian motion as the driving force of the noise, we speak of strong and weak solutions. In the first section, we develop the theory of strong solutions under Lipschitz conditions for the coefficients. In the second section, we develop the so-called (local) martingale problem as a method of establishing weak solutions. In the third section, we present some examples in which the method of duality can be used to prove weak uniqueness.
As stochastic differential equations are a very broad subject, and since things quickly become very technical, we touch only briefly on some of the most important results, partly without proofs, and illustrate them with examples.
Achim Klenke