## About this book

We have made small changes throughout the book, including the exercises, and we have tried to correct if not all, then at least most of the typos. We wish to thank the many colleagues and students who have commented constructively on the book since its publication two years ago, and in particular Professors Valentin Petrov, Esko Valkeila, Volker Priebe, and Frank Knight.

Jean Jacod, Paris; Philip Protter, Ithaca; March, 2002

Preface to the Second Printing of the Second Edition

We have benefited greatly from the long list of typos and small suggestions sent to us by Professor Luis Tenorio. These corrections have improved the book in subtle yet important ways, and the authors are most grateful to him.

Jean Jacod, Paris; Philip Protter, Ithaca; January, 2004

Preface to the First Edition

We present here a one semester course on Probability Theory. We also treat measure theory and Lebesgue integration, concentrating on those aspects which are especially germane to the study of Probability Theory. The book is intended to fill a current need: there are mathematically sophisticated students and researchers (especially in Engineering, Economics, and Statistics) who need a proper grounding in Probability in order to pursue their primary interests. Many Probability texts available today are celebrations of Probability Theory, containing treatments of fascinating topics to be sure, but nevertheless they make it difficult to construct a lean one semester course that covers (what we believe are) the essential topics.

## Table of Contents

### 1. Introduction

Abstract
Almost everyone these days is familiar with the concept of Probability. Each day we are told the probability that it will rain the next day; frequently we discuss the probabilities of winning a lottery or surviving the crash of an airplane. The insurance industry calculates (for example) the probability that a man or woman will live past his or her eightieth birthday, given he or she is 22 years old and applying for life insurance. Probability is used in business too: for example, when deciding to build a waiting area in a restaurant, one wants to calculate the probability of needing space for more than n people each day; a bank wants to calculate the probability a loan will be repaid; a manufacturer wants to calculate the probable demand for his product in the future. In medicine a doctor needs to calculate the probability of success of various alternative remedies; drug companies calculate the probability of harmful side effects of drugs. An example that has recently achieved spectacular success is the use of Probability in Economics, and in particular in Stochastic Finance Theory. Here interest rates and security prices (such as stocks, bonds, currency exchanges) are modelled as varying randomly over time but subject to specific probability laws; one is then able to provide insurance products (for example) to investors by using these models. One could go on with such a list. Probability theory is ubiquitous in modern society and in science.
Jean Jacod, Philip Protter

### 2. Axioms of Probability

Abstract
We begin by presenting the minimal properties we will need to define a Probability measure. Hopefully the reader will convince himself (or herself) that the two axioms presented in Definition 2.3 are reasonable, especially in view of the frequency approach (1.1). From these two simple axioms flows the entire theory. In order to present these axioms, however, we need to introduce the concept of a σ-algebra.
Jean Jacod, Philip Protter

### 3. Conditional Probability and Independence

Abstract
Let A and B be two events defined on a probability space. Let f_n(A) denote the number of times A occurs in n trials, divided by n. Intuitively, as n gets large, f_n(A) should be close to P(A). Informally, we should have lim_{n→∞} f_n(A) = P(A) (see Chapter 1).
Jean Jacod, Philip Protter
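
The frequency interpretation in this abstract can be illustrated with a small simulation. The sketch below is not taken from the book: the fair die, the event "the roll is even", and the helper names `relative_frequency`, `roll_die`, and `is_even` are all assumptions chosen for illustration. It computes f_n(A) for increasing n, so one can watch it approach P(A) = 1/2.

```python
import random


def relative_frequency(event, sample, n, seed=0):
    """Return f_n(A): the number of trials in which `event` occurs, divided by n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(1 for _ in range(n) if event(sample(rng)))
    return hits / n


def roll_die(rng):
    """Draw one outcome of a fair six-sided die (illustrative experiment)."""
    return rng.randint(1, 6)


def is_even(outcome):
    """Event A: the die shows an even number, so P(A) = 1/2."""
    return outcome % 2 == 0


for n in (10, 1_000, 100_000):
    print(n, relative_frequency(is_even, roll_die, n))
```

The printed frequencies depend on the seed; only the tendency towards 1/2 for large n is the point.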

### 4. Probabilities on a Finite or Countable Space

Abstract
For Chapter 4, we assume Ω is finite or countable, and we take the σ-algebra 𝒜 = 2^Ω (the class of all subsets of Ω).
Jean Jacod, Philip Protter
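
As a rough illustration of this setting, the following sketch builds the class of all subsets of a three-point Ω and assigns probabilities by summing point masses. The sample space, the weights, and the helper names `power_set` and `prob` are assumptions made for the example, not notation from the book.

```python
from fractions import Fraction
from itertools import combinations


def power_set(omega):
    """Return all subsets of a finite set omega, i.e. the sigma-algebra 2^Omega."""
    items = sorted(omega)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def prob(event, weights):
    """P(A) as the sum of the point masses P({w}) over w in A."""
    return sum((weights[w] for w in event), Fraction(0))


# Hypothetical three-point space with point masses summing to 1.
omega = {"a", "b", "c"}
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}

events = power_set(omega)
print(len(events))                            # 2^3 subsets
print(prob(frozenset({"a", "c"}), weights))   # 1/2 + 1/6
```

Exact fractions are used so that identities such as P(Ω) = 1 hold without rounding error.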

### 5. Random Variables on a Countable Space

Abstract
In Chapter 5 we again assume Ω is countable and 𝒜 = 2^Ω. A random variable X in this case is defined to be a function from Ω into a set T. A random variable represents an unknown quantity (hence the term variable) that varies not as a variable in an algebraic relation (such as x² − 9 = 0), but rather varies with the outcome of a random event. Before the random event, we know which values X could possibly assume, but we do not know which one it will take until the random event happens. This is analogous to algebra when we know that x can take on a priori any real value, but we do not know which one (or ones) it will take on until we solve the equation x² − 9 = 0 (for example).
Jean Jacod, Philip Protter
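
To make the idea of a random variable as a function on Ω concrete, here is a minimal sketch under assumed data: Ω is the set of outcomes of two fair dice, X is their sum, and the helper `law` (a hypothetical name) collects the probabilities P(X = t) for each value t in T.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Assumed countable sample space: ordered pairs of two fair dice.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}


def X(w):
    """Random variable X: Omega -> T, here the sum of the two dice."""
    return w[0] + w[1]


def law(rv, point_masses):
    """Distribution of rv: map each value t to P(rv = t)."""
    dist = defaultdict(Fraction)  # Fraction() is zero, so sums start at 0
    for w, p in point_masses.items():
        dist[rv(w)] += p
    return dict(dist)


distribution = law(X, P)
for value in sorted(distribution):
    print(value, distribution[value])
```

Before the dice are thrown, the possible values 2, …, 12 are known; the distribution records how likely each one is.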

### 6. Construction of a Probability Measure

Abstract
Here we no longer assume Ω is countable. We assume given Ω and a σ-algebra 𝒜 ⊂ 2^Ω. (Ω, 𝒜) is called a measurable space. We want to construct probability measures on 𝒜. When Ω is finite or countable we have already seen this is simple to do. When Ω is uncountable, the same technique does not work; indeed, a “typical” probability P will have P({ω}) = 0 for all ω, and thus the family of all numbers P({ω}) for ω ∈ Ω does not characterize the probability P in general.
Jean Jacod, Philip Protter

### 7. Construction of a Probability Measure on R

Abstract
This chapter is an important special case of what we dealt with in Chapter 6. We assume that Ω=R.
Jean Jacod, Philip Protter

### 8. Random Variables

Abstract
In Chapter 5 we considered random variables defined on a countable probability space (Ω, 𝒜, P). We now wish to consider an arbitrary abstract space, countable or not. If X maps Ω into a state space (F, 𝓕), then what we will often want to compute is the probability that X takes its values in a given subset of the state space. We take these subsets to be elements of the σ-algebra 𝓕 of subsets of F. Thus, for A ∈ 𝓕, we will want to compute P({ω : X(ω) ∈ A}) = P(X ∈ A) = P(X⁻¹(A)), which are three equivalent ways to write the same quantity. The third is enlightening: in order to compute P(X⁻¹(A)), we need X⁻¹(A) to be an element of 𝒜, the σ-algebra on Ω on which P is defined. This motivates the following definition.
Jean Jacod, Philip Protter

### 9. Integration with Respect to a Probability Measure

Abstract
Let (Ω, 𝒜, P) be a probability space. We want to define the expectation, or what is equivalent, the “integral”, of general random variables. We have of course already done this for random variables defined on a countable space Ω. The general case (for arbitrary Ω) is more delicate.
Jean Jacod, Philip Protter

### 10. Independent Random Variables

Abstract
Recall that two events A and B are independent if knowledge that B has occurred does not change the probability that A will occur: that is, P(A|B) = P(A). This of course is algebraically equivalent to the statement P(A ∩ B) = P(A)P(B). The latter expression generalizes easily to a finite number of events: A_1, …, A_n are independent if $$P\left(\bigcap_{i \in J} A_i\right) = \prod_{i \in J} P(A_i)$$ for every subset J of {1, …, n} (see Definition 3.1).
Jean Jacod, Philip Protter
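
The requirement that the product rule hold for every subset J is stronger than pairwise independence. The sketch below checks this on an assumed example (two fair coin tosses, with events "first toss heads", "second toss heads", "both tosses agree"); the function name `are_independent` and the example itself are illustrative choices, not the book's.

```python
from fractions import Fraction
from itertools import combinations, product

# Assumed sample space: two fair coin tosses, each outcome has mass 1/4.
omega = list(product("HT", repeat=2))
P = {w: Fraction(1, 4) for w in omega}


def prob(event):
    """P(A) as a sum of point masses."""
    return sum((P[w] for w in event), Fraction(0))


def are_independent(events):
    """Check P(intersection over J) == product over J for every subset J.

    Subsets of size 0 or 1 satisfy the identity trivially, so only
    subsets with at least two events are examined.
    """
    for r in range(2, len(events) + 1):
        for J in combinations(events, r):
            intersection = set(omega)
            product_of_probs = Fraction(1)
            for A in J:
                intersection &= A
                product_of_probs *= prob(A)
            if prob(intersection) != product_of_probs:
                return False
    return True


A1 = {w for w in omega if w[0] == "H"}   # first toss is heads
A2 = {w for w in omega if w[1] == "H"}   # second toss is heads
A3 = {w for w in omega if w[0] == w[1]}  # both tosses agree

print(are_independent([A1, A2]))      # each pair passes the check
print(are_independent([A1, A2, A3]))  # the triple fails: 1/4 != 1/8
```

Here every pair of events is independent, but the three together are not, because P(A1 ∩ A2 ∩ A3) = 1/4 while the product of the three probabilities is 1/8.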

### 11. Probability Distributions on R

Abstract
We have already seen that a probability measure P on (R, B) (with B the Borel sets of R) is characterized by its distribution function
$$F(x) = P((-\infty, x]).$$
Jean Jacod, Philip Protter

### 12. Probability Distributions on Rⁿ

Abstract
In Chapter 11 we considered the simple case of distributions on (R, B). The case of distributions on (Rⁿ, Bⁿ) for n = 2, 3, … is both analogous and more complicated. [Bⁿ denotes the Borel sets of Rⁿ.]
Jean Jacod, Philip Protter

### 13. Characteristic Functions

Abstract
It often arises in mathematics that one can solve problems and/or obtain properties of mathematical objects by “transforming” them into another space, solving the problem there, and then transforming the solution back. Two of the most important transforms are the Laplace transform and the Fourier transform. While these transforms are widely used in the study of differential equations, they are also extraordinarily useful for the study of Probability. They can be used to analyze random variables (e.g., to compute their moments), and they can be used to give short and elegant proofs of the Central Limit Theorem (see Chapter 21). The Fourier transform is the more sophisticated of the two, and it is also the more useful.
Jean Jacod, Philip Protter

### 14. Properties of Characteristic Functions

Abstract
We have seen several examples of how to calculate a characteristic function when given a random variable. Equivalently we have seen examples of how to calculate the Fourier transforms of probability measures. For such transforms to be useful, we need to know that knowledge of the transform characterizes the distribution that gives rise to it. The proof of the next theorem uses the Stone-Weierstrass theorem and thus is a bit advanced for this book. Nevertheless we include the proof for the sake of completeness.
Jean Jacod, Philip Protter

### 15. Sums of Independent Random Variables

Abstract
Many of the important uses of Probability Theory flow from the study of sums of independent random variables. A simple example is from Statistics: if we perform an experiment repeatedly and independently, then the “average value” is given by $$\bar{x} = \frac{1}{n}\sum_{j=1}^n X_j$$ where X_j represents the outcome of the jth experiment. The r.v. x̄ is then called an estimator for the mean μ of each of the X_j. Statistical theory studies when (and how) x̄ converges to μ as n tends to ∞. Even once we show that x̄ tends to μ as n tends to ∞, we also need to know how large n should be in order to be reasonably sure that x̄ is close to the true value μ (which is, in general, unknown). There are other, more sophisticated questions that arise as well: what is the probability distribution of x̄? If we cannot infer the exact distribution of x̄, can we approximate it? How large need n be so that our approximation is sufficiently accurate? If we have prior information about μ, how do we use that to improve upon our estimator x̄? Even to begin to answer some of these fundamentally important questions we need to study sums of independent random variables.
Jean Jacod, Philip Protter

### 16. Gaussian Random Variables (The Normal and the Multivariate Normal Distributions)

Abstract
Let us recall that a Normal random variable with parameters (μ, σ²), where μ ∈ R and σ² > 0, is a random variable whose density is given by:
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / (2\sigma^2)}, \quad -\infty < x < \infty.$$
(16.1)
Such a distribution is usually denoted N(μ, σ²). For convenience of notation, we extend the class of normal distributions to include the parameters μ ∈ R and σ² = 0 as follows: we will denote by N(μ, 0) the law of the constant r.v. equal to μ (this is also the Dirac measure at point μ). Of course, the distribution N(μ, 0) has no density, and in this case we sometimes speak of a degenerate normal distribution. When μ = 0 and σ² = 1, we say that N(0, 1) is the standard Normal distribution.
Jean Jacod, Philip Protter
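
The density of N(μ, σ²) for σ² > 0 can be checked numerically. The following is a minimal sketch, assuming the standard form with normalizing constant 1/(√(2π) σ); the function names `normal_pdf` and `riemann_integral`, the parameter values, and the midpoint-rule check are illustrative choices. The degenerate case σ² = 0 has no density and is rejected explicitly.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) at x; requires sigma2 > 0."""
    if sigma2 <= 0:
        raise ValueError("N(mu, 0) is degenerate and has no density")
    sigma = math.sqrt(sigma2)
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / (math.sqrt(2 * math.pi) * sigma)


def riemann_integral(f, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


mu, sigma2 = 2.0, 4.0  # assumed example parameters
total_mass = riemann_integral(lambda x: normal_pdf(x, mu, sigma2), mu - 20, mu + 20)
print(total_mass)  # should be very close to 1

# Cross-check against the standard library, which takes the standard deviation.
print(normal_pdf(1.3, mu, sigma2), NormalDist(mu, math.sqrt(sigma2)).pdf(1.3))
```

Note that `statistics.NormalDist` is parameterized by σ rather than σ², hence the square root in the cross-check.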

### 17. Convergence of Random Variables

Abstract
In elementary mathematics courses (such as Calculus) one speaks of the convergence of functions: if f_n: R → R, then lim_{n→∞} f_n = f if lim_{n→∞} f_n(x) = f(x) for all x in R. This is called pointwise convergence of functions. A random variable is of course a function (X: Ω → R for an abstract space Ω), and thus we have the same notion: a sequence X_n: Ω → R converges pointwise to X if lim_{n→∞} X_n(ω) = X(ω), for all ω ∈ Ω. This natural definition is surprisingly useless in probability. The next example gives an indication why.
Jean Jacod, Philip Protter

### 18. Weak Convergence

Abstract
In Chapter 17 we considered four types of convergence of random variables: pointwise everywhere, pointwise almost surely, convergence in pth mean (L p convergence), and convergence in probability. While all but the first differ from types of convergence seen in elementary Calculus courses, they are nevertheless squarely in the analysis tradition, and they can be thought of as variants of standard pointwise convergence. While these types of convergence are natural and useful in probability, there is yet another notion of convergence which is profoundly different from the four we have already seen. This convergence, known as weak convergence, is fundamental to the study of Probability and Statistics. As its name implies, it is a weak type of convergence. The weaker the requirements for convergence, the easier it is for a sequence of random variables to have a limit. What is unusual about weak convergence, however, is that the actual values of the random variables themselves are not important! It is simply the probabilities that they will assume those values that matter. That is, it is the probability distributions of the random variables that will be converging, and not the values of the random variables themselves. It is this difference that makes weak convergence a convergence of a different type than pointwise and its variants.
Jean Jacod, Philip Protter

### 19. Weak Convergence and Characteristic Functions

Abstract
Weak convergence is at the heart of much of probability and statistics. Limit theorems provide much of the justification of statistics, and they also have a myriad of other applications. There is an intimate relationship between weak convergence and characteristic functions, and it is indeed this relationship (provided by the next theorem) that makes characteristic functions so useful in the study of probability and statistics.
Jean Jacod, Philip Protter

### 20. The Laws of Large Numbers

Abstract
One of the fundamental results of Probability Theory is the Strong Law of Large Numbers. It helps to justify our intuitive notions of what probability actually is (Example 1), and it has many direct applications, such as (for example) Monte Carlo estimation theory (see Example 2).
Jean Jacod, Philip Protter
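
Monte Carlo estimation rests on the Strong Law of Large Numbers: the average of g(U_1), …, g(U_n) for i.i.d. uniform U_j converges almost surely to E{g(U)}. The sketch below is an assumed illustration (it is not the book's Example 2): it estimates the integral of x² over [0, 1], whose exact value is 1/3, with a hypothetical helper `monte_carlo_mean`.

```python
import random


def monte_carlo_mean(g, n, seed=0):
    """Average of g(U_1), ..., g(U_n) for i.i.d. U_j uniform on [0, 1)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return sum(g(rng.random()) for _ in range(n)) / n


def square(u):
    """Integrand g(u) = u^2, so E{g(U)} = 1/3."""
    return u * u


for n in (100, 10_000, 200_000):
    print(n, monte_carlo_mean(square, n))
```

The estimates vary with the seed; what the law of large numbers guarantees is that they settle near 1/3 as n grows.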

### 21. The Central Limit Theorem

Abstract
The Central Limit Theorem is one of the most impressive achievements of probability theory. From a simple description requiring minimal hypotheses, we are able to deduce precise results. The Central Limit Theorem thus serves as the basis for much of Statistical Theory. The idea is simple: let X_1, …, X_j, … be a sequence of i.i.d. random variables with finite variance. Let $$S_n = \sum_{j=1}^n X_j$$. Then for n large, L(S_n) ≈ N(nμ, nσ²), where E{X_j} = μ and σ² = Var(X_j) (all j). The key observation is that absolutely nothing (except a finite variance) is assumed about the distribution of the random variables (X_j)_{j≥1}. Therefore, if one can assume that a random variable in question is the sum of many i.i.d. random variables with finite variances, then one can infer that the random variable's distribution is approximately Gaussian. Next one can use data and do Statistical Tests to estimate μ and σ², and then one knows essentially everything.
Jean Jacod, Philip Protter
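
The approximation L(S_n) ≈ N(nμ, nσ²) can be probed by simulation. In the sketch below, the choice of uniform summands on [0, 1) (so μ = 1/2 and σ² = 1/12), the sample sizes, and the helper name `standardized_sums` are all assumptions for illustration. It standardizes S_n and measures how often the result falls in [−1.96, 1.96], an interval that carries about 95% of the mass of N(0, 1).

```python
import math
import random


def standardized_sums(n, trials, seed=0):
    """Simulate (S_n - n*mu) / sqrt(n*sigma^2) for sums of n uniform(0, 1) variables."""
    rng = random.Random(seed)
    mu, sigma2 = 0.5, 1.0 / 12.0  # mean and variance of uniform(0, 1)
    results = []
    for _ in range(trials):
        s_n = sum(rng.random() for _ in range(n))
        results.append((s_n - n * mu) / math.sqrt(n * sigma2))
    return results


z = standardized_sums(n=30, trials=20_000)
coverage = sum(1 for v in z if abs(v) <= 1.96) / len(z)
print(coverage)  # should be close to 0.95 if the Gaussian approximation is good
```

The uniform distribution is far from Gaussian, yet with only 30 summands the standardized sum already behaves much like N(0, 1) on this check.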

### 22. L² and Hilbert Spaces

Abstract
We suppose given a probability space (Ω, F, P). Let L² denote all (equivalence classes for a.s. equality of) random variables X such that E{X²} < ∞. We henceforth identify all random variables X, Y in L² that are equal a.s. and consider them to be representatives of the same random variable. This has the consequence that if E{X²} = 0, we can conclude that X = 0 (and not only X = 0 a.s.).
Jean Jacod, Philip Protter

### 23. Conditional Expectation

Abstract
Let X and Y be two random variables, with Y taking values in R and with X taking on only countably many values. It often arises that we know already the value of X and want to calculate the expected value of Y taking into account the knowledge of X. That is, suppose we know that the event {X = j} for some value j has occurred. The expectation of Y may change given this knowledge. Indeed, if Q(Λ) = P(Λ | X = j), it makes more sense to calculate E_Q{Y} than it does to calculate E_P{Y}. (E_R{·} denotes expectation with respect to the Probability measure R.)
Jean Jacod, Philip Protter
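
For a finite sample space, the expectation of Y under Q(Λ) = P(Λ | X = j) can be computed directly. The sketch below uses an assumed example (two fair dice, X the first die, Y the sum of both); the helper name `conditional_expectation` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: ordered pairs of two fair dice.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}


def X(w):
    """X takes countably (here finitely) many values: the first die."""
    return w[0]


def Y(w):
    """Y is real valued: the sum of both dice."""
    return w[0] + w[1]


def conditional_expectation(y, x, j, point_masses):
    """E_Q{Y} where Q(.) = P(. | X = j); requires P(X = j) > 0."""
    p_j = sum((p for w, p in point_masses.items() if x(w) == j), Fraction(0))
    if p_j == 0:
        raise ValueError("conditioning event has probability zero")
    weighted = sum((y(w) * p for w, p in point_masses.items() if x(w) == j), Fraction(0))
    return weighted / p_j


unconditional = sum((Y(w) * p for w, p in P.items()), Fraction(0))
print(unconditional)                        # E_P{Y}
print(conditional_expectation(Y, X, 2, P))  # E_Q{Y} given X = 2
```

Knowing X = 2 shifts the expected sum from 7 to 2 + 7/2, which shows how the expectation of Y changes with the information.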

### 24. Martingales

Abstract
We begin by recalling the Strong Law of Large Numbers (Theorem 20.1): if (X_n)_{n≥1} are i.i.d. with E{X_n} = μ and $$\sigma_{X_n}^2 < \infty$$, and if $$S_n = \sum_{j=1}^n X_j$$, then $$\lim_{n \to \infty} \frac{S_n}{n} = \mu$$ a.s. Note that since the X_n are all independent, the limit must be constant a.s. as a consequence of the tail event zero-one law (Theorem 10.6). It is interesting to study sequences converging to limits that are random variables, not just constant.
Jean Jacod, Philip Protter

### 25. Supermartingales and Submartingales

Abstract
In Chapter 24 we defined a martingale via an equality for certain conditional expectations. If we replace that equality with an inequality we obtain supermartingales and submartingales. Once again (Ω, F, P) is a probability space that is assumed given and fixed, and (F_n)_{n≥1} is an increasing sequence of σ-algebras.
Jean Jacod, Philip Protter

### 26. Martingale Inequalities

Abstract
One of the reasons martingales have become central to probability theory is that their structure gives rise to some powerful inequalities. Our presentation follows Bass [1].
Jean Jacod, Philip Protter

### 27. Martingale Convergence Theorems

Abstract
In Chapter 17 we studied convergence theorems, but they were all of the type that one form of convergence, plus perhaps an extra condition, implies another type of convergence. What is unusual about martingale convergence theorems is that no type of convergence is assumed — only a certain structure — yet convergence is concluded. This makes martingale convergence theorems special in analysis; the only similar situation arises in ergodic theory.
Jean Jacod, Philip Protter

### 28. The Radon-Nikodym Theorem

Abstract
Let (Ω, F, P) be a probability space. Suppose a random variable X ≥ 0 a.s. has the property E{X} = 1. Then if we define a set function Q on F by
$$Q(\Lambda) = E\{1_\Lambda X\}$$
(28.1)
then it is easy to see that Q defines a new probability (see Exercise 9.5). Indeed
$$Q(\Omega) = E\{1_\Omega X\} = E\{X\} = 1$$
and if A_1, A_2, A_3, … are disjoint in F then
$$\begin{aligned} Q\left(\bigcup_{i=1}^\infty A_i\right) &= E\left\{1_{\cup_{i=1}^\infty A_i} X\right\} \\ &= E\left\{\sum_{i=1}^\infty 1_{A_i} X\right\} \\ &= \sum_{i=1}^\infty E\{1_{A_i} X\} \\ &= \sum_{i=1}^\infty Q(A_i) \end{aligned}$$
and we have countable additivity. The interchange of the expectation and the summation is justified by the Monotone Convergence Theorem (Theorem 9.1(d)).
Jean Jacod, Philip Protter
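
The construction of Q from a nonnegative X with E{X} = 1 can be verified on a small finite space, where additivity reduces to finite sums. In the sketch below, the fair die, the particular choice X(ω) = 2ω/7, and the helper names `expectation` and `Q` are assumptions made for illustration.

```python
from fractions import Fraction

# Assumed finite probability space: a fair six-sided die.
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}

# A nonnegative random variable with E{X} = 1 under P: X(w) = 2w / 7.
X = {w: Fraction(2 * w, 7) for w in omega}


def expectation(Z):
    """E{Z} under P for a random variable given as a dict on omega."""
    return sum((Z[w] * P[w] for w in omega), Fraction(0))


def Q(event):
    """Q(Lambda) = E{1_Lambda X}: sum X(w) P({w}) over w in the event."""
    return sum((X[w] * P[w] for w in event), Fraction(0))


print(expectation(X))                         # E{X} = 1, so Q has total mass 1
print(Q(omega))                               # Q(Omega)
print(Q({1, 2}) + Q({6}), Q({1, 2, 6}))       # additivity on disjoint events
```

Under Q the die is no longer fair: the mass of each face is proportional to its value, for instance Q({6}) = 6/21.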

### Backmatter
