In the Bayesian approach, we choose and fix a finite set of \(J\) models: under model \(j\), the underlying state process \(X\) is Markovian, with transition density
$$\begin{aligned} p_j(x,x') = P_j(X_h \in \hbox {d}x' | X_0 = x )/ \hbox {d}x'\qquad (j = 1, \ldots , J), \end{aligned}$$
(12)
where \(h>0\) is the time step. As before, we give ourselves some prior distribution \(\pi _j(0)\) over the possible models. Model \(j\) has pricing function \(\varphi ^a_j(\cdot )\) for derivative \(a\). We select some loss function \(Q(\varphi _j(X_t), Y_t)\), which for the sake of the discussion we might take to be
$$\begin{aligned} Q(y,y') = \alpha \Vert y-y' \Vert ^2 \end{aligned}$$
(13)
for some \(\alpha >0\). The log-likelihood \(\ell _j(t)\) of model \(j\) at time \(t\) then updates as
$$\begin{aligned} \ell _j(t) = \ell _j(t-h) + \log p_j(X_{t-h},X_t) -Q ( \varphi _j(X_t), Y_t). \end{aligned}$$
(14)
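For concreteness, here is a minimal sketch of the transition densities in (12), using a hypothetical family of Black–Scholes-type models that differ only in volatility, so that \(\log (X_h/X_0)\) is Gaussian under each model (the drift, volatilities and time step below are illustrative assumptions, not values from the text):

```python
import numpy as np

def make_bs_density(mu, sigma, h):
    """Transition density p_j(x, x') of geometric Brownian motion:
    log(X_h / X_0) ~ N((mu - sigma^2 / 2) h, sigma^2 h)."""
    def p(x, xp):
        m = np.log(x) + (mu - 0.5 * sigma**2) * h
        s = sigma * np.sqrt(h)
        z = (np.log(xp) - m) / s
        # lognormal density in the forward variable x'
        return np.exp(-0.5 * z**2) / (xp * s * np.sqrt(2.0 * np.pi))
    return p

# a family of J = 3 models differing only in volatility, daily time step
h = 1.0 / 252.0
densities = [make_bs_density(0.05, sig, h) for sig in (0.1, 0.2, 0.4)]
```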
In practice, it is a good idea to allow the data-generating model to change with a small probability each period, according to some Markov chain with transition matrix \(P\). This prevents the Bayesian inference from getting stuck at some long-term average values as the number of time steps increases, and reflects a natural requirement that data from the distant past should have less influence on our inference than more recent data. The posterior distribution then updates as
$$\begin{aligned} \pi _j(t) \propto \sum _k \pi _k(t-h)\, p_{kj} \, \exp ( \ell _j(t) ). \end{aligned}$$
(15)
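One step of the recursion (14)–(15) is straightforward to code; a minimal sketch (our illustration, with the quadratic loss (13) and hypothetical density and pricing functions supplied by the caller), normalising with a log-sum-exp shift for numerical stability:

```python
import numpy as np

def bayes_step(pi_prev, ell_prev, P, trans_dens, pricers,
               x_prev, x_now, y_now, alpha=1.0):
    """One step of (14)-(15): pi_prev, ell_prev are pi(t-h) and ell(t-h);
    P is the model-switching matrix (p_{kj}); trans_dens[j](x, x') is p_j;
    pricers[j](x) is the model-j price compared to the observed price y_now."""
    J = len(pi_prev)
    # (14): add the transition log-density, subtract the quadratic loss (13)
    ell = np.array([ell_prev[j]
                    + np.log(trans_dens[j](x_prev, x_now))
                    - alpha * (pricers[j](x_now) - y_now) ** 2
                    for j in range(J)])
    # (15): mix the old posterior through P and weight by exp(ell),
    # shifting by ell.max() before exponentiating to avoid underflow
    w = (pi_prev @ P) * np.exp(ell - ell.max())
    return w / w.sum(), ell
```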
Now everything is easy:
- The law of \(X_{t+h}\) conditional on \({\mathcal {F}}_t\) has density \(\sum _j \pi _j(t)\, p_j(X_t, \cdot )\);
- If model \(j\) gives the price of an exotic to be \(\xi _j\), then take the overall price to be
$$\begin{aligned} \bar{\xi }\equiv \sum _j \pi _j(t) \, \xi _j, \end{aligned}$$
(16)
the posterior mean;
- What is the error in \(\bar{\xi }\)? It is the mean of a discrete distribution over the values \(\xi _j\) with weights \(\pi _j(t)\), so we know the variance and all other moments;
- If model \(j\) gives delta hedge \(H_j\), then to first order we have a delta hedge given by \(\sum _j \pi _j(t) H_j\).
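Given the posterior weights, the price (16), its moments, and the mixed hedge are simple weighted sums; a minimal sketch with hypothetical weights, per-model prices \(\xi _j\) and deltas \(H_j\):

```python
import numpy as np

pi = np.array([0.5, 0.3, 0.2])    # posterior weights pi_j(t)
xi = np.array([10.0, 12.0, 9.0])  # per-model exotic prices xi_j
H = np.array([0.40, 0.55, 0.35])  # per-model delta hedges H_j

price = pi @ xi                   # posterior mean (16)
var = pi @ (xi - price) ** 2      # variance of the discrete price distribution
delta = pi @ H                    # first-order mixed delta hedge
```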
If we revisit the issues that were problematic for the classical approach, we have answers:
1. The model prices \(\varphi ^a(X_t,\theta )\) may not exactly match market prices \(Y^a_t\). The Bayesian approach does not say that the prices must take any particular value; it says that any price is a random variable whose distribution we know completely;
2. Tomorrow we recalibrate and arrive at a value \(\theta ^*_{t+1}\), so how do we mark-to-market and hedge a derivative that we sold on day \(t\)? Using \(\theta = \theta ^*_t\)? Using \(\theta ^*_{t+1}\)? Using some other value of \(\theta \)? At all times, the price from the Bayesian approach is the posterior mean of the price, so there is no inconsistency;
3. Would some other model be ‘better’? Other models can be compared simply by adding them to the universe of models in the Bayesian comparison;
4. \(\theta ^*_t\) is an estimate, so what account do we take of estimation error? Nothing is estimated in the Bayesian approach.
At this point it might appear that the Bayesian approach to inference deals triumphantly with all the conceptual difficulties and inconsistencies of the classical approach, and indeed it does. This is not to say that all problems have been eliminated, however: very considerable difficulties remain in applying the Bayesian methodology effectively, and they are computational. To apply the Bayesian approach as just described, we must first choose the finite family of models to be considered, and this is the major issue. If we were considering only a one-parameter family of models, we could select a finite set of parameter values (perhaps just a few thousand) which covers the parameter space effectively, and the computation would run ahead with no difficulty. But for a family of models indexed by some parameter \(\theta \in {\mathbb {R}}^8\), it is hard to distribute even a million points in the parameter space so as to cover it reasonably well, and at this point the computational Bayesian method starts to struggle. We are talking here about particle filtering (also known as sequential Monte Carlo); although much effort over the last 30 years has been directed towards doing this well, it remains far from a finished technology. All manner of variants of the basic approach have been proposed, more than we could possibly begin to summarize here, which only goes to show that obvious general implementations often fail.
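To make the computational point concrete, here is a minimal bootstrap-style sequential Monte Carlo sketch over a static parameter \(\theta \) (our illustration: the function names, the effective-sample-size resampling rule and all inputs are assumptions, and a production filter would need much more, e.g. particle rejuvenation):

```python
import numpy as np

def parameter_filter(obs, log_lik, prior_draw, n_particles=1000, seed=0):
    """Sequential Monte Carlo over a static parameter theta: particles cover
    parameter space, weights are updated by each observation's likelihood,
    and the cloud is resampled when the effective sample size degenerates."""
    rng = np.random.default_rng(seed)
    theta = prior_draw(rng, n_particles)          # shape (n_particles, dim)
    logw = np.zeros(n_particles)
    for y in obs:
        logw += log_lik(theta, y)                 # weight update
        w = np.exp(logw - logw.max())
        w /= w.sum()
        if 1.0 / np.sum(w**2) < n_particles / 2:  # effective sample size low?
            idx = rng.choice(n_particles, n_particles, p=w)
            theta, logw = theta[idx], np.zeros(n_particles)
    w = np.exp(logw - logw.max())
    return theta, w / w.sum()
```

Even this toy version shows the difficulty: the particle cloud must cover the parameter space well before the data arrive, which is feasible in one dimension but not in \({\mathbb {R}}^8\).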