
About this book

The exponential increase in the use of MCMC methods and the corresponding applications in domains of even higher complexity have caused a growing concern about the available convergence assessment methods and the realization that some of these methods were not reliable enough for all-purpose analyses. Some researchers have mainly focussed on the convergence to stationarity and the estimation of rates of convergence, in relation with the eigenvalues of the transition kernel. This monograph adopts a different perspective by developing (supposedly) practical devices to assess the mixing behaviour of the chain under study and, more particularly, it proposes methods based on finite (state space) Markov chains which are obtained either through a discretization of the original Markov chain or through a duality principle relating a continuous state space Markov chain to another finite Markov chain, as in missing data or latent variable models. The motivation for the choice of finite state spaces is that, although the resulting control is cruder, in the sense that it can often monitor convergence for the discretized version alone, it is also much stricter than alternative methods, since the tools available for finite Markov chains are universal and the resulting transition matrix can be estimated more accurately. Moreover, while some setups impose a fixed finite state space, others allow for possible refinements in the discretization level and for consecutive improvements in the convergence monitoring.



1. Markov Chain Monte Carlo Methods

As the complexity of the models covered by statistical inference increases, the need for new computational tools gets increasingly pressing. Simulation has always been a natural tool for statisticians (as opposed to numerical analysis) and simulation via Markov chains has been recently exposed as a broad spectrum method, which makes it possible to tackle problems of higher complexity (as shown by the subsequent literature). Although the purpose of this book is to introduce some control techniques for such simulation methods, we feel it is necessary to recall in this chapter the main properties of Markov Chain Monte Carlo (MCMC) algorithms. Moreover, we take the opportunity to introduce notations and our favorite (so-called benchmark) examples, which will be used over and over in the first half of the book. Good introductions to the topic are the seminal paper of Gelfand and Smith (1990) and the monograph of Tanner (1996), as well as the tutorial papers of Casella and George (1992) and Chib and Greenberg (1996), and the surveys of Gelman and Rubin (1992), Geyer (1992) and Besag et al. (1995), while Neal (1993), Gilks, Richardson and Spiegelhalter (1996), Robert (1996c), Gamerman (1997), Robert and Casella (1998) and Gelfand and Smith (1998) provide deeper entries. In this explosive area of research, many books, monographs and long surveys are currently on their way and it is quite impossible to keep an exact account of the current MCMC production!
Christian P. Robert, Sylvia Richardson

2. Convergence Control of MCMC Algorithms

There is an obvious difference between the theoretical guarantee that f is the stationary distribution of a Markov chain \( (x^{(t)}) \) and the practical requirement that (1.2) is close enough to (1.1). It is thus necessary to develop diagnostic tools towards the latter goal, namely convergence control. While control is the topic of this book, we first present in this chapter some of the usual methods, before embarking upon the description of new control methods. The reader is referred to the survey papers of Brooks (1998), Brooks and Roberts (1998) and Cowles and Carlin (1996), as well as to Robert and Casella (1998) and Gelfand and Smith (1998) for details.
Christian P. Robert, Dominique Cellier

3. Linking Discrete and Continuous Chains

When comparing discrete and continuous Markov chains from a theoretical perspective (through, say, Kemeny and Snell, 1960, or Feller, 1970, vol. 1, for the former and Revuz, 1984, or Meyn and Tweedie, 1993, for the latter), a striking difference is the scale of the machinery needed to deal with continuous Markov chains and, as a corollary, the relative lack of intuitive basis behind theoretical results for continuous Markov chains. This gap is the major incentive for this book, in the sense that convergence control methods must keep away both from the traps of ad hoc devices which are “seen” to work well on artificial and contrived examples, and from the quagmire of formal convergence results which, while being fascinating from a theoretical point of view, either fail to answer the true purpose of the analysis, i.e. to decide whether or not the chain(s) at hand have really converged, or require such an involved analysis that they are not customarily applicable beyond case-study setups. This is also why techniques such as those of Raftery and Lewis (1996) are quite alluring, given their intuitive background and theoretical (quasi-)validity.
Anne Philippe, Christian P. Robert

4. Valid Discretization via Renewal Theory

As discussed in Chapter 2, an important drawback of Raftery and Lewis’ (1992a, 1996) convergence control method is that the discretized version of the Markov chain is not a Markov chain itself, unless a stringent lumpability condition holds (see Kemeny and Snell, 1960). This somehow invalidates the binary control method, although it provides useful preliminary information on the required number of iterations. However, the discrete aspect of the criterion remains attractive for its intuitive flavour and, while the Duality Principle of Chapter 1 cannot be invoked in every setting, this chapter shows how renewal theory can be used to construct a theoretically valid discretization method for general Markov chains. We then consider some convergence control methods based on these discretized chains, even though the chains can be used in many alternative ways (see also Chapter 5).
Chantal Guihenneuc-Jouyaux, Christian P. Robert
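
The binary discretization mentioned in the Chapter 4 summary can be illustrated with a minimal sketch. It is not the authors' renewal-based construction: it only shows the naive thresholding step, in which a continuous chain is mapped to a 0/1 sequence and a 2x2 transition matrix is estimated from transition counts as if that sequence were Markov (which, as the summary notes, it generally is not). The AR(1) sampler, the threshold and all function names are assumptions chosen for illustration.

```python
import random


def simulate_ar1(n, rho=0.9, seed=0):
    """Simulate a stationary-variance Gaussian AR(1) chain (illustrative stand-in for MCMC output)."""
    rng = random.Random(seed)
    innovation_sd = (1.0 - rho ** 2) ** 0.5
    x = 0.0
    chain = []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, innovation_sd)
        chain.append(x)
    return chain


def binarize(chain, threshold):
    """Map each value to 1 if it is at or below the threshold, else 0."""
    return [1 if x <= threshold else 0 for x in chain]


def estimate_binary_transitions(z):
    """Estimate alpha = P(0 -> 1) and beta = P(1 -> 0) from observed transition counts.

    This treats the binary sequence as a two-state Markov chain, which is
    only an approximation unless a lumpability condition holds.
    """
    counts = [[0, 0], [0, 0]]
    for current, following in zip(z, z[1:]):
        counts[current][following] += 1
    alpha = counts[0][1] / max(1, sum(counts[0]))
    beta = counts[1][0] / max(1, sum(counts[1]))
    return alpha, beta
```

For instance, `estimate_binary_transitions(binarize(simulate_ar1(5000), 0.0))` returns the two estimated switching probabilities for the thresholded chain; a strongly autocorrelated chain (large `rho`) would be expected to give small values for both.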

5. Control by the Central Limit Theorem

Distinctions between single chain and parallel chain control methods have already been discussed in Chapter 2. However, as Brooks and Roberts (1998) point out, other characteristics must be taken into account for evaluating control methods. An important criterion is the programming investment: diagnostics requiring problem-specific computer codes for their implementation (e.g., requiring knowledge of the transition kernel of the Markov chain) are far less usable for the end user than diagnostics solely based upon the outputs from the sampler, which can use available generic codes. Another criterion is interpretability, in the sense that a diagnostic should preferably require no interpretation or experience from the user.
Didier Chauveau, Jean Diebolt, Christian P. Robert

6. Convergence Assessment in Latent Variable Models: DNA Applications

A DNA sequence is a long succession of four nucleotides or bases, Adenine, Cytosine, Guanine and Thymine, and can be represented by a finite series \( x = (x_1, \ldots, x_n) \), each base \( x_t \) taken from the alphabet \( \mathcal{X} = \{A, C, G, T\} \). It turns out that there is an important heterogeneity within the genome. Statistical models based on a complete homogeneity assumption are thus unrealistic. We propose a hidden Markov chain approach to identify homogeneous regions in the DNA sequence. The breakpoints which define these regions may thus separate parts of the genome with different functional or structural properties.
Florence Muri, Didier Chauveau, Dominique Cellier
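
To make the hidden Markov chain idea of the Chapter 6 summary concrete, here is a small simulation sketch. It is an illustration under stated assumptions, not the model fitted in the chapter: the hidden chain keeps its current state with a fixed probability `stay_prob` and otherwise jumps uniformly to another state, and each hidden state emits bases independently with its own composition. The function names and parameters are hypothetical.

```python
import random

BASES = "ACGT"


def simulate_hmm_dna(n, stay_prob, emission_probs, seed=0):
    """Simulate hidden region labels and an observed base sequence of length n.

    emission_probs[s] lists the probabilities of A, C, G, T in hidden state s.
    """
    rng = random.Random(seed)
    n_states = len(emission_probs)
    state = rng.randrange(n_states)
    states, bases = [], []
    for _ in range(n):
        states.append(state)
        bases.append(rng.choices(BASES, weights=emission_probs[state], k=1)[0])
        # Leave the current homogeneous region with probability 1 - stay_prob.
        if n_states > 1 and rng.random() > stay_prob:
            state = rng.choice([s for s in range(n_states) if s != state])
    return states, "".join(bases)


def breakpoints(states):
    """Return the positions at which the hidden state changes."""
    return [t for t in range(1, len(states)) if states[t] != states[t - 1]]
```

A call such as `simulate_hmm_dna(1000, 0.99, [[0.4, 0.1, 0.1, 0.4], [0.1, 0.4, 0.4, 0.1]])` produces an AT-rich and a GC-rich regime; `breakpoints` applied to the returned states gives the true region boundaries that an inference procedure would try to recover from the bases alone.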

7. Convergence Assessment in Latent Variable Models: Application to the Longitudinal Modelling of a Marker of HIV Progression

Infection with Human Immunodeficiency Virus type-1 (HIV-1), the virus that leads to AIDS, is associated with a decline in the count of CD4 cells, a type of white blood cell involved in the immune system. In order to monitor the health status and disease progression of HIV infected patients, CD4 counts have thus been frequently used as a marker. In particular, Markov process models of the natural history of HIV play an important part in AIDS modelling (Longini et al., 1991, Freydman, 1992, Longini, Clark and Karon, 1993, Gentleman et al., 1994, Satten and Longini, 1996).
Chantal Guihenneuc-Jouyaux, Sylvia Richardson, Virginie Lasserre

8. Estimation of Exponential Mixtures

Exponential mixtures are distributions of the form
$$ \sum_{i = 0}^{k} p_i \, \mathcal{E}xp(\lambda_i) \tag{8.1} $$
with \( p_0 + \ldots + p_k = 1 \) and \( \lambda_i > 0 \) \( (0 \leqslant i \leqslant k) \). Considering the huge literature on normal mixtures (see §3.4), the treatment of exponential mixtures is rather limited. A possible reason, as illustrated in this chapter, is that the components of (8.1) are much more difficult to distinguish than in the normal case of §3.4. Exponential mixtures with a small number of components are nonetheless used in the modeling of phenomena with positive output and long asymmetric tails, mainly in survival and duration setups, like the applications mentioned in Titterington, Smith and Makov (1985, p.17-21). We also illustrate this modeling in the case of hospitalization durations for which a two- or three-component exponential mixture is appropriate.
Marie-Anne Gruet, Anne Philippe, Christian P. Robert
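
As a concrete reading of the exponential mixture defined in the Chapter 8 summary, the sketch below evaluates its density, draws from it, and computes its mean. It assumes the rate parameterization, so that the component with parameter \( \lambda_i \) has density \( \lambda_i e^{-\lambda_i x} \) on \( x \geq 0 \) and mean \( 1/\lambda_i \); the source does not state which parameterization is used, and the function names are illustrative.

```python
import math
import random


def exp_mixture_pdf(x, weights, rates):
    """Density of the mixture sum_i p_i Exp(lambda_i) at x (zero for x < 0)."""
    if x < 0:
        return 0.0
    return sum(p * lam * math.exp(-lam * x) for p, lam in zip(weights, rates))


def exp_mixture_mean(weights, rates):
    """Mean of the mixture: sum_i p_i / lambda_i."""
    return sum(p / lam for p, lam in zip(weights, rates))


def sample_exp_mixture(n, weights, rates, seed=0):
    """Draw n values by first picking a component, then an exponential variate."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        lam = rng.choices(rates, weights=weights, k=1)[0]
        draws.append(rng.expovariate(lam))
    return draws
```

With `weights = [0.5, 0.5]` and `rates = [1.0, 2.0]`, the two component densities overlap heavily on small values of x, which gives some intuition for why the components are harder to tell apart than those of a normal mixture with well-separated means.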

