Trends in Ecology & Evolution
Review
Approximate Bayesian Computation (ABC) in practice
Inference with simulations in evolutionary genetics
Natural populations have complex demographic histories: their sizes and ranges change over time, leading to fission and fusion processes that leave signatures on their genetic composition [1]. One ‘promise’ of biology is that molecular data will help us uncover the complex demographic and adaptive processes that have acted on natural populations. The widespread availability of different molecular markers and increased computer power have fostered the development of sophisticated statistical
The ABC of Approximate Bayesian Computation
ABC has its roots in the rejection algorithm, a simple technique to generate samples from a probability distribution [8,9]. The basic rejection algorithm consists of simulating large numbers of datasets under a hypothesized evolutionary scenario. The parameters of the scenario are not chosen deterministically, but sampled from a probability distribution. The data generated by simulation are then reduced to summary statistics, and the sampled parameters are accepted or rejected on the basis of
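The rejection scheme described above can be sketched on a toy problem. This is a minimal illustration rather than the authors' implementation: the normal generating mechanism, the uniform prior on `theta`, the sample mean as the only summary statistic, and the absolute-distance tolerance `tol` are all assumptions chosen for brevity.

```python
import random
import statistics

def simulate(theta, n, rng):
    """Toy generating mechanism (assumed): n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]

def summary(data):
    """Reduce a dataset to a single summary statistic (the sample mean)."""
    return statistics.fmean(data)

def abc_rejection(observed, n_sims, tol, rng):
    """Keep prior draws whose simulated summary is within tol of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)  # sample the parameter from an assumed prior
        s_sim = summary(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) <= tol:   # accept or reject on the summary distance
            accepted.append(theta)
    return accepted

rng = random.Random(1)                   # fixed seed for reproducibility
observed = simulate(2.0, 50, rng)        # stand-in for real data
posterior_sample = abc_rejection(observed, n_sims=20000, tol=0.1, rng=rng)
print(len(posterior_sample), statistics.fmean(posterior_sample))
```

The accepted values of `theta` form an approximate sample from the posterior distribution. Shrinking `tol` tightens the approximation but lowers the acceptance rate.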
Bayesian data analysis: building, fitting, and improving the model
The three main steps of Bayesian analysis are formulating the model, fitting the model to data, and improving the model by checking its fit and comparing it with other models [59] (Box 4). When formulating a model, we use our experience and prior knowledge, and sometimes resort to established theories. This step is often mathematical in essence because it encompasses explicit definitions for the likelihood (or a generating mechanism) and the prior distribution. These quantities summarize
Choice and dimension of summary statistics
Carrying out inference based on summary statistics instead of the full dataset inevitably implies discarding potentially useful information. More specifically, if a summary statistic is not sufficient for the parameter of interest, the posterior distribution computed with this statistic would not be equal to the posterior distribution computed with the full dataset [4]. Many areas of evolutionary biology focus on developing informative statistics. An example of a recently developed statistic is
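The sufficiency condition mentioned above can be written out as follows. The notation is assumed for illustration: $y$ is the full dataset, $S(y)$ the summary statistic, $\theta$ the parameter of interest, and $\pi(\theta)$ the prior.

```latex
% Notation (assumed here): y = full dataset, S(y) = summary statistic(s),
% \theta = parameter of interest, \pi(\theta) = prior distribution.
\begin{align*}
  \text{$S$ sufficient for $\theta$:}\quad
    & p(y \mid \theta) = g\bigl(S(y), \theta\bigr)\, h(y)
      \;\Longrightarrow\;
      p(\theta \mid y) \propto g\bigl(S(y), \theta\bigr)\,\pi(\theta)
      \propto p\bigl(\theta \mid S(y)\bigr), \\
  \text{$S$ not sufficient:}\quad
    & p\bigl(\theta \mid S(y)\bigr) \neq p(\theta \mid y) \quad \text{in general.}
\end{align*}
```

The first line is the standard factorization criterion: when the likelihood depends on the data only through $S(y)$, conditioning on the summary loses nothing. The second line is the case in which information is discarded.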
Inference under complex models
ABC is extremely flexible and relatively easy to implement, so inference can be carried out for many complex models in evolution and ecology as long as informative summary statistics are available and simulating data under the model is possible. Templeton [56,57] argues that the interpretation of why a complex model is preferred is highly subjective when using computer simulations. With the proliferation of highly complex models, Bayesian statisticians and evolutionary biologists have made
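One common way to compare models with simulations is to treat the model index as an extra parameter in the rejection scheme. The sketch below is a hypothetical toy, not a method taken from this article: the two zero-mean normal models, their uniform priors on the standard deviation, the equal prior model probabilities, and the tolerance are all assumptions made for illustration.

```python
import random
import statistics

def simulate(model, n, rng):
    """Hypothetical toy models: zero-mean normal data with a model-specific prior on sd."""
    if model == 0:
        sd = rng.uniform(0.5, 2.0)  # model 0: low-variability scenario (assumed)
    else:
        sd = rng.uniform(1.5, 4.0)  # model 1: high-variability scenario (assumed)
    return [rng.gauss(0.0, sd) for _ in range(n)]

def abc_model_choice(observed, n_sims, tol, rng):
    """Estimate posterior model probabilities as shares of accepted simulations."""
    s_obs = statistics.stdev(observed)  # summary statistic: sample standard deviation
    counts = [0, 0]
    for _ in range(n_sims):
        model = rng.randrange(2)        # equal prior probability for each model
        s_sim = statistics.stdev(simulate(model, len(observed), rng))
        if abs(s_sim - s_obs) <= tol:
            counts[model] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else None

rng = random.Random(7)                   # fixed seed for reproducibility
observed = [rng.gauss(0.0, 3.0) for _ in range(50)]  # stand-in for real data
probs = abc_model_choice(observed, n_sims=5000, tol=0.2, rng=rng)
print(probs)
```

With equal prior model probabilities, the ratio `probs[1] / probs[0]` approximates the Bayes factor of model 1 over model 0 whenever `probs[0]` is non-zero. The estimate is only as good as the summary statistic, which is why the choice of statistics matters for model comparison as much as for parameter estimation.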
Conclusions
Biology is a complex science, so it is inevitable that the observation of biological systems leads us to build complex models. However, the apparent ease of using an inference algorithm such as ABC should never hide the general difficulties of making inferences under complex models. The automatic process of inference is hampered by the model definition and model checking steps, which are case-dependent and highly user-interactive. ABC is far from being as ‘easy as 123’. Important caveats when
Acknowledgements
KC is funded by a postdoctoral fellowship from the Université Joseph Fourier (ABC MSTIC). OF was partially supported by a grant from the Agence Nationale de la Recherche (BLAN06-3146282 MAEV). MB and OF acknowledge the support of the Complex Systems Institute (IXXI). OEG acknowledges support from the EcoChange Project.
Glossary
- Bayes factor: the ratio of probabilities of two models that is used to evaluate the relative support of one model in relation to another in Bayesian model comparison.
- Bayesian statistics: a general framework for summarizing uncertainty (prior information) and making estimates and predictions using probability statements conditional on observed data and an assumed model.
- Coalescent theory: a mathematical theory that describes the ancestral relationships of a sample of ‘individuals’ back to their common
References (104)
- Coalescent genealogy samplers: windows into population history. Trends Ecol. Evol. (2009)
- Origins and genetic diversity of Pygmy hunter-gatherers from western Central Africa. Curr. Biol. (2009)
- Signatures of purifying and local positive selection in human miRNAs. Am. J. Hum. Genet. (2009)
- Estimating primate divergence times by using conditioned birth-and-death processes. Theor. Popul. Biol. (2009)
- Model selection in ecology and evolution. Trends Ecol. Evol. (2004)
- Phylogeography's past, present, and future: 10 years after Avise, 2000. Mol. Phylogenet. Evol. (2010)
- Recombination as a point process along sequences. Theor. Popul. Biol. (1999)
- Properties of a neutral allele model with intragenic recombination. Theor. Popul. Biol. (1983)
- Molecular Markers, Natural History and Evolution (2004)
- The Bayesian revolution in genetics. Nat. Rev. Genet. (2004)