Review
Approximate Bayesian Computation (ABC) in practice

https://doi.org/10.1016/j.tree.2010.04.001

Understanding the forces that influence natural variation within and among populations has been a major objective of evolutionary biologists for decades. Motivated by the growth in computational power and data complexity, modern approaches to this question make intensive use of simulation methods. Approximate Bayesian Computation (ABC) is one of these methods. Here we review the foundations of ABC, its recent algorithmic developments, and its applications in evolutionary biology and ecology. We argue that the use of ABC should incorporate all aspects of Bayesian data analysis: formulation, fitting, and improvement of a model. ABC can be a powerful tool to make inferences with complex models if these principles are carefully applied.

Section snippets

Inference with simulations in evolutionary genetics

Natural populations have complex demographic histories: their sizes and ranges change over time, leading to fission and fusion processes that leave signatures on their genetic composition [1]. One ‘promise’ of biology is that molecular data will help us uncover the complex demographic and adaptive processes that have acted on natural populations. The widespread availability of different molecular markers and increased computer power has fostered the development of sophisticated statistical

The ABC of Approximate Bayesian Computation

ABC has its roots in the rejection algorithm, a simple technique to generate samples from a probability distribution [8,9]. The basic rejection algorithm consists of simulating large numbers of datasets under a hypothesized evolutionary scenario. The parameters of the scenario are not chosen deterministically, but sampled from a probability distribution. The data generated by simulation are then reduced to summary statistics, and the sampled parameters are accepted or rejected on the basis of
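
As a minimal sketch of the rejection scheme described above, the Python code below infers the mean of a normal distribution. The toy model, the uniform prior, the sample mean as summary statistic, the absolute-difference distance, and the tolerance value are all illustrative assumptions, not choices taken from the studies reviewed here. Accepting a draw when the simulated and observed summaries are close is the usual formulation of rejection ABC.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy generating mechanism (assumption): n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summary(data):
    """Reduce a dataset to one summary statistic, here the sample mean."""
    return statistics.fmean(data)


def abc_rejection(observed, n_sims, tolerance, rng):
    """Keep prior draws whose simulated summary is within `tolerance` of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_sims):
        # Sample a parameter value from the prior (assumed Uniform(-5, 5)).
        theta = rng.uniform(-5.0, 5.0)
        # Simulate a dataset of the same size and reduce it to its summary.
        s_sim = summary(simulate(theta, len(observed), rng))
        # Accept the draw if the summaries are close enough.
        if abs(s_sim - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(42)
observed = simulate(2.0, 50, rng)  # stand-in for real data (hypothetical)
posterior_sample = abc_rejection(observed, n_sims=20000, tolerance=0.1, rng=rng)
print(len(posterior_sample), statistics.fmean(posterior_sample))
```

The accepted values approximate a sample from the posterior distribution. Lowering the tolerance tightens the approximation at the cost of fewer accepted draws.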

Bayesian data analysis: building, fitting, and improving the model

The three main steps of Bayesian analysis are formulating the model, fitting the model to data, and improving the model by checking its fit and comparing it with other models [59] (Box 4). When formulating a model, we use our experience and prior knowledge, and sometimes resort to established theories. This step is often mathematical in essence because it encompasses explicit definitions for the likelihood (or a generating mechanism) and the prior distribution. These quantities summarize

Choice and dimension of summary statistics

Carrying out inference based on summary statistics instead of the full dataset inevitably implies discarding potentially useful information. More specifically, if a summary statistic is not sufficient for the parameter of interest, the posterior distribution computed with this statistic would not be equal to the posterior distribution computed with the full dataset [4]. Many areas of evolutionary biology focus on developing informative statistics. An example of a recently developed statistic is
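
The sufficiency condition can be written compactly. The notation is assumed here for illustration and is not taken from the source: θ denotes the parameter of interest, D the full dataset, and S(D) the vector of summary statistics.

```latex
\[
  S \text{ sufficient for } \theta
  \;\Longrightarrow\;
  p(\theta \mid D) = p\bigl(\theta \mid S(D)\bigr),
\]
\[
  S \text{ not sufficient for } \theta
  \;\Longrightarrow\;
  p\bigl(\theta \mid S(D)\bigr) \neq p(\theta \mid D) \text{ in general.}
\]
```

In the second case an ABC analysis targets the posterior given S(D), so information about θ that S fails to capture is lost regardless of how many simulations are run.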

Inference under complex models

ABC is extremely flexible and relatively easy to implement, so inference can be carried out for many complex models in evolution and ecology as long as informative summary statistics are available and simulating data under the model is possible. Templeton [56,57] argues that the interpretation of why a complex model is preferred is highly subjective when using computer simulations. With the proliferation of highly complex models, Bayesian statisticians and evolutionary biologists have made

Conclusions

Biology is a complex science, so it is inevitable that the observation of biological systems leads us to build complex models. However, the apparent ease of using an inference algorithm such as ABC should never hide the general difficulties of making inferences under complex models. The automatic process of inference is hampered by the model definition and model checking steps, which are case-dependent and highly user-interactive. ABC is far from being as ‘easy as 123’. Important caveats when

Acknowledgements

KC is funded by a postdoctoral fellowship from the Université Joseph Fourier (ABC MSTIC). OF was partially supported by a grant from the Agence Nationale de la Recherche (BLAN06-3146282 MAEV). MB and OF acknowledge the support of the Complex Systems Institute (IXXI). OEG acknowledges support from the EcoChange Project.

Glossary

Bayes factor
the ratio of the probabilities of the observed data under two models (their marginal likelihoods), used to evaluate the relative support for one model over the other in Bayesian model comparison.
Bayesian statistics
a general framework for summarizing uncertainty (prior information) and making estimates and predictions using probability statements conditional on observed data and an assumed model.
Coalescent theory
a mathematical theory that describes the ancestral relationships of a sample of ‘individuals’ back to their common

References (104)

  • P. Marjoram et al.

    Modern computational approaches for analysing molecular genetic variation data

    Nat. Rev. Genet.

    (2006)
  • M.A. Beaumont

    Approximate Bayesian Computation in population genetics

    Genetics

    (2002)
  • S. Wright

    The genetical structure of populations

    Ann. Eugen.

    (1951)
  • L.L. Cavalli-Sforza et al.

    Experiments with an artificial population

  • S. Tavaré

    Inferring coalescence times from DNA sequence data

    Genetics

    (1997)
  • J.K. Pritchard

    Population growth of human Y chromosomes: a study of Y chromosome microsatellites

    Mol. Biol. Evol.

    (1999)
  • M.G.B. Blum et al.

    Non-linear regression models for Approximate Bayesian Computation

    Stat. Comput.

    (2010)
  • P. Marjoram

    Markov chain Monte Carlo without likelihoods

    Proc. Natl. Acad. Sci. U. S. A.

    (2003)
  • S.A. Sisson

    Sequential Monte Carlo without likelihoods

    Proc. Natl. Acad. Sci. U. S. A.

    (2007)
  • T. Toni

    Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems

    J. R. Soc. Interface

    (2009)
  • D.A. Tallmon

    Comparative evaluation of a new effective population size estimator based on approximate Bayesian computation

    Genetics

    (2004)
  • Y.L. Chan

    Bayesian estimation of the timing and severity of a population bottleneck from ancient DNA

    PLoS Genet.

    (2006)
  • K.R. Thornton et al.

    Approximate Bayesian inference reveals evidence for a recent, severe, bottleneck in a Netherlands population of Drosophila melanogaster

    Genetics

    (2006)
  • M. Pascual

    Introduction history of Drosophila subobscura in the New World: a microsatellite based survey using ABC methods

    Mol. Ecol.

    (2007)
  • O. François

    Demographic history of European populations of Arabidopsis thaliana

    PLoS Genet.

    (2008)
  • J. Ross-Ibarra

    Patterns of polymorphism and demographic history in natural populations of Arabidopsis lyrata

    PLoS ONE

    (2008)
  • P.K. Ingvarsson

    Multilocus patterns of nucleotide polymorphism and the demographic history of Populus tremula

    Genetics

    (2008)
  • L.Z. Gao et al.

    Non-independent domestication of the two rice subspecies, Oryza sativa subsp. indica and subsp. japonica, demonstrated by multilocus microsatellites

    Genetics

    (2008)
  • T. Guillemaud

    Inferring introduction routes of invasive species using approximate Bayesian computation on microsatellite data

    Heredity

    (2010)
  • M. Tanaka

    Estimating tuberculosis transmission parameters from genotype data using approximate Bayesian computation

    Genetics

    (2006)
  • D. Shriner

    Evolution of intrahost HIV-1 genetic diversity during chronic infection

    Evolution

    (2006)
  • N.J.R. Fagundes

    Statistical evaluation of alternative models of human evolution

    Proc. Natl. Acad. Sci. U. S. A.

    (2007)
  • M.P. Cox

    Testing for archaic hominin admixture on the X chromosome: model likelihoods for the modern human RRM2P4 region from summaries of genealogical topology under the structured coalescent

    Genetics

    (2008)
  • P. Gerbault

    Impact of selection and demography on the diffusion of lactase persistence

    PLoS ONE

    (2009)
  • E. Patin

    Inferring the demographic history of African farmers and Pygmy hunter–gatherers using a multilocus resequencing data set

    PLoS Genet.

    (2009)
  • M. Bonhomme

    Origin and number of founders in an introduced insular primate: estimation from nuclear genetic data

    Mol. Ecol.

    (2008)
  • A. Estoup

    Genetic analysis of complex demographic scenarios: spatially expanding populations of the cane toad, Bufo marinus

    Evolution

    (2004)
  • N. Miller

    Multiple transatlantic introductions of the Western corn rootworm

    Science

    (2005)
  • E.B. Rosenblum

    A multilocus perspective on colonization accompanied by selection and gene flow

    Evolution

    (2007)
  • S. Neuenschwander

    Colonization history of the Swiss Rhine basin by the bullhead (Cottus gobio): inference under a Bayesian spatially explicit framework

    Mol. Ecol.

    (2008)
  • N. Ray

    A statistical evaluation of models for the initial settlement of the American continent emphasizes the importance of gene flow with Asia

    Mol. Biol. Evol.

    (2010)
  • S. Ghirotto

    Inferring genealogical processes from patterns of bronze-age and modern DNA variation in Sardinia

    Mol. Biol. Evol.

    (2010)
  • L. Excoffier

    Bayesian analysis of an admixture model with mutations and arbitrarily linked markers

    Genetics

    (2005)
  • V.C. Sousa

    Approximate Bayesian computation without summary statistics: the case of admixture

    Genetics

    (2009)
  • J.M. Cornuet

    Bayesian inference under complex evolutionary scenarios using microsatellite markers: multiple divergence and genetic admixture events in the honey bee Apis mellifera

  • G. Hamilton

    Bayesian estimation of recent migration rates after a spatial expansion

    Genetics

    (2005)
  • J.S. Lopes et al.

    The use of approximate Bayesian computation in conservation genetics and its application in a case study on yellow-eyed penguins

    Conserv. Genet.

    (2010)
  • I. Tiemann-Boege

    High-resolution recombination patterns in a region of human chromosome 21 measured by sperm typing

    PLoS Genet.

    (2006)
  • B. Padhukasahasram

    Estimating recombination rates from single-nucleotide polymorphisms using summary statistics

    Genetics

    (2006)
  • M. Touchon

    Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths

    PLoS Genet.

    (2009)