  • Protocol
A framework for parameter estimation and model selection from experimental data in systems biology using approximate Bayesian computation

Abstract

As modeling becomes a more widespread practice in the life sciences and biomedical sciences, researchers need reliable tools to calibrate models against ever more complex and detailed data. Here we present an approximate Bayesian computation (ABC) framework and software environment, ABC-SysBio, which is a Python package that runs on Linux and Mac OS X systems and that enables parameter estimation and model selection in the Bayesian formalism by using sequential Monte Carlo (SMC) approaches. We outline the underlying rationale, discuss the computational and practical issues and provide detailed guidance as to how the important tasks of parameter inference and model selection can be performed in practice. Unlike other available packages, ABC-SysBio is highly suited for investigating, in particular, the challenging problem of fitting stochastic models to data. In order to demonstrate the use of ABC-SysBio, in this protocol we postulate the existence of an imaginary reaction network composed of seven interrelated biological reactions (involving a specific mRNA, the protein it encodes and a post-translationally modified version of the protein), a network that is defined by two files containing 'observed' data that we provide as supplementary information. In the first part of the PROCEDURE, ABC-SysBio is used to infer the parameters of this system, whereas in the second part we use ABC-SysBio's relevant functionality to discriminate between two different reaction network models, one of them being the 'true' one. Although computationally expensive, the additional insights gained in the Bayesian formalism more than make up for this cost, especially in complex problems.
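The core idea of ABC can be illustrated with a minimal sketch of plain ABC rejection sampling, the simplest scheme that SMC variants refine by using a sequence of decreasing tolerances. This is not ABC-SysBio's interface: the toy exponential-decay model, the uniform prior, the noise level, the tolerance `epsilon` and all function names below are assumptions chosen purely for illustration.

```python
import math
import random


def simulate(k, times, x0=10.0, noise_sd=0.0, rng=None):
    """Toy model (assumed): x(t) = x0 * exp(-k * t) plus optional Gaussian noise."""
    rng = rng or random
    return [x0 * math.exp(-k * t) + rng.gauss(0.0, noise_sd) for t in times]


def distance(a, b):
    """Euclidean distance between two trajectories sampled at the same times."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def abc_rejection(observed, times, n_accept, epsilon, prior=(0.0, 2.0), seed=0):
    """Keep parameter draws whose simulated data lie within epsilon of the data."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n_accept:
        k = rng.uniform(*prior)                          # draw from the prior
        sim = simulate(k, times, noise_sd=0.2, rng=rng)  # simulate the model
        if distance(sim, observed) <= epsilon:           # compare with the data
            accepted.append(k)
    return accepted


times = [0.0, 1.0, 2.0, 3.0, 4.0]
observed = simulate(0.5, times)  # noise-free stand-in for observed data, true k = 0.5
posterior = abc_rejection(observed, times, n_accept=200, epsilon=1.5)
posterior_mean = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} samples, posterior mean of k = {posterior_mean:.3f}")
```

The accepted values of `k` approximate the posterior distribution. Lowering `epsilon` tightens the approximation but increases the number of simulations needed, which is the cost that sequential schemes aim to reduce.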

Figure 1: Data and posteriors.
Figure 2: Models and data.
Figure 3: Automatically generated model summary file.
Figure 4: Example trajectories of intermediate and final ABC-SMC populations.
Figure 5: Model probabilities after each ABC-SMC population.
Figure 6: Analyzing the posterior distribution.

Acknowledgements

J.L., T.T. and C.P.B. gratefully acknowledge funding from the Wellcome Trust through a PhD studentship, a Wellcome Trust-Massachusetts Institute of Technology (MIT) postdoctoral fellowship (no. 090433/B/09/Z) and a Research Career Development Fellowship (no. 097319/Z/11/Z), respectively. J.L. also acknowledges financial support from the NC3R through a David Sainsbury Fellowship; S.F. acknowledges financial support through a UK Medical Research Council Biocomputing Fellowship. P.K. and M.P.H.S. acknowledge support from a Human Frontier Science Program (HFSP) grant (no. RGP0061/2011). M.P.H.S. gratefully acknowledges support from the UK Biotechnology and Biological Sciences Research Council, The Leverhulme Trust and the Royal Society through a Wolfson Research Merit Award.

Author information

Contributions

J.L. designed and analyzed the examples, developed the protocol and wrote the paper. P.K. designed the analysis and wrote the paper; S.F. analyzed the examples, verified the protocols and wrote the paper. T.T. analyzed the examples, verified the protocols and wrote the paper. C.P.B. developed the protocols and wrote the paper; M.P.H.S. designed the examples and wrote the paper.

Corresponding author

Correspondence to Michael P H Stumpf.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Intermediate and final ABC-SMC populations.

Shown are the analytic plots produced by the ABC-SysBio software after each population. These plots include pairwise scatterplots of the accepted parameters (a, b). The scatterplots contain the information of all previous populations (denoted by different colours). On the diagonal are the histograms of the latest population. After the first population (a) none of the parameters are inferred and the trajectories do not fit the data. However, by population 16 (b) all the parameters are inferred and the trajectories are in agreement with the data. (Note that the scaling of axes differs between populations in these diagnostic plots.)

Supplementary information

Supplementary Figure 1

Intermediate and final ABC-SMC populations. (PDF 919 kb)

Supplementary Methods (PDF 245 kb)

Supplementary Data 1

Input files 1 and 2. (ZIP 4 kb)

Supplementary Data 2

SBML files 1 and 2. (ZIP 1 kb)

About this article

Cite this article

Liepe, J., Kirk, P., Filippi, S. et al. A framework for parameter estimation and model selection from experimental data in systems biology using approximate Bayesian computation. Nat Protoc 9, 439–456 (2014). https://doi.org/10.1038/nprot.2014.025

  • DOI: https://doi.org/10.1038/nprot.2014.025
