Abstract
As modeling becomes a more widespread practice in the life sciences and biomedical sciences, researchers need reliable tools to calibrate models against ever more complex and detailed data. Here we present an approximate Bayesian computation (ABC) framework and software environment, ABC-SysBio, a Python package that runs on Linux and Mac OS X and enables parameter estimation and model selection in the Bayesian formalism using sequential Monte Carlo (SMC) approaches. We outline the underlying rationale, discuss the computational and practical issues and provide detailed guidance on how the important tasks of parameter inference and model selection can be performed in practice. Unlike other available packages, ABC-SysBio is particularly well suited to the challenging problem of fitting stochastic models to data. To demonstrate the use of ABC-SysBio, in this protocol we postulate an imaginary reaction network composed of seven interrelated biological reactions (involving a specific mRNA, the protein it encodes and a post-translationally modified version of the protein), for which we provide 'observed' data in two files as supplementary information. In the first part of the PROCEDURE, ABC-SysBio is used to infer the parameters of this system, whereas in the second part we use ABC-SysBio's model-selection functionality to discriminate between two different reaction network models, one of which is the 'true' one. Although the approach is computationally expensive, the additional insights gained in the Bayesian formalism more than make up for this cost, especially in complex problems.
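To make the ABC-SMC idea concrete, here is a minimal NumPy sketch of ABC-SMC parameter inference in the spirit of Toni et al. (2009). It uses a Gaussian perturbation kernel whose variance is twice the weighted variance of the previous population, a choice proposed by Beaumont et al. (2009). The sketch does not use ABC-SysBio or its API. The one-parameter decay model, the uniform prior, the Euclidean distance, the tolerance schedule and all function names are hypothetical choices for illustration. Each population accepts only parameters whose simulated trajectory lies within the current tolerance of the data. Importance weights then correct for drawing candidates from perturbed particles rather than from the prior.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy model (NOT the protocol's seven-reaction network):
# deterministic exponential decay x(t) = 10 * exp(-k * t) with one parameter k.
t = np.linspace(0.0, 5.0, 11)


def simulate(k):
    return 10.0 * np.exp(-k * t)


# Synthetic "observed" data: true k = 0.7 plus Gaussian measurement noise.
observed = simulate(0.7) + rng.normal(0.0, 0.2, size=t.size)

PRIOR_LOW, PRIOR_HIGH = 0.0, 3.0  # assumed uniform prior on k


def prior_pdf(k):
    return 1.0 / (PRIOR_HIGH - PRIOR_LOW) if PRIOR_LOW <= k <= PRIOR_HIGH else 0.0


def distance(sim, obs):
    # Euclidean distance between simulated and observed trajectories.
    return np.sqrt(np.sum((sim - obs) ** 2))


def abc_smc(epsilons, n_particles=300):
    """Return the final population as (particles, normalized weights)."""
    particles = weights = None
    for pop, eps in enumerate(epsilons):
        new_particles = np.empty(n_particles)
        new_weights = np.empty(n_particles)
        if pop > 0:
            # Gaussian perturbation kernel with twice the weighted variance
            # of the previous population.
            mean = np.sum(weights * particles)
            sigma = np.sqrt(2.0 * np.sum(weights * (particles - mean) ** 2))
        accepted = 0
        while accepted < n_particles:
            if pop == 0:
                k = rng.uniform(PRIOR_LOW, PRIOR_HIGH)  # sample from the prior
            else:
                # Resample a previous particle by weight, then perturb it.
                k = rng.choice(particles, p=weights) + rng.normal(0.0, sigma)
                if prior_pdf(k) == 0.0:
                    continue  # perturbed outside the prior support
            if distance(simulate(k), observed) > eps:
                continue  # reject: simulation too far from the data
            new_particles[accepted] = k
            if pop == 0:
                new_weights[accepted] = 1.0
            else:
                # Importance weight: prior / weighted mixture of kernels.
                # The kernel's normalizing constant cancels after normalization.
                kernel = np.exp(-0.5 * ((k - particles) / sigma) ** 2)
                new_weights[accepted] = prior_pdf(k) / np.sum(weights * kernel)
            accepted += 1
        particles, weights = new_particles, new_weights / new_weights.sum()
    return particles, weights


# Decreasing tolerance schedule, chosen by hand for this toy example.
particles, weights = abc_smc([10.0, 5.0, 2.5, 1.5])
posterior_mean = np.sum(weights * particles)
```

The loop only ever simulates from the model and compares simulations with data. It never evaluates a likelihood. You could therefore swap a stochastic simulator (for example, a Gillespie algorithm) in for `simulate` without changing anything else. This is one reason ABC suits the stochastic models mentioned above.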
Acknowledgements
J.L., T.T. and C.P.B. gratefully acknowledge funding from the Wellcome Trust through a PhD studentship, a Wellcome Trust-Massachusetts Institute of Technology (MIT) postdoctoral fellowship (no. 090433/B/09/Z) and a Research Career Development Fellowship (no. 097319/Z/11/Z), respectively. J.L. also acknowledges financial support from the NC3R through a David Sainsbury Fellowship; S.F. acknowledges financial support through a UK Medical Research Council Biocomputing Fellowship. P.K. and M.P.H.S. acknowledge support from a Human Frontier Science Program (HFSP) grant (no. RGP0061/2011). M.P.H.S. gratefully acknowledges support from the UK Biotechnology and Biological Sciences Research Council, The Leverhulme Trust and the Royal Society through a Wolfson Research Merit Award.
Author information
Contributions
J.L. designed and analyzed the examples, developed the protocol and wrote the paper. P.K. designed the analysis and wrote the paper. S.F. analyzed the examples, verified the protocols and wrote the paper. T.T. analyzed the examples, verified the protocols and wrote the paper. C.P.B. developed the protocols and wrote the paper. M.P.H.S. designed the examples and wrote the paper.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Intermediate and final ABC-SMC populations.
Shown are the diagnostic plots produced by the ABC-SysBio software after each population, including pairwise scatterplots of the accepted parameters (a, b). The scatterplots include the accepted parameters from all previous populations (denoted by different colours), and the diagonal shows histograms of the latest population. After the first population (a), none of the parameters is inferred and the trajectories do not fit the data. By population 16 (b), however, all the parameters are inferred and the trajectories agree with the data. (Note that axis scaling differs between populations in these diagnostic plots.)
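The diagonal histograms in these plots summarize one population at a time. The minimal NumPy sketch below computes the same kind of per-parameter summary numerically. This is useful, for example, to check whether each parameter's distribution has narrowed from one population to the next. The sketch assumes a population is stored as a hypothetical pair of arrays (particles and importance weights). That layout is not ABC-SysBio's documented output format. Weighting the histogram by the importance weights is also an assumption of this sketch, not a stated property of ABC-SysBio's plots.

```python
import numpy as np


def summarize_population(particles, weights, bins=10):
    """Weighted mean, standard deviation and histogram for each parameter.

    particles: array of shape (n_particles, n_parameters)
    weights:   non-negative importance weights, shape (n_particles,)
    """
    particles = np.asarray(particles, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # ensure the weights sum to 1
    summaries = []
    for j in range(particles.shape[1]):
        values = particles[:, j]
        mean = np.sum(weights * values)
        sd = np.sqrt(np.sum(weights * (values - mean) ** 2))
        # Weighted histogram: bar heights are summed weights, not raw counts.
        counts, edges = np.histogram(values, bins=bins, weights=weights)
        summaries.append({"mean": mean, "sd": sd, "counts": counts, "edges": edges})
    return summaries
```

Suppose this is called on successive populations. A shrinking `sd` for a parameter corresponds to the histogram on the diagonal becoming narrower. That is the pattern described above between population 1 and population 16.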
Supplementary information
Supplementary Figure 1
Intermediate and final ABC-SMC populations. (PDF 919 kb)
Supplementary Data 1
Input files 1 and 2. (ZIP 4 kb)
Supplementary Data 2
SBML files 1 and 2. (ZIP 1 kb)
About this article
Cite this article
Liepe, J., Kirk, P., Filippi, S. et al. A framework for parameter estimation and model selection from experimental data in systems biology using approximate Bayesian computation. Nat Protoc 9, 439–456 (2014). https://doi.org/10.1038/nprot.2014.025
DOI: https://doi.org/10.1038/nprot.2014.025