2019 | OriginalPaper | Book chapter

How Many Subpopulations Is Too Many? Exponential Lower Bounds for Inferring Population Histories

Written by: Younhun Kim, Frederic Koehler, Ankur Moitra, Elchanan Mossel, Govind Ramnarayan

Published in: Research in Computational Molecular Biology

Publisher: Springer International Publishing

Abstract

Reconstruction of population histories is a central problem in population genetics. Existing coalescent-based methods, like the seminal work of Li and Durbin (Nature, 2011), attempt to solve this problem using sequence data but have no rigorous guarantees. Determining the amount of data needed to correctly reconstruct population histories is a major challenge. Using a variety of tools from information theory, the theory of extremal polynomials, and approximation theory, we prove new sharp information-theoretic lower bounds on the problem of reconstructing population structure—the history of multiple subpopulations that merge, split and change sizes over time. Our lower bounds are exponential in the number of subpopulations, even when reconstructing recent histories. We demonstrate the sharpness of our lower bounds by providing algorithms for distinguishing and learning population histories with matching dependence on the number of subpopulations.

Footnotes
1
Alternatively, diploids whose phasing is provided.
 
2
In a diploid population, the exponents are scaled by a constant factor of 2. This can be handled easily by rescaling and therefore makes little difference in the analysis.
 
3
The distinction between the Wright-Fisher and Moran models is of no consequence in this work, as the latter also yields an exponential model in the limit as population size increases [3].
 
4
In practice, an unknown scaling is easily handled by, e.g., trying powers of 2 and picking the result that is best in CDF distance, for instance \(\Vert F - G\Vert_\infty = \sup_t |F(t) - G(t)|\).
 
5
As in Remark 1, we can optionally make the more recent era short so that almost all samples will be from the earlier period.
 
6
The alternative hypothesis had no more than a few additional mixture components. A byproduct of this analysis is that even estimating the number of populations in these examples requires a very large number of samples.
 
7
More precisely, with Earthmover’s distance in parameter space greater than 0.01. For comparison, an estimator which only gets the (easy) constant component correct already has Earthmover’s distance at most \(1/k\).
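
The scaling search mentioned in footnote 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `sup_cdf_distance` and `best_power_of_two_scale`, the range of exponents, and the single-exponential model CDF are all hypothetical choices made for the example. The sketch rescales the samples by each candidate power of 2 and keeps the scaling whose empirical CDF \(F\) is closest to the model CDF \(G\) in the distance \(\sup_t |F(t) - G(t)|\).

```python
import math
import random


def sup_cdf_distance(samples, model_cdf):
    """Return sup_t |F(t) - G(t)|, where F is the empirical CDF of `samples`
    and G = model_cdf is assumed continuous and nondecreasing."""
    xs = sorted(samples)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        g = model_cdf(x)
        # F jumps from i/n to (i+1)/n at x, so the supremum is attained
        # just before or exactly at one of the sample points.
        worst = max(worst, abs((i + 1) / n - g), abs(i / n - g))
    return worst


def best_power_of_two_scale(samples, model_cdf, exponents=range(-8, 9)):
    """Try time scalings 2**j (j in `exponents`, an assumed search range)
    and return the pair (scale, distance) with the smallest CDF distance."""
    best = None
    for j in exponents:
        scale = 2.0 ** j
        dist = sup_cdf_distance([scale * x for x in samples], model_cdf)
        if best is None or dist < best[1]:
            best = (scale, dist)
    return best


def exp_cdf(t):
    """Hypothetical model: CDF of a unit-rate exponential distribution."""
    return 1.0 - math.exp(-t) if t > 0 else 0.0


# Synthetic data for illustration only: unit-rate exponential times
# recorded in a time unit that is off by a factor of 4.
rng = random.Random(0)
samples = [rng.expovariate(1.0) / 4.0 for _ in range(2000)]

scale, dist = best_power_of_two_scale(samples, exp_cdf)
print(f"selected scale: {scale}, sup-distance: {dist:.4f}")
```

In this toy setting the neighbouring candidates (scalings 2 and 8) are at population CDF distance 0.25 from the model, so the search separates them clearly from the correct scaling. With a mixture of exponentials as the model, only `exp_cdf` would need to change.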
 
References
1.
Bhaskar, A., Song, Y.S.: Descartes’ rule of signs and the identifiability of population demographic models from genomic variation data. Ann. Stat. 42(6), 2469 (2014)
2.
Bhaskar, A., Wang, Y.R., Song, Y.S.: Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data. Genome Res. 25(2), 268–279 (2015). gr-178756
3.
Blythe, R.A., McKane, A.J.: Stochastic models of evolution in genetics, ecology and linguistics. J. Stat. Mech.: Theory Exp. 2007(07), P07018 (2007)
4.
Candès, E.J., Fernandez-Granda, C.: Super-resolution from noisy data. J. Fourier Anal. Appl. 19(6), 1229–1254 (2013)
5.
Drummond, A., Rambaut, A., Shapiro, B., Pybus, O.: Bayesian coalescent inference of past population dynamics from molecular sequences. Mol. Biol. Evol. 22(5), 1185–1192 (2005)
6.
Excoffier, L., Dupanloup, I., Huerta-Sánchez, E., Sousa, V.C., Foll, M.: Robust demographic inference from genomic and SNP data. PLoS Genet. 9(10), e1003905 (2013)
7.
Gautschi, W.: On inverses of Vandermonde and confluent Vandermonde matrices. Numer. Math. 4(1), 117–123 (1962)
8.
Heled, J., Drummond, A.: Bayesian inference of population size history from multiple loci. BMC Evol. Biol. 8(1), 289 (2008)
9.
Hua, Y., Sarkar, T.K.: Matrix pencil method for estimating parameters of exponentially damped/undamped sinusoids in noise. IEEE Trans. Acoust. Speech Signal Process. 38(5), 814–824 (1990)
11.
Kim, J., Mossel, E., Rácz, M.Z., Ross, N.: Can one hear the shape of a population history? Theor. Popul. Biol. 100, 26–38 (2015)
12.
Kim, Y., Koehler, F., Moitra, A., Mossel, E., Ramnarayan, G.: How many subpopulations is too many? Exponential lower bounds for inferring population histories. arXiv preprint arXiv:1811.03177 (2018)
13.
Kimura, M., Crow, J.F.: The number of alleles that can be maintained in a finite population. Genetics 49(4), 725 (1964)
14.
Li, H., Durbin, R.: Inference of human population history from individual whole-genome sequences. Nature 475(7357), 493 (2011)
15.
McVean, G.A., Cardin, N.J.: Approximating the coalescent with recombination. Philos. Trans. Roy. Soc. London B: Biol. Sci. 360(1459), 1387–1393 (2005)
16.
Moitra, A.: Super-resolution, extremal functions and the condition number of Vandermonde matrices. In: Proceedings of the Forty-seventh Annual ACM Symposium on Theory of Computing, STOC 2015, pp. 821–830. ACM, New York (2015). https://doi.org/10.1145/2746539.2746561
17.
Myers, S., Fefferman, C., Patterson, N.: Can one learn history from the allelic spectrum? Theor. Popul. Biol. 73(3), 342–348 (2008)
18.
Nazarov, F.L.: Local estimates for exponential polynomials and their applications to inequalities of the uncertainty principle type. Algebra i analiz 5(4), 3–66 (1993)
19.
Nielsen, R.: Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics 154(2), 931–942 (2000)
20.
Nordborg, M.: Coalescent theory. Handb. Stat. Genet. 2, 843–877 (2001)
21.
Schiffels, S., Durbin, R.: Inferring human population size and separation history from multiple genome sequences. Nat. Genet. 46(8), 919 (2014)
22.
Sheehan, S., Harris, K., Song, Y.S.: Estimating variable effective population sizes from multiple genomes: a sequentially Markov conditional sampling distribution approach. Genetics 194, 647–662 (2013)
23.
Terhorst, J., Kamm, J.A., Song, Y.S.: Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat. Genet. 49(2), 303 (2017)
24.
Terhorst, J., Song, Y.S.: Fundamental limits on the accuracy of demographic inference based on the sample frequency spectrum. Proc. Nat. Acad. Sci. 112(25), 7677–7682 (2015)
25.
Turán, P.: On a New Method of Analysis and Its Applications. Wiley, New York (1984)
Metadata
Title
How Many Subpopulations Is Too Many? Exponential Lower Bounds for Inferring Population Histories
Written by
Younhun Kim
Frederic Koehler
Ankur Moitra
Elchanan Mossel
Govind Ramnarayan
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-17083-7_9