Skip to main content
Top

2019 | OriginalPaper | Chapter

Greedy Partition Distance Under Stochastic Models - Analytic Results

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Gene partitioning is a very common task in genomics, based on several criteria such as gene function, homology, interactions, and more. Given two such partitions, a metric to compare them is called for. One such metric is based on multi symmetric difference and elements are removed from both partitions until identity is reached. While such a task can be done accurately by a maximum weight bipartite matching, in common settings in comparative genomics, the standard algorithm to solve this problem might become impractical. In previous works we have studied the universal pacemaker (UPM) where genes are clustered according to mutation rate correlation, and suggested a very fast and greedy procedure for comparing partitions. This procedure is guaranteed to provide a poor approximation ratio of 1/2 under arbitrary inputs.
In this work we give a probabilistic analysis of this procedure under a common and natural stochastic environment. We show that under mild size requirements, and a sound model assumption, this procedure returns the correct result with high probability. Furthermore, we show that in the context of the UPM, this natural requirement holds automatically, rendering statistical consistency of this fast greedy procedure. We also discuss the application of this procedure in the comparative genomics rudimentary task of gene orthology where such a solution is imperative.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Ahuja, R.K., Magnanti, T.L., Orlin, J.B.: Network Flows. Prentice Hall, Englewood Cliffs (1993)MATH Ahuja, R.K., Magnanti, T.L., Orlin, J.B.: Network Flows. Prentice Hall, Englewood Cliffs (1993)MATH
2.
go back to reference Alon, N., Spencer, J.H.: The Probabilistic Method, 3rd edn. Wiley, New York (2008)CrossRef Alon, N., Spencer, J.H.: The Probabilistic Method, 3rd edn. Wiley, New York (2008)CrossRef
3.
go back to reference Cook, W.J., Cunningham, W.H., Pulleyblank, W.R., Schrijver, A.: Combinatorial Optimization. Wiley, New York (1998)MATH Cook, W.J., Cunningham, W.H., Pulleyblank, W.R., Schrijver, A.: Combinatorial Optimization. Wiley, New York (1998)MATH
4.
go back to reference Duchene, S., Ho, S.Y.W.: Mammalian genome evolution is governed by multiple pacemakers. Bioinformatics 31, 2061–2065 (2015)CrossRef Duchene, S., Ho, S.Y.W.: Mammalian genome evolution is governed by multiple pacemakers. Bioinformatics 31, 2061–2065 (2015)CrossRef
5.
go back to reference Duchene, S., Ho, S.Y.W.: Using multiple relaxed-clock models to estimate evolutionary timescales from DNA sequence data. Mol. Phylogenet. Evol. 77, 65–70 (2014)CrossRef Duchene, S., Ho, S.Y.W.: Using multiple relaxed-clock models to estimate evolutionary timescales from DNA sequence data. Mol. Phylogenet. Evol. 77, 65–70 (2014)CrossRef
6.
go back to reference Gusfield, D.: Partition-distance: a problem and class of perfect graphs arising in clustering. Inf. Process. Lett. 82(3), 159–164 (2002)MathSciNetCrossRef Gusfield, D.: Partition-distance: a problem and class of perfect graphs arising in clustering. Inf. Process. Lett. 82(3), 159–164 (2002)MathSciNetCrossRef
7.
go back to reference Hartigan, J.A., Wong, M.A.: A k-means clustering algorithm. Appl. Stat. 28, 100–108 (1979)CrossRef Hartigan, J.A., Wong, M.A.: A k-means clustering algorithm. Appl. Stat. 28, 100–108 (1979)CrossRef
8.
go back to reference Hurst, L.D., Pál, C., Lercher, M.J.: The evolutionary dynamics of eukaryotic gene order. Nat. Rev. Genet. 5(4), 299 (2004)CrossRef Hurst, L.D., Pál, C., Lercher, M.J.: The evolutionary dynamics of eukaryotic gene order. Nat. Rev. Genet. 5(4), 299 (2004)CrossRef
9.
go back to reference Kimura, M.: Molecular evolutionary clock and the neutral theory. J. Mol. Evol. 26, 24–33 (1987)CrossRef Kimura, M.: Molecular evolutionary clock and the neutral theory. J. Mol. Evol. 26, 24–33 (1987)CrossRef
10.
go back to reference Lawler, E.L.: Combinatorial Optimization: Networks and Matroids. The University of Michigan, Ann Arbor (1976)MATH Lawler, E.L.: Combinatorial Optimization: Networks and Matroids. The University of Michigan, Ann Arbor (1976)MATH
11.
go back to reference Lee, J.M., Sonnhammer, E.L.L.: Genomic gene clustering analysis of pathways in eukaryotes. Genome Res. 13(5), 875–882 (2003)CrossRef Lee, J.M., Sonnhammer, E.L.L.: Genomic gene clustering analysis of pathways in eukaryotes. Genome Res. 13(5), 875–882 (2003)CrossRef
13.
go back to reference Moran, S., Snir, S.: Efficient approximation of convex recolorings. J. Comput. Syst. Sci. 73(7), 1078–1089 (2007)MathSciNetCrossRef Moran, S., Snir, S.: Efficient approximation of convex recolorings. J. Comput. Syst. Sci. 73(7), 1078–1089 (2007)MathSciNetCrossRef
14.
go back to reference Moran, S., Snir, S.: Convex recolorings of strings and trees: definitions, hardness results and algorithms. J. Comput. Syst. Sci. 74(5), 850–869 (2008)MathSciNetCrossRef Moran, S., Snir, S.: Convex recolorings of strings and trees: definitions, hardness results and algorithms. J. Comput. Syst. Sci. 74(5), 850–869 (2008)MathSciNetCrossRef
16.
go back to reference Snir, S.: On the number of genomic pacemakers: a geometric approach. Algorithms Mol. Biol. 9, 26 (2014). Extended abstract appeared in WABI 2014CrossRef Snir, S.: On the number of genomic pacemakers: a geometric approach. Algorithms Mol. Biol. 9, 26 (2014). Extended abstract appeared in WABI 2014CrossRef
17.
go back to reference Snir, S.: Bounds on identification of genome evolution pacemakers. J. Comput. Biol. (2019) Snir, S.: Bounds on identification of genome evolution pacemakers. J. Comput. Biol. (2019)
18.
19.
go back to reference Snir, S., Wolf, Y., Koonin, E.: Universal pacemaker of genome evolution in animals and fungi and variation of evolutionary rates in diverse organisms. Genome Biol. Evol. 6(6), 1268–1278 (2014)CrossRef Snir, S., Wolf, Y., Koonin, E.: Universal pacemaker of genome evolution in animals and fungi and variation of evolutionary rates in diverse organisms. Genome Biol. Evol. 6(6), 1268–1278 (2014)CrossRef
20.
go back to reference Wolf, Y., Snir, S., Koonin, E.: Stability along with extreme variability in core genome evolution. Genome Biol. Evol. 5(7), 1393–1402 (2013)CrossRef Wolf, Y., Snir, S., Koonin, E.: Stability along with extreme variability in core genome evolution. Genome Biol. Evol. 5(7), 1393–1402 (2013)CrossRef
21.
go back to reference Wolf, Y.I., Novichkov, P.S., Karev, G.P., Koonin, E.V., Lipman, D.J.: The universal distribution of evolutionary rates of genes and distinct characteristics of eukaryotic genes of different apparent ages. Proc. Natl. Acad. Sci. 106(18), 7273–7280 (2009)CrossRef Wolf, Y.I., Novichkov, P.S., Karev, G.P., Koonin, E.V., Lipman, D.J.: The universal distribution of evolutionary rates of genes and distinct characteristics of eukaryotic genes of different apparent ages. Proc. Natl. Acad. Sci. 106(18), 7273–7280 (2009)CrossRef
22.
go back to reference Yi, G., Thon, M.R., Sze, S.-H.: Identifying clusters of functionally related genes in genomes. Bioinformatics 23(9), 1053–1060 (2007)CrossRef Yi, G., Thon, M.R., Sze, S.-H.: Identifying clusters of functionally related genes in genomes. Bioinformatics 23(9), 1053–1060 (2007)CrossRef
23.
go back to reference Zuckerkandl, E., Pauling, L.: Molecules as documents of evolutionary history. J. Theor. Biol. 8(2), 357–66 (1965)CrossRef Zuckerkandl, E., Pauling, L.: Molecules as documents of evolutionary history. J. Theor. Biol. 8(2), 357–66 (1965)CrossRef
Metadata
Title
Greedy Partition Distance Under Stochastic Models - Analytic Results
Author
Sagi Snir
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-20242-2_22

Premium Partner