
2020 | OriginalPaper | Chapter

DeepED: A Deep Learning Framework for Estimating Evolutionary Distances

Authors: Zhuangzhuang Liu, Mingming Ren, Zhiheng Niu, Gang Wang, Xiaoguang Liu

Published in: Artificial Neural Networks and Machine Learning – ICANN 2020

Publisher: Springer International Publishing


Abstract

Evolutionary distances are the numbers of substitutions per site between two aligned nucleotide or amino acid sequences; they reflect divergence time and are highly significant for phylogenetic inference. Over the past several decades, many molecular evolution models have been proposed for estimating evolutionary distances. Most of these models rest on a number of assumptions, some of which agree well with certain real-world data but not with all. To relax these assumptions and improve the accuracy of evolutionary distance estimation, this paper proposes DeepED (Deep learning method to estimate Evolutionary Distances), a framework built on Deep Neural Networks (DNNs) that estimates evolutionary distances for aligned DNA sequence pairs. The purpose-designed structure of the framework enables it to handle long and variable-length sequences and to find important segments in a sequence. The network models are trained on reliable real-world data that includes highly credible phylogenetic inferences. Experimental results demonstrate that DeepED models achieve an accuracy of up to 0.98 (R-squared), outperforming traditional methods.
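To make the quantities in the abstract concrete, the following is a minimal sketch of one classical model-based estimator and of the R-squared metric. It is not the DeepED method. It assumes the Jukes-Cantor model as a representative traditional method (the abstract does not name the baselines), in which all substitutions are equally likely and the distance is d = -(3/4) ln(1 - (4/3) p), where p is the proportion of differing sites. The function names `p_distance`, `jc69_distance`, and `r_squared` are hypothetical, and skipping gap or ambiguous sites is a simplifying assumption.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites, counting only positions where
    both aligned sequences carry an unambiguous base (A, C, G, or T)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    valid = set("ACGT")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:  # skip gaps and ambiguity codes (assumption)
            compared += 1
            if a != b:
                mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def jc69_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor estimate of substitutions per site.

    The correction assumes equal base frequencies and equal rates for
    every substitution type; it is undefined once p reaches 3/4."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        return math.inf  # saturation: the model cannot estimate a finite distance
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

The equal-rates assumption built into `jc69_distance` is an example of the kind of model assumption the abstract refers to: it holds for some data sets and fails for others, which is the motivation given for a learned estimator.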


Metadata
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-61609-0_26
