Published in: International Journal of Machine Learning and Cybernetics 4/2020

29 July 2019 | Original Article

DeepSite: bidirectional LSTM and CNN models for predicting DNA–protein binding

Authors: Yongqing Zhang, Shaojie Qiao, Shengjie Ji, Yizhou Li


Abstract

Transcription factors are proteins that bind to specific cis-regulatory sub-regions of DNA promoters and initiate transcription, the process by which genetic information is transferred from DNA to RNA. Several computational methods have been developed that use convolutional neural networks (CNNs) to predict DNA–protein binding sites from DNA sequence. However, within the CNN framework these methods cannot capture the dependencies between positions in a DNA sequence, and they are not accurate enough at predicting DNA–protein binding sites from sequence alone. In this study, we propose DeepSite, which combines a bidirectional long short-term memory (BLSTM) network with a CNN to capture long-term dependencies between sequence motifs in DNA. Unlike a traditional CNN, DeepSite comprises six layers: an input layer, a BLSTM layer, a convolutional layer, a pooling layer, a fully connected layer and an output layer. When tested on 690 ChIP-seq experiments from ENCODE, DeepSite predicts DNA–protein binding sites with 87.12% sensitivity, 91.06% specificity, 89.19% accuracy and a Matthews correlation coefficient (MCC) of 0.783. Finally, we conclude that the proposed method can also be applied to find DNA–protein binding sites in other DNA sequences.
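The abstract reports sensitivity, specificity, accuracy and MCC but does not restate how they are computed. As a minimal sketch, assuming the standard confusion-matrix definitions of these four metrics (an assumption, not taken from the paper), they can be derived from true and predicted binding labels as below. The names `ConfusionCounts`, `tally`, `safe_div` and `binary_metrics` are hypothetical and not part of DeepSite.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts for a binary binding-site classifier (1 = bound)."""
    tp: int  # binding sites correctly predicted as bound
    tn: int  # non-binding sequences correctly predicted as unbound
    fp: int  # non-binding sequences wrongly predicted as bound
    fn: int  # binding sites wrongly predicted as unbound


def tally(y_true: list[int], y_pred: list[int]) -> ConfusionCounts:
    """Count TP/TN/FP/FN from two equal-length lists of 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return ConfusionCounts(tp, tn, fp, fn)


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    total = c.tp + c.tn + c.fp + c.fn
    mcc_den = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    return {
        "sensitivity": safe_div(c.tp, c.tp + c.fn),  # true-positive rate
        "specificity": safe_div(c.tn, c.tn + c.fp),  # true-negative rate
        "accuracy": safe_div(c.tp + c.tn, total),
        # By a common convention, MCC is reported as 0 when any marginal is zero.
        "mcc": safe_div(c.tp * c.tn - c.fp * c.fn, mcc_den),
    }


if __name__ == "__main__":
    # Hypothetical toy labels, NOT data from the DeepSite / ENCODE experiments.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
    counts = tally(y_true, y_pred)
    print(counts)
    for name, value in binary_metrics(counts).items():
        print(f"{name}: {value:.4f}")
```

For the invented toy labels, the counts are 3 TP, 5 TN, 1 FP and 1 FN. This gives sensitivity 0.75, specificity of about 0.833, accuracy 0.80 and MCC = 14/24 (about 0.583). MCC uses all four counts, which is why it is often reported alongside accuracy when bound and unbound examples are imbalanced.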

Metadata
Title
DeepSite: bidirectional LSTM and CNN models for predicting DNA–protein binding
Authors
Yongqing Zhang
Shaojie Qiao
Shengjie Ji
Yizhou Li
Publication date
29 July 2019
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 4/2020
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-019-00990-x
