2019 | OriginalPaper | Chapter

DeepTF: Accurate Prediction of Transcription Factor Binding Sites by Combining Multi-scale Convolution and Long Short-Term Memory Neural Network

Authors: Xiao-Rong Bao, Yi-Heng Zhu, Dong-Jun Yu

Published in: Intelligence Science and Big Data Engineering. Big Data and Machine Learning

Publisher: Springer International Publishing

Abstract

Transcription factor binding sites (TFBSs), one class of DNA-protein binding sites, play important roles in understanding the regulation of gene expression and in drug design. Recently, deep-learning-based methods have been widely used to predict TFBSs. In this work, we propose a novel deep-learning model, called Combination of Multi-Scale Convolutional Network and Long Short-Term Memory Network (MCNN-LSTM), which uses multi-scale convolution for feature processing and a long short-term memory network to recognize TFBSs in DNA sequences. Moreover, we design a new encoding method, called multi-nucleotide one-hot (MNOH), which considers the correlation between nucleotides in adjacent positions, to further improve TFBS prediction performance. Stringent cross-validation and independent tests on benchmark datasets demonstrate the efficacy of MNOH and MCNN-LSTM. Based on the proposed methods, we further implement a new TFBS predictor, called DeepTF. Computational experiments show that DeepTF outperforms several existing TFBS predictors.
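The abstract does not spell out how MNOH is constructed. As a rough illustration of what an encoding that accounts for adjacent nucleotides could look like, the sketch below one-hot encodes each overlapping k-mer of a DNA sequence over all 4^k possible k-mers. The function name `kmer_one_hot`, the parameter `k`, and the all-zero handling of ambiguous bases are assumptions made for this example and are not taken from the chapter.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def kmer_one_hot(sequence, k=2):
    """One-hot encode every overlapping k-mer of a DNA sequence.

    Hypothetical helper, not the chapter's MNOH definition.
    Returns len(sequence) - k + 1 vectors, each of length 4**k.
    A k-mer containing a symbol outside ACGT (e.g. N) maps to an
    all-zero vector.
    """
    # Map each possible k-mer to a column index: AA -> 0, AC -> 1, ...
    index = {
        "".join(letters): i
        for i, letters in enumerate(product(NUCLEOTIDES, repeat=k))
    }
    seq = sequence.upper()
    encoded = []
    for start in range(len(seq) - k + 1):
        vector = [0] * (4 ** k)
        position = index.get(seq[start:start + k])
        if position is not None:
            vector[position] = 1
        encoded.append(vector)
    return encoded
```

With `k=1` this reduces to ordinary one-hot encoding of single nucleotides; with `k=2` each position is described jointly with its right-hand neighbour, so `kmer_one_hot("ACGT", k=2)` yields three 16-dimensional vectors for the dinucleotides AC, CG, and GT.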


Metadata
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-36204-1_10
