Skip to main content
Top

2020 | OriginalPaper | Chapter

Bayesian Optimization Improves Tissue-Specific Prediction of Active Regulatory Regions with Deep Neural Networks

Authors : Luca Cappelletti, Alessandro Petrini, Jessica Gliozzo, Elena Casiraghi, Max Schubach, Martin Kircher, Giorgio Valentini

Published in: Bioinformatics and Biomedical Engineering

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The annotation and characterization of tissue-specific cis-regulatory elements (CREs) in non-coding DNA represents an open challenge in computational genomics. Several prior works show that machine learning methods, using epigenetic or spectral features directly extracted from DNA sequences, can predict active promoters and enhancers in specific tissues or cell lines. In particular, very recently deep-learning techniques obtained state-of-the-art results in this challenging computational task. In this study, we provide additional evidence that Feed Forward Neural Networks (FFNN) trained on epigenetic data and one-dimensional convolutional neural networks (CNN) trained on DNA sequence data can successfully predict active regulatory regions in different cell lines. We show that model selection by means of Bayesian optimization applied to both FFNN and CNN models can significantly improve deep neural network performance, by automatically finding models that best fit the data. Further, we show that techniques applied to balance active and non-active regulatory regions in the human genome in training and test data may lead to over-optimistic or poor predictions. We recommend to use actual imbalanced data that was not used to train the models for evaluating their generalization performance.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
2.
go back to reference Mora, A., Sandve, G.K., Gabrielsen, O.S., Eskeland, R.: In the loop: promoter-enhancer interactions and bioinformatics. Brief. Bioinform. 17, 980–995 (2016)PubMed Mora, A., Sandve, G.K., Gabrielsen, O.S., Eskeland, R.: In the loop: promoter-enhancer interactions and bioinformatics. Brief. Bioinform. 17, 980–995 (2016)PubMed
4.
go back to reference Schubach, M., Re, M., Robinson, P.N., Valentini, G.: Imbalance-aware machine learning for predicting rare and commondisease-associated non-coding variants. Sci. Rep. 7(1), 1–2 (2017)CrossRef Schubach, M., Re, M., Robinson, P.N., Valentini, G.: Imbalance-aware machine learning for predicting rare and commondisease-associated non-coding variants. Sci. Rep. 7(1), 1–2 (2017)CrossRef
5.
go back to reference Rentzsch, P., Witten, D., Cooper, G., Shendure, J., Kircher, M.: CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 47, D886–D894 (2019)PubMedCrossRef Rentzsch, P., Witten, D., Cooper, G., Shendure, J., Kircher, M.: CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 47, D886–D894 (2019)PubMedCrossRef
6.
go back to reference Javierre, B., et al.: Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell 167, 1369–1384 (2016)PubMedPubMedCentralCrossRef Javierre, B., et al.: Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell 167, 1369–1384 (2016)PubMedPubMedCentralCrossRef
8.
go back to reference Dunham, I., et al.: An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012)CrossRef Dunham, I., et al.: An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012)CrossRef
15.
go back to reference Hoffman, M.M., Buske, O.J., Wang, J., Weng, Z., Bilmes, J.A., Noble, W.S.: Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nat. Methods 9, 473 (2012)PubMedPubMedCentralCrossRef Hoffman, M.M., Buske, O.J., Wang, J., Weng, Z., Bilmes, J.A., Noble, W.S.: Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nat. Methods 9, 473 (2012)PubMedPubMedCentralCrossRef
16.
go back to reference Kwasnieski, J.C., Fiore, C., Chaudhari, H.G., Cohen, B.A.: High-throughput functional testing of encode segmentation predictions. Genome Res. 24, 1595–1602 (2014)PubMedPubMedCentralCrossRef Kwasnieski, J.C., Fiore, C., Chaudhari, H.G., Cohen, B.A.: High-throughput functional testing of encode segmentation predictions. Genome Res. 24, 1595–1602 (2014)PubMedPubMedCentralCrossRef
17.
go back to reference Yip, K.Y., et al.: Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol. 13, R48 (2012)PubMedPubMedCentralCrossRef Yip, K.Y., et al.: Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol. 13, R48 (2012)PubMedPubMedCentralCrossRef
18.
go back to reference Lu, Y., Qu, W., Shan, G., Zhang, C.: DELTA: a distal enhancer locating tool based on AdaBoost algorithm and shape features of chromatin modifications. PLoS ONE 10, e0130622 (2015)PubMedPubMedCentralCrossRef Lu, Y., Qu, W., Shan, G., Zhang, C.: DELTA: a distal enhancer locating tool based on AdaBoost algorithm and shape features of chromatin modifications. PLoS ONE 10, e0130622 (2015)PubMedPubMedCentralCrossRef
19.
21.
go back to reference Li, Y., Shi, W., Wasserman, W.W.: Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. BMC Bioinformatics 19, 202 (2018)PubMedPubMedCentralCrossRef Li, Y., Shi, W., Wasserman, W.W.: Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. BMC Bioinformatics 19, 202 (2018)PubMedPubMedCentralCrossRef
22.
go back to reference Hinton, G., Salakhutdinov, R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)PubMedCrossRef Hinton, G., Salakhutdinov, R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)PubMedCrossRef
23.
go back to reference Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013)PubMedCrossRef Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013)PubMedCrossRef
24.
25.
go back to reference Yang, B., et al.: BiRen: predicting enhancers with a deep-learning-based model using the DNA sequence alone. Bioinformatics 33(13), 1930–1936 (2017)PubMedCrossRef Yang, B., et al.: BiRen: predicting enhancers with a deep-learning-based model using the DNA sequence alone. Bioinformatics 33(13), 1930–1936 (2017)PubMedCrossRef
26.
28.
30.
go back to reference van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008) van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008)
31.
go back to reference Hierlemann, A., Schweizer-Berberich, M., Weimar, U., Kraus, G., Pfau, A., Göpel, W.: Pattern recognition and multicomponent analysis. Sens. Update 2, 119–180 (1996)CrossRef Hierlemann, A., Schweizer-Berberich, M., Weimar, U., Kraus, G., Pfau, A., Göpel, W.: Pattern recognition and multicomponent analysis. Sens. Update 2, 119–180 (1996)CrossRef
33.
go back to reference Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016) Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:​1603.​04467 (2016)
34.
go back to reference Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305 (2012) Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305 (2012)
35.
go back to reference Swersky, K., Snoek, J., Adams, P.: Multi-task Bayesian optimization. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26, pp. 2004–2012. Curran Associates, Inc., Red Hook (2013) Swersky, K., Snoek, J., Adams, P.: Multi-task Bayesian optimization. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26, pp. 2004–2012. Curran Associates, Inc., Red Hook (2013)
36.
go back to reference Shahriari, B., Swersky, K., Wang, Z., Adams, R.P., de Freitas, N.: Taking the human out of the loop: a review of Bayesian optimization. Proc. IEEE 104, 148–175 (2016)CrossRef Shahriari, B., Swersky, K., Wang, Z., Adams, R.P., de Freitas, N.: Taking the human out of the loop: a review of Bayesian optimization. Proc. IEEE 104, 148–175 (2016)CrossRef
37.
go back to reference Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML 2010), pp. 807–814 (2010) Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML 2010), pp. 807–814 (2010)
38.
go back to reference Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms. In: Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 2, NIPS 2012, pp. 2951–2959. Curran Associates, Inc., Red Hook (2012) Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms. In: Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 2, NIPS 2012, pp. 2951–2959. Curran Associates, Inc., Red Hook (2012)
39.
go back to reference Dozat, T.: Incorporating Nesterov momentum into Adam. In: International Conference on Learning Representations, Workshop (ICLRW), pp. 1–6 (2016) Dozat, T.: Incorporating Nesterov momentum into Adam. In: International Conference on Learning Representations, Workshop (ICLRW), pp. 1–6 (2016)
42.
go back to reference Fawcett, T.: An introduction to ROC analysis. Pattern Recogn. Lett. 27, 861–874 (2006)CrossRef Fawcett, T.: An introduction to ROC analysis. Pattern Recogn. Lett. 27, 861–874 (2006)CrossRef
43.
go back to reference He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21, 1263–1284 (2009)CrossRef He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21, 1263–1284 (2009)CrossRef
44.
go back to reference Saito, T., Rehmsmeier, M.: The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10, 1–21 (2015) Saito, T., Rehmsmeier, M.: The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10, 1–21 (2015)
45.
go back to reference Wilcoxon, F.: Individual comparisons by ranking methods. Biom. Bull. 1, 80–83 (1945)CrossRef Wilcoxon, F.: Individual comparisons by ranking methods. Biom. Bull. 1, 80–83 (1945)CrossRef
46.
go back to reference Pratt, J.W.: Remarks on zeros and ties in the Wilcoxon signed rank procedures. J. Am. Stat. Assoc. 54, 655–667 (1959)CrossRef Pratt, J.W.: Remarks on zeros and ties in the Wilcoxon signed rank procedures. J. Am. Stat. Assoc. 54, 655–667 (1959)CrossRef
47.
go back to reference Derrick, B., Paul W.: Comparing two samples from an individual Likert question. Int. J. Math. Stat. 18(3) (2017) Derrick, B., Paul W.: Comparing two samples from an individual Likert question. Int. J. Math. Stat. 18(3) (2017)
Metadata
Title
Bayesian Optimization Improves Tissue-Specific Prediction of Active Regulatory Regions with Deep Neural Networks
Authors
Luca Cappelletti
Alessandro Petrini
Jessica Gliozzo
Elena Casiraghi
Max Schubach
Martin Kircher
Giorgio Valentini
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-45385-5_54

Premium Partner