Published in: Network Modeling Analysis in Health Informatics and Bioinformatics 1/2023

01-12-2023 | Review Article

Human DNA/RNA motif mining using deep-learning methods: a scoping review

Authors: Rajashree Chaurasia, Udayan Ghose

Abstract

This study aims to build robust contextual knowledge of deep-learning methods for identifying DNA/RNA sequence motifs and recognizing the transcription factor-binding sites (TFBS) involved in human gene regulation. Knowing the exact sequence specificities with which transcription factors (TFs) bind DNA and RNA is a promising basis for developing deep-learning models of gene regulatory processes. However, uncertainty in the sequence specificity of genomic sequences for a particular TFBS remains a major problem. Deep-learning techniques may help resolve it, so generalizable domain knowledge of deep-learning architectures and their performance is useful: it lets researchers select a unified computational approach for discovering a particular kind of motif pattern. This scoping review synthesizes evidence on DNA/RNA sequence motifs that bind transcription factors, following the PRISMA-ScR guidelines (Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews), to better understand and assess the scope of the literature on DNA or RNA motif mining with deep-learning methods. The study surveys deep-learning architectures for DNA and RNA sequence specificity on human ChIP-seq (chromatin immunoprecipitation sequencing), DNase-seq (DNase I hypersensitive sites sequencing), CLIP-seq (cross-linking immunoprecipitation sequencing), ATAC-seq (assay for transposase-accessible chromatin using sequencing), and related datasets, together with common motif patterns and their corresponding TF-DNA/RNA-binding site affinities. Deep-learning (DL) models have been used to find selective motifs and have been demonstrated to be more reproducible than traditional methods. According to our literature survey, 33 DL models exist for detecting DNA/RNA motifs, with varied framework designs and implementation styles.
Using the literature survey and the PRISMA-ScR reporting guidelines, the performance of each DL model can be evaluated analytically in terms of model size, automatic calibration ability, tool selection, and training set. DESSO (DEep Sequence and Shape mOtif), DeepFinder, and DeepBind were found to be the DL models best suited to studying true biological relationships, especially concerning gene expression patterns and sequence analysis. The study concludes that applying existing deep-learning methods to motif discovery is a faster way to process complex genomic sequence data. The comparison of more than 30 existing deep-learning models further indicates that complex DL models are preferred over simpler ones in terms of performance and scalability. A DL model architecture can therefore be chosen deliberately to understand the complex behavior of motifs and their associated regulatory mechanisms at the gene level.
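
As a rough illustration of what convolution-based motif detectors compute, the sketch below one-hot encodes a DNA sequence and slides a single weight matrix along it, then keeps the best-scoring window. It is not taken from the review or from any of the models it names: the function names, the toy filter, and the example sequence are all hypothetical, and a trained model would learn many such filters from data rather than use hand-written weights.

```python
# Illustrative sketch only: the filter below is hand-written for the example,
# not a set of weights from DeepBind, DESSO, or any other published model.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T).

    Bases outside ACGT (for example N) become an all-zero row.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def scan(seq, motif_filter):
    """Slide the filter along the sequence (1-D convolution, no padding).

    Returns one score per window of length len(motif_filter).
    """
    x = one_hot(seq)
    width = len(motif_filter)
    scores = []
    for start in range(len(x) - width + 1):
        score = sum(
            x[start + i][j] * motif_filter[i][j]
            for i in range(width)
            for j in range(4)
        )
        scores.append(score)
    return scores


def best_hit(seq, motif_filter):
    """Return (position, score) of the highest-scoring window (global max pooling)."""
    scores = scan(seq, motif_filter)
    if not scores:
        raise ValueError("sequence is shorter than the filter")
    position = max(range(len(scores)), key=scores.__getitem__)
    return position, scores[position]


# Hypothetical filter rewarding the 4-mer TGAC: +1 for the expected base
# at each position, -1 for any other base.
TOY_FILTER = [[1.0 if b == target else -1.0 for b in BASES] for target in "TGAC"]

if __name__ == "__main__":
    position, score = best_hit("AATGACGG", TOY_FILTER)
    print(position, score)
```

In this toy setting the filter plays the role of a position weight matrix; the appeal of the deep-learning approach is that such matrices are fitted from ChIP-seq or CLIP-seq style data instead of being specified by hand.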

Metadata

Title: Human DNA/RNA motif mining using deep-learning methods: a scoping review
Authors: Rajashree Chaurasia, Udayan Ghose
Publication date: 01-12-2023
Publisher: Springer Vienna
Published in: Network Modeling Analysis in Health Informatics and Bioinformatics, Issue 1/2023
Print ISSN: 2192-6662
Electronic ISSN: 2192-6670
DOI: https://doi.org/10.1007/s13721-023-00414-5
