
2020 | OriginalPaper | Chapter

DiS-TSS: An Annotation Agnostic Algorithm for TSS Identification

Authors: Dimitris Grigoriadis, Nikos Perdikopanis, Georgios K. Georgakilas, Artemis Hatzigeorgiou

Published in: Bioinformatics and Biomedical Engineering

Publisher: Springer International Publishing


Abstract

The spread, distribution and utilization of experimental evidence for transcription start sites (TSSs) within promoters are poorly understood. Cap Analysis of Gene Expression (CAGE) has emerged as a popular gene expression profiling protocol that quantifies TSS usage by recognizing the 5′ end of capped RNA molecules. However, a growing number of studies suggest that CAGE also detects 5′ capping events that are transcription byproducts. These findings highlight the need for computational methods that can effectively remove the excessive noise from CAGE samples, leading to accurate TSS annotation and promoter usage quantification. In this study, we present an annotation agnostic computational framework, DIANA Signal-TSS (DiS-TSS), that for the first time utilizes features inspired by digital signal processing and customized to the peculiarities of CAGE data. Features from the spatial and frequency domains are combined with a robustly trained Support Vector Machine (SVM) model to accurately distinguish between peaks related to real transcription initiation events and biological or protocol-induced noise. When benchmarked on experimentally derived data on active transcription marks as well as annotated TSSs, DiS-TSS was found to outperform existing implementations, providing on average ~11k positive predictions and a performance increase of ~5% in the experimental and annotation-based evaluations.
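To make the idea of combining spatial and frequency-domain descriptors more concrete, the following is a minimal sketch in Python. It is not the DiS-TSS feature set, which the abstract does not specify. The function names (`dft_magnitudes`, `window_features`), the chosen features, and the example windows are all hypothetical, and the input is assumed to be a list of per-base CAGE tag counts in a fixed window around a candidate peak. Feature vectors of this kind could then be passed to an SVM classifier.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform for bins 0..n//2."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        mags.append(abs(coeff))
    return mags


def window_features(counts):
    """Illustrative (hypothetical) features for one window of tag counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("window contains no tags")
    peak = max(counts)

    mags = dft_magnitudes(counts)
    non_dc = mags[1:]  # skip bin 0, which only reflects the total count
    energy = sum(m * m for m in non_dc)
    # Index of the strongest non-DC frequency bin (1-based bin number).
    dominant = 1 + max(range(len(non_dc)), key=non_dc.__getitem__)
    # Share of non-DC spectral energy in the lowest quarter of the bins.
    low = sum(m * m for m in non_dc[: max(1, len(non_dc) // 4)])

    return {
        # Spatial-domain descriptors
        "total_tags": total,
        "peak_fraction": peak / total,
        "covered_positions": sum(1 for c in counts if c > 0),
        # Frequency-domain descriptors
        "dominant_frequency_bin": dominant,
        "low_frequency_energy": low / energy if energy > 0 else 0.0,
    }


# Two made-up windows: a sharp, focused peak and a flat, noisy background.
sharp = [0, 0, 0, 1, 2, 40, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0]
broad = [2, 3, 2, 4, 3, 2, 3, 4, 2, 3, 3, 2, 4, 3, 2, 3]
print(window_features(sharp))
print(window_features(broad))
```

In this toy setting the sharp window concentrates most of its tags in a single position, while the flat window spreads them evenly; a classifier would see that difference in `peak_fraction` and in the spectral descriptors. The direct DFT here is quadratic in the window length and is used only to keep the sketch dependency-free.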


Metadata
Title
DiS-TSS: An Annotation Agnostic Algorithm for TSS Identification
Authors
Dimitris Grigoriadis
Nikos Perdikopanis
Georgios K. Georgakilas
Artemis Hatzigeorgiou
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-45385-5_55
