Abstract
Promoter prediction programs (PPPs) are important for in silico gene discovery without support from expressed sequence tag (EST)/cDNA/mRNA sequences, in the analysis of gene regulation and in genome annotation. Contrary to previous expectations, a comprehensive analysis of PPPs reveals that no program simultaneously achieves both a sensitivity and a positive predictive value >65%. PPP performances deduced from a limited number of chromosomes or smaller data sets do not hold when evaluated at the level of the whole genome, with serious inaccuracy of predictions for non-CpG-island-related promoters. Some PPPs even perform worse than, or close to, pure random guessing.
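The two measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions: sensitivity is the fraction of known transcription start sites (TSSs) recovered, and positive predictive value is the fraction of predictions that are correct. The function name `score_predictions`, the `max_distance` parameter, and the simple "within a fixed distance of a known TSS" matching rule are illustrative assumptions, not the evaluation procedure used in the article.

```python
from bisect import bisect_left


def score_predictions(predicted, known_tss, max_distance):
    """Score predicted promoter positions against known TSS positions.

    Assumption (hypothetical rule, not taken from the article): a prediction
    counts as correct if it lies within max_distance bases of any known TSS,
    and a known TSS counts as recovered if any prediction lies that close.
    Positions are integer coordinates on a single strand of one sequence.
    """
    known = sorted(known_tss)
    recovered = set()          # indices of known TSSs hit by some prediction
    correct_predictions = 0    # predictions lying near at least one known TSS

    for p in predicted:
        # First known TSS at or beyond the left edge of the window.
        i = bisect_left(known, p - max_distance)
        matched = False
        while i < len(known) and known[i] <= p + max_distance:
            recovered.add(i)
            matched = True
            i += 1
        if matched:
            correct_predictions += 1

    sensitivity = len(recovered) / len(known) if known else 0.0
    ppv = correct_predictions / len(predicted) if predicted else 0.0
    return {"sensitivity": sensitivity, "ppv": ppv}


# Toy coordinates, invented purely for illustration.
result = score_predictions(
    predicted=[100, 5000, 9000],
    known_tss=[150, 5200, 20000],
    max_distance=500,
)
print(result)
```

In this toy case two of the three known TSSs are recovered and two of the three predictions are correct, so both measures equal 2/3. Widening or narrowing `max_distance` changes both values, which is why a reported performance figure depends on the distance criterion chosen.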
Acknowledgements
We are grateful to Riu Yamashita and Kenta Nakai for assisting in constructing and maintaining DBTSS.
Ethics declarations
Competing interests
The employer of Vladimir B. Bajic and Sin Lam Tan has licensed Dragon Promoter Finder and Dragon Gene Start Finder to Biobase, Germany. Vladimir B. Bajic receives royalty for these two programs.
Supplementary information
Supplementary Figure 1
Distribution of clustered predictions for seven analyzed PPPs.
Supplementary Table 1
Results of promoter prediction on human chromosomes 21 and 22.
Supplementary Table 2
Results of promoter prediction on human chromosomes 4, 21 and 22.
Supplementary Table 3
Results of promoter prediction on HG for different distance criteria.
About this article
Cite this article
Bajic, V., Tan, S., Suzuki, Y. et al. Promoter prediction analysis on the whole human genome. Nat Biotechnol 22, 1467–1473 (2004). https://doi.org/10.1038/nbt1032
DOI: https://doi.org/10.1038/nbt1032