2013 | OriginalPaper | Book chapter

3. Topics in Computational Genomics

Authors: Michael Q. Zhang, Andrew D. Smith

Published in: Basics of Bioinformatics

Publisher: Springer Berlin Heidelberg

Abstract

Genomics began with the large-scale sequencing of the human genome and many model organism genomes around 1990, and the rapid accumulation of vast amounts of genomic data poses a great challenge: how to decipher such massive molecular information. Like bioinformatics in general, genome informatics is data driven, and many of the computational tools developed for it can soon become obsolete when new technologies and data types become available. With this in mind, a student who wants to work in this fascinating new field must be able to adapt quickly and to “shoot the moving targets” with “just-in-time ammunition.”

Literatur
1.
Zurück zum Zitat Abouelhoda MI, Kurtz S, Ohlebusch E (2004) Replacing suffix trees with enhanced suffix arrays. J Discret Algorithms 2(1):53–86MATHMathSciNetCrossRef Abouelhoda MI, Kurtz S, Ohlebusch E (2004) Replacing suffix trees with enhanced suffix arrays. J Discret Algorithms 2(1):53–86MATHMathSciNetCrossRef
2.
Zurück zum Zitat Apostolico A, Bock ME, Lonardi S (2002) Monotony of surprise and large-scale quest for unusual words. In: Proceedings of the sixth annual international conference on computational biology. ACM Press, New York, pp 22–31 Apostolico A, Bock ME, Lonardi S (2002) Monotony of surprise and large-scale quest for unusual words. In: Proceedings of the sixth annual international conference on computational biology. ACM Press, New York, pp 22–31
3.
Zurück zum Zitat Bailey TL, Elkan C (1995) Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Mach Learn 21(1–2):51–80 Bailey TL, Elkan C (1995) Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Mach Learn 21(1–2):51–80
4.
Zurück zum Zitat Bairoch A (1992) PROSITE: a dictionary of site and patterns in proteins. Nucl Acids Res 20:2013–2018CrossRef Bairoch A (1992) PROSITE: a dictionary of site and patterns in proteins. Nucl Acids Res 20:2013–2018CrossRef
5.
Zurück zum Zitat Bajic V, Seah S (2003) Dragon gene start finder identifies approximate locations of the 5 ′ ends of genes. Nucleic Acids Res 31:3560–3563CrossRef Bajic V, Seah S (2003) Dragon gene start finder identifies approximate locations of the 5 ends of genes. Nucleic Acids Res 31:3560–3563CrossRef
6.
Zurück zum Zitat Bajic V, Tan S, Suzuki Y, Sugano S (2004) Promoter prediction analysis on the whole human genome. Nat Biotechnol 22:1467–1473CrossRef Bajic V, Tan S, Suzuki Y, Sugano S (2004) Promoter prediction analysis on the whole human genome. Nat Biotechnol 22:1467–1473CrossRef
7.
Zurück zum Zitat Barash Y, Bejerano G, Friedman N (2001) A simple hyper-geometric approach for discovering putative transcription factor binding sites. Lect Notes Comput Sci 2149:278–293CrossRef Barash Y, Bejerano G, Friedman N (2001) A simple hyper-geometric approach for discovering putative transcription factor binding sites. Lect Notes Comput Sci 2149:278–293CrossRef
8.
Zurück zum Zitat Barash Y, Elidan G, Friedman N, Kaplan T (2003) Modeling dependencies in protein-DNA binding sites. In: Miller W, Vingron M, Istrail S, Pevzner P, Waterman M (eds) Proceedings of the seventh annual international conference on computational molecular biology, ACM Press, New York, pp 28–37. doi http://doi.acm.org/10.1145/640075.640079 Barash Y, Elidan G, Friedman N, Kaplan T (2003) Modeling dependencies in protein-DNA binding sites. In: Miller W, Vingron M, Istrail S, Pevzner P, Waterman M (eds) Proceedings of the seventh annual international conference on computational molecular biology, ACM Press, New York, pp 28–37. doi http://​doi.​acm.​org/​10.​1145/​640075.​640079
9.
Zurück zum Zitat Beckstette M, Stothmann D, Homann R, Giegerich R, Kurtz S (2004) Possumsearch: fast and sensitive matching of position specific scoring matrices using enhanced suffix arrays. In: Proceedings of the German conference in bioinformatics. pp 53–64 Beckstette M, Stothmann D, Homann R, Giegerich R, Kurtz S (2004) Possumsearch: fast and sensitive matching of position specific scoring matrices using enhanced suffix arrays. In: Proceedings of the German conference in bioinformatics. pp 53–64
10.
Zurück zum Zitat Bejerano G, Pheasant M, Makunin I, Stephen S, Kent WJ, Mattick JS, Haussler D (2004) Ultraconserved elements in the human genome. Science 304(5675):1321–1325CrossRef Bejerano G, Pheasant M, Makunin I, Stephen S, Kent WJ, Mattick JS, Haussler D (2004) Ultraconserved elements in the human genome. Science 304(5675):1321–1325CrossRef
11.
Zurück zum Zitat Berezikov E, Guryev V, Plasterk RH, Cuppen E (2004) CONREAL: conserved regulatory elements anchored alignment algorithm for identification of transcription factor binding sites by phylogenetic footprinting. Genome Res 14(1):170–178. doi:10.1101/gr.1642804CrossRef Berezikov E, Guryev V, Plasterk RH, Cuppen E (2004) CONREAL: conserved regulatory elements anchored alignment algorithm for identification of transcription factor binding sites by phylogenetic footprinting. Genome Res 14(1):170–178. doi:10.1101/gr.1642804CrossRef
13.
Zurück zum Zitat Bernal A, Crammer K, Hatzigeorgiou A, Pereira F (2007) Global discriminative learning for high-accuracy computational gene prediction. PLoS Comput Biol 3:e54MathSciNetCrossRef Bernal A, Crammer K, Hatzigeorgiou A, Pereira F (2007) Global discriminative learning for high-accuracy computational gene prediction. PLoS Comput Biol 3:e54MathSciNetCrossRef
14.
Zurück zum Zitat Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, Haussler D, Miller W (2004) Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res 14(4):708–715CrossRef Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, Haussler D, Miller W (2004) Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res 14(4):708–715CrossRef
15.
Zurück zum Zitat Blanchette M, Sinha S (2001) Separating real motifs from their artifacts. In: Brunak S, Krogh A (eds) Proceedings of the annual international symposium on intelligent systems for molecular biology, pp 30–38 Blanchette M, Sinha S (2001) Separating real motifs from their artifacts. In: Brunak S, Krogh A (eds) Proceedings of the annual international symposium on intelligent systems for molecular biology, pp 30–38
16.
Zurück zum Zitat Blanchette M, Tompa M (2002) Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Res 12(5):739–748CrossRef Blanchette M, Tompa M (2002) Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Res 12(5):739–748CrossRef
17.
Zurück zum Zitat Brazma A, Jonassen I, Ukkonen E, Vilo J (1996) Discovering patterns and subfamilies in biosequences. In: Proceedings of the annual international symposium on intelligent systems for molecular biology, pp 34–43 Brazma A, Jonassen I, Ukkonen E, Vilo J (1996) Discovering patterns and subfamilies in biosequences. In: Proceedings of the annual international symposium on intelligent systems for molecular biology, pp 34–43
18.
Zurück zum Zitat Brudno M, Do CB, Cooper GM, Kim MF, Davydov E, Green ED, Sidow A, Batzoglou S (2003) LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res 13(4):721–731CrossRef Brudno M, Do CB, Cooper GM, Kim MF, Davydov E, Green ED, Sidow A, Batzoglou S (2003) LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res 13(4):721–731CrossRef
19.
Zurück zum Zitat Buhler J, Tompa M (2002) Finding motifs using random projections. J Comput Biol 9(2):225–242CrossRef Buhler J, Tompa M (2002) Finding motifs using random projections. J Comput Biol 9(2):225–242CrossRef
20.
Zurück zum Zitat Burge C, Karlin S (1997) Prediction of complete gene structure in human genomic DNA. J Mol Biol 268:78–94CrossRef Burge C, Karlin S (1997) Prediction of complete gene structure in human genomic DNA. J Mol Biol 268:78–94CrossRef
21.
Zurück zum Zitat Bussemaker HJ, Li H, Siggia ED (2001) Regulatory element detection using correlation with expression. Nat Genet 27(2):167–171CrossRef Bussemaker HJ, Li H, Siggia ED (2001) Regulatory element detection using correlation with expression. Nat Genet 27(2):167–171CrossRef
22.
Zurück zum Zitat Califano A (2000) SPLASH: structural pattern localization analysis by sequential histograms. Bioinformatics 16(4):341–357CrossRef Califano A (2000) SPLASH: structural pattern localization analysis by sequential histograms. Bioinformatics 16(4):341–357CrossRef
23.
Zurück zum Zitat Carninci P, et al (2006) Genomewide analysis of mammalian promoter architecture and evolution. Nat Genet 38:626–635CrossRef Carninci P, et al (2006) Genomewide analysis of mammalian promoter architecture and evolution. Nat Genet 38:626–635CrossRef
24.
Zurück zum Zitat Cheng J, Kapranov P, Drenkow J, Dike S, Brubaker S, Patel S, Long J, Stern D, Tammana H, Helt G, Sementchenko V, Piccolboni A, Bekiranov S, Bailey DK, Ganesh M, Ghosh S, Bell I, Gerhard DS, Gingeras TR (2005) Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308(5725):1149–1154CrossRef Cheng J, Kapranov P, Drenkow J, Dike S, Brubaker S, Patel S, Long J, Stern D, Tammana H, Helt G, Sementchenko V, Piccolboni A, Bekiranov S, Bailey DK, Ganesh M, Ghosh S, Bell I, Gerhard DS, Gingeras TR (2005) Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308(5725):1149–1154CrossRef
25.
Zurück zum Zitat Conlon EM, Liu XS, Lieb JD, Liu JS (2003) Integrating regulatory motif discovery and genome-wide expression analysis. Proc Natl Acad Sci USA 100(6):3339–3344CrossRef Conlon EM, Liu XS, Lieb JD, Liu JS (2003) Integrating regulatory motif discovery and genome-wide expression analysis. Proc Natl Acad Sci USA 100(6):3339–3344CrossRef
26.
Zurück zum Zitat Das D, Banerjee N, Zhang MQ (2004) Interacting models of cooperative gene regulation. Proc Natl Acad Sci USA 101(46):16234–16239CrossRef Das D, Banerjee N, Zhang MQ (2004) Interacting models of cooperative gene regulation. Proc Natl Acad Sci USA 101(46):16234–16239CrossRef
27.
Zurück zum Zitat Das D, Nahle Z, Zhang M (2006) Adaptively inferring human transcriptional subnetworks. Mol Syst Biol 2:2006.0029CrossRef Das D, Nahle Z, Zhang M (2006) Adaptively inferring human transcriptional subnetworks. Mol Syst Biol 2:2006.0029CrossRef
28.
Zurück zum Zitat Davuluri R, Grosse I, Zhang M (2002) Computational identification of promoters and first exons in the human genome. Nat Genet 229:412–417; Erratum: Nat Genet 32:459 Davuluri R, Grosse I, Zhang M (2002) Computational identification of promoters and first exons in the human genome. Nat Genet 229:412–417; Erratum: Nat Genet 32:459
29.
Zurück zum Zitat Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B 39:1–38MATHMathSciNet Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B 39:1–38MATHMathSciNet
30.
Zurück zum Zitat Dermitzakis ET, Clark AG (2002) Evolution of transcription factor binding sites in mammalian gene regulatory regions: conservation and turnover. Mol Biol Evol 19(7):1114–1121CrossRef Dermitzakis ET, Clark AG (2002) Evolution of transcription factor binding sites in mammalian gene regulatory regions: conservation and turnover. Mol Biol Evol 19(7):1114–1121CrossRef
31.
Zurück zum Zitat Dorohonceanu B, Nevill-Manning C (2000) Accelerating protein classification using suffix trees. In: Proceedings of the 8th international conference on intelligent systems for molecular biology (ISMB). pp 128–133 Dorohonceanu B, Nevill-Manning C (2000) Accelerating protein classification using suffix trees. In: Proceedings of the 8th international conference on intelligent systems for molecular biology (ISMB). pp 128–133
32.
Zurück zum Zitat Down T, Hubbard T (2002) Computational detection and location of transcription start sites in mammalian genomic DNA. Genome Res 12:458–461CrossRef Down T, Hubbard T (2002) Computational detection and location of transcription start sites in mammalian genomic DNA. Genome Res 12:458–461CrossRef
33.
Zurück zum Zitat Durbin R, Eddy SR, Krogh A, Mitchison G (1999) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge: Cambridge University Press Durbin R, Eddy SR, Krogh A, Mitchison G (1999) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge: Cambridge University Press
34.
Zurück zum Zitat Duta R, Hart P, Stock D (2000) Pattern classification, 2 edn. Wiley, New York Duta R, Hart P, Stock D (2000) Pattern classification, 2 edn. Wiley, New York
35.
Zurück zum Zitat Ettwiller L, Paten B, Souren M, Loosli F, Wittbrodt J, Birney E (2005) The discovery, positioning and verification of a set of transcription-associated motifs in vertebrates. Genome Biol 6(12):R104CrossRef Ettwiller L, Paten B, Souren M, Loosli F, Wittbrodt J, Birney E (2005) The discovery, positioning and verification of a set of transcription-associated motifs in vertebrates. Genome Biol 6(12):R104CrossRef
36.
Zurück zum Zitat Evans PA, Smith AD (2003) Toward optimal motif enumeration. In: Dehne FKHA, Ortiz AL, Sack JR (eds) Workshop on algorithms and data structures. Lecture notes in computer science, vol 2748, pp 47–58 Evans PA, Smith AD (2003) Toward optimal motif enumeration. In: Dehne FKHA, Ortiz AL, Sack JR (eds) Workshop on algorithms and data structures. Lecture notes in computer science, vol 2748, pp 47–58
37.
Zurück zum Zitat Felsenstein J, Churchill G (1996) A Hidden Markov Model approach to variation among sites in rate of evolution. Mol Biol Evol 13(1):93–104CrossRef Felsenstein J, Churchill G (1996) A Hidden Markov Model approach to variation among sites in rate of evolution. Mol Biol Evol 13(1):93–104CrossRef
38.
Zurück zum Zitat Fiegler H, et al (2006) Accurate and reliable high-throughput detection of copy number variation in the human genome. Genome Res 16:1566–1574CrossRef Fiegler H, et al (2006) Accurate and reliable high-throughput detection of copy number variation in the human genome. Genome Res 16:1566–1574CrossRef
39.
40.
Zurück zum Zitat Guigó R, et al (2006) EGASP: the human ENCODE Genome Annotation Assessment Project. Genome Biol 7(Suppl 1):S2.1–S2.3CrossRef Guigó R, et al (2006) EGASP: the human ENCODE Genome Annotation Assessment Project. Genome Biol 7(Suppl 1):S2.1–S2.3CrossRef
41.
Zurück zum Zitat Gupta M, Liu J (2003) Discovery of conserved sequence patterns using a stochastic dictionary model. J Am Stat Assoc 98(461):55–66MATHMathSciNetCrossRef Gupta M, Liu J (2003) Discovery of conserved sequence patterns using a stochastic dictionary model. J Am Stat Assoc 98(461):55–66MATHMathSciNetCrossRef
42.
Zurück zum Zitat Halpern A, Bruno W (1998) Evolutionary distances for protein-coding sequences: modeling site-specific residue frequencies. Mol Biol Evol 15(7):910–917CrossRef Halpern A, Bruno W (1998) Evolutionary distances for protein-coding sequences: modeling site-specific residue frequencies. Mol Biol Evol 15(7):910–917CrossRef
43.
Zurück zum Zitat IUPAC-IUB Commission on Biochemical Nomenclature (1970) Abbreviations and symbols for nucleic acids, polynucleotides and their constituents: recommendations 1970. J Biol Chem 245(20):5171–5176. URL http://www.jbc.org IUPAC-IUB Commission on Biochemical Nomenclature (1970) Abbreviations and symbols for nucleic acids, polynucleotides and their constituents: recommendations 1970. J Biol Chem 245(20):5171–5176. URL http://​www.​jbc.​org
44.
Zurück zum Zitat Javier Costas FC, Vieira J (2003) Turnover of binding sites for transcription factors involved in early drosophila development. Gene 310:215–220CrossRef Javier Costas FC, Vieira J (2003) Turnover of binding sites for transcription factors involved in early drosophila development. Gene 310:215–220CrossRef
45.
Zurück zum Zitat Kel A, Gossling E, Reuter I, Cheremushkin E, Kel-Margoulis O, Wingender E (2003) MATCHTM: a tool for searching transcription factor binding sites in DNA sequences. Nucl Acids Res 31(13):3576–3579CrossRef Kel A, Gossling E, Reuter I, Cheremushkin E, Kel-Margoulis O, Wingender E (2003) MATCHTM: a tool for searching transcription factor binding sites in DNA sequences. Nucl Acids Res 31(13):3576–3579CrossRef
46.
Zurück zum Zitat Kim TH, Barrera LO, Zheng M, Qu C, Singer MA, Richmond TA, Wu Y, Green RD, Ren B (2005) A high-resolution map of active promoters in the human genome. Nature 436:876–880CrossRef Kim TH, Barrera LO, Zheng M, Qu C, Singer MA, Richmond TA, Wu Y, Green RD, Ren B (2005) A high-resolution map of active promoters in the human genome. Nature 436:876–880CrossRef
47.
Zurück zum Zitat Komura D, et al (2006) Genome-wide detection of human copy number variations using high-density DNA oligonucleotide arrays. Genome Res 16:1575–1584CrossRef Komura D, et al (2006) Genome-wide detection of human copy number variations using high-density DNA oligonucleotide arrays. Genome Res 16:1575–1584CrossRef
48.
Zurück zum Zitat Korbel JO, et al (2007) Systematic prediction and validation of breakpoints associated with copy-number variants in the human genome. Proc Natl Acad Sci USA 104:10110–10115CrossRef Korbel JO, et al (2007) Systematic prediction and validation of breakpoints associated with copy-number variants in the human genome. Proc Natl Acad Sci USA 104:10110–10115CrossRef
49.
Zurück zum Zitat Krogh A (1997) Two methods for improving performance of an HMM and their application for gene finding. Proc Int Conf Intell Syst Mol Biol 5:179–186 Krogh A (1997) Two methods for improving performance of an HMM and their application for gene finding. Proc Int Conf Intell Syst Mol Biol 5:179–186
50.
Zurück zum Zitat Kulp D, Haussler D, Reese M, Eeckman F (1996) A generalized hidden Markov model for the recognition of human genes in DNA. Proc Int Conf Intell Syst Mol Biol 4:134–142 Kulp D, Haussler D, Reese M, Eeckman F (1996) A generalized hidden Markov model for the recognition of human genes in DNA. Proc Int Conf Intell Syst Mol Biol 4:134–142
51.
Zurück zum Zitat Lawrence C, Altschul S, Boguski M, Liu J, Neuwald A, Wootton J (1993) Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262:208–214CrossRef Lawrence C, Altschul S, Boguski M, Liu J, Neuwald A, Wootton J (1993) Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262:208–214CrossRef
52.
Zurück zum Zitat Lawrence C, Reilly AA (1990) An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins Struct Funct Genet 7:41–51CrossRef Lawrence C, Reilly AA (1990) An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins Struct Funct Genet 7:41–51CrossRef
54.
Zurück zum Zitat Liu XS, Brutlag DL, Liu JS (2002) An algorithm for finding protein-DNA binding sites with applications to chromatin immunoprecipitation microarray experiments. Nat Biotechnol 20(8):835–839CrossRef Liu XS, Brutlag DL, Liu JS (2002) An algorithm for finding protein-DNA binding sites with applications to chromatin immunoprecipitation microarray experiments. Nat Biotechnol 20(8):835–839CrossRef
55.
Zurück zum Zitat Liu JS, Lawrence CE, Neuwald A (1995) Bayesian models for multiple local sequence alignment and its Gibbs sampling strategies. J Am Stat Assoc 90:1156–1170MATHCrossRef Liu JS, Lawrence CE, Neuwald A (1995) Bayesian models for multiple local sequence alignment and its Gibbs sampling strategies. J Am Stat Assoc 90:1156–1170MATHCrossRef
56.
Zurück zum Zitat Majoros W, Pertea M, Salzberg S (2004) TigrScan and GlimmerHMM: two open source ab initio eukaryotic genefinders. Bioinformatics 20:2878–2879CrossRef Majoros W, Pertea M, Salzberg S (2004) TigrScan and GlimmerHMM: two open source ab initio eukaryotic genefinders. Bioinformatics 20:2878–2879CrossRef
57.
Zurück zum Zitat Marinescu VD, Kohane IS, Riva A (2005) The MAPPER database: a multi-genome catalog of putative transcription factor binding sites. Nucl Acids Res 33(Suppl 1):D91–D97 Marinescu VD, Kohane IS, Riva A (2005) The MAPPER database: a multi-genome catalog of putative transcription factor binding sites. Nucl Acids Res 33(Suppl 1):D91–D97
58.
Zurück zum Zitat Marsan L, Sagot MF (2000) Extracting structured motifs using a suffix tree – algorithms and application to promoter consensus identification. In: Minoru S, Shamir R (eds) Proceedings of the annual international conference on computational molecular biology. ACM Press, New York, pp 210–219 Marsan L, Sagot MF (2000) Extracting structured motifs using a suffix tree – algorithms and application to promoter consensus identification. In: Minoru S, Shamir R (eds) Proceedings of the annual international conference on computational molecular biology. ACM Press, New York, pp 210–219
59.
Zurück zum Zitat Matys V, Fricke E, Geffers R, Gossling E, Haubrock M, Hehl R, Hornischer K, Karas D, Kel AE, Kel-Margoulis OV, Kloos DU, Land S, Lewicki-Potapov B, Michael H, Munch R, Reuter I, Rotert S, Saxel H, Scheer M, Thiele S, Wingender E (2003) TRANSFAC(R): transcriptional regulation, from patterns to profiles. Nucl Acids Res 31(1):374–378CrossRef Matys V, Fricke E, Geffers R, Gossling E, Haubrock M, Hehl R, Hornischer K, Karas D, Kel AE, Kel-Margoulis OV, Kloos DU, Land S, Lewicki-Potapov B, Michael H, Munch R, Reuter I, Rotert S, Saxel H, Scheer M, Thiele S, Wingender E (2003) TRANSFAC(R): transcriptional regulation, from patterns to profiles. Nucl Acids Res 31(1):374–378CrossRef
60.
Zurück zum Zitat Moses AM, Chiang DY, Pollard DA, Iyer VN, Eisen MB (2004) MONKEY: identifying conserved transcription-factor binding sites in multiple alignments using a binding site-specific evolutionary model. Genome Biol 5(12):R98CrossRef Moses AM, Chiang DY, Pollard DA, Iyer VN, Eisen MB (2004) MONKEY: identifying conserved transcription-factor binding sites in multiple alignments using a binding site-specific evolutionary model. Genome Biol 5(12):R98CrossRef
61.
Zurück zum Zitat Moses AM, Pollard DA, Nix DA, Iyer VN, Li XY, Biggin MD, Eisen MB (2006) Large-scale turnover of functional transcription factor binding sites in drosophila. PLoS Comput Biol 2(10):e130CrossRef Moses AM, Pollard DA, Nix DA, Iyer VN, Li XY, Biggin MD, Eisen MB (2006) Large-scale turnover of functional transcription factor binding sites in drosophila. PLoS Comput Biol 2(10):e130CrossRef
64.
Zurück zum Zitat Odom DT, Dowell RD, Jacobsen ES, Gordon W, Danford TW, MacIsaac KD, Rolfe PA, Conboy CM, Gifford DK, Fraenkel E (2007) Tissue-specific transcriptional regulation has diverged significantly between human and mouse. Nat Genet 39(6):730–732; Published online: 21 May 2007 Odom DT, Dowell RD, Jacobsen ES, Gordon W, Danford TW, MacIsaac KD, Rolfe PA, Conboy CM, Gifford DK, Fraenkel E (2007) Tissue-specific transcriptional regulation has diverged significantly between human and mouse. Nat Genet 39(6):730–732; Published online: 21 May 2007
65.
Zurück zum Zitat Ohler U, Liao G, Niemann H, Rubin G (2002) Computational analysis of core promoters in the drosophila genome. Genome Biol 3(12):RESEARCH0087 Ohler U, Liao G, Niemann H, Rubin G (2002) Computational analysis of core promoters in the drosophila genome. Genome Biol 3(12):RESEARCH0087
66.
Zurück zum Zitat Pearson H (2006) What is a gene?. Nat Genet 441:398–340 Pearson H (2006) What is a gene?. Nat Genet 441:398–340
67.
Zurück zum Zitat Pennacchio LA, Ahituv N, Moses AM, Prabhakar S, Nobrega MA, Shoukry M, Minovitsky S, Dubchak I, Holt A, Lewis KD, Plajzer-Fick I, Akiyama J, Val SD, Afzal V, Black BL, Couronne O, Eisen MB, Visel A, Rubin EM (2006) In vivo enhancer analysis of human conserved non-coding sequences. Nature 444(7118):499–502CrossRef Pennacchio LA, Ahituv N, Moses AM, Prabhakar S, Nobrega MA, Shoukry M, Minovitsky S, Dubchak I, Holt A, Lewis KD, Plajzer-Fick I, Akiyama J, Val SD, Afzal V, Black BL, Couronne O, Eisen MB, Visel A, Rubin EM (2006) In vivo enhancer analysis of human conserved non-coding sequences. Nature 444(7118):499–502CrossRef
68.
Zurück zum Zitat Pevzner P, Sze S (2000) Combinatorial approaches to finding subtle signals in DNA sequences. In: Bourne P, et al (eds) Proceedings of the annual international symposium on intelligent systems for molecular biology. Menlo Park, AAAI Press, pp 269–278 Pevzner P, Sze S (2000) Combinatorial approaches to finding subtle signals in DNA sequences. In: Bourne P, et al (eds) Proceedings of the annual international symposium on intelligent systems for molecular biology. Menlo Park, AAAI Press, pp 269–278
69.
Zurück zum Zitat Portugal J (1989) Footprinting analysis of sequence-specific DNA-drug interactions. Chem Biol Interact 71(4):311–324CrossRef Portugal J (1989) Footprinting analysis of sequence-specific DNA-drug interactions. Chem Biol Interact 71(4):311–324CrossRef
70.
Zurück zum Zitat Price TS, Regan R, Mott R, Hedman A, Honey B, Daniels RJ, Smith L, Greenfield A, Tiganescu A, Buckle V, Ventress N, Ayyub H, Salhan A, Pedraza-Diaz S, Broxholme J, Ragoussis J, Higgs DR, Flint J, Knight SJL (2005) SW-ARRAY: a dynamic programming solution for the identification of copy-number changes in genomic DNA using array comparative genome hybridization data. Nucl Acids Res 33(11):3455–3464CrossRef Price TS, Regan R, Mott R, Hedman A, Honey B, Daniels RJ, Smith L, Greenfield A, Tiganescu A, Buckle V, Ventress N, Ayyub H, Salhan A, Pedraza-Diaz S, Broxholme J, Ragoussis J, Higgs DR, Flint J, Knight SJL (2005) SW-ARRAY: a dynamic programming solution for the identification of copy-number changes in genomic DNA using array comparative genome hybridization data. Nucl Acids Res 33(11):3455–3464CrossRef
71.
Zurück zum Zitat Rabiner L (1989) A tutorial on hidden markov models and selected applications in speech recognition. Proc IEEE 77:257–286CrossRef Rabiner L (1989) A tutorial on hidden markov models and selected applications in speech recognition. Proc IEEE 77:257–286CrossRef
72.
Zurück zum Zitat Rahmann S, Muller T, Vingron M (2003) On the power of profiles for transcription factor binding site detection. Stat Appl Genet Mol Biol 2(1):7MathSciNet Rahmann S, Muller T, Vingron M (2003) On the power of profiles for transcription factor binding site detection. Stat Appl Genet Mol Biol 2(1):7MathSciNet
73.
Zurück zum Zitat Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, Cho EK, Dallaire S, Freeman JL, Gonzalez JR, Gratacos M, Huang J, Kalaitzopoulos D, Komura D, MacDonald JR, Marshall CR, Mei R, Montgomery L, Nishimura K, Okamura K, Shen F, Somerville MJ, Tchinda J, Valsesia A, Woodwark C, Yang F, Zhang J, Zerjal T, Zhang J, Armengol L, Conrad DF, Estivill X, Tyler-Smith C, Carter NP, Aburatani H, Lee C, Jones KW, Scherer SW, Hurles ME (2006) Global variation in copy number in the human genome. Nature 444:444–454CrossRef Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, Cho EK, Dallaire S, Freeman JL, Gonzalez JR, Gratacos M, Huang J, Kalaitzopoulos D, Komura D, MacDonald JR, Marshall CR, Mei R, Montgomery L, Nishimura K, Okamura K, Shen F, Somerville MJ, Tchinda J, Valsesia A, Woodwark C, Yang F, Zhang J, Zerjal T, Zhang J, Armengol L, Conrad DF, Estivill X, Tyler-Smith C, Carter NP, Aburatani H, Lee C, Jones KW, Scherer SW, Hurles ME (2006) Global variation in copy number in the human genome. Nature 444:444–454CrossRef
74.
Zurück zum Zitat Roth F, Hughes J, Estep P, Church G (1998) Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat Biotechnol 16(10):939–945CrossRef Roth F, Hughes J, Estep P, Church G (1998) Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat Biotechnol 16(10):939–945CrossRef
75.
Zurück zum Zitat Salamov A, Solovyev V (2000) Ab initio gene finding in Drosophila genomic DNA. Genome Res 10:516–522CrossRef Salamov A, Solovyev V (2000) Ab initio gene finding in Drosophila genomic DNA. Genome Res 10:516–522CrossRef
76.
Zurück zum Zitat Sandelin A, et al (2007) Mammalian RNA polymerase II core promoters: insights from genome-wide studies. Nat Rev Genet 8:424–436CrossRef Sandelin A, et al (2007) Mammalian RNA polymerase II core promoters: insights from genome-wide studies. Nat Rev Genet 8:424–436CrossRef
77.
Zurück zum Zitat Schones D, Smith A, Zhang M (2007) Statistical significance of cis-regulatory modules. BMC Bioinform 8:19CrossRef Schones D, Smith A, Zhang M (2007) Statistical significance of cis-regulatory modules. BMC Bioinform 8:19CrossRef
78.
Zurück zum Zitat Sebat J, et al (2004) Large-scale copy number polymorphism in the human genome. Science 305:525–528CrossRef Sebat J, et al (2004) Large-scale copy number polymorphism in the human genome. Science 305:525–528CrossRef
79.
Zurück zum Zitat Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, Weinstock GM, Wilson RK, Gibbs RA, Kent WJ, Miller W, Haussler D (2005) Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 15(8):1034–1050CrossRef Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, Weinstock GM, Wilson RK, Gibbs RA, Kent WJ, Miller W, Haussler D (2005) Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 15(8):1034–1050CrossRef
80.
Zurück zum Zitat Solovyev VV, et al (1994) Predicting internal exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames. Nucl Acids Res 22:5156–5163CrossRef Solovyev VV, et al (1994) Predicting internal exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames. Nucl Acids Res 22:5156–5163CrossRef
81.
Zurück zum Zitat Sonnenburg S, Zien A, Ratsch G (2006) ARTS: accurate recognition of transcription starts in human. Bioinformatics 22:e472–e480CrossRef Sonnenburg S, Zien A, Ratsch G (2006) ARTS: accurate recognition of transcription starts in human. Bioinformatics 22:e472–e480CrossRef
82.
Staden R (1989) Methods for calculating the probabilities of finding patterns in sequences. Comput Appl Biosci 5(2):89–96
83.
Stanke M, Waack S (2003) Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19(Suppl 2):II215–II225
84.
Sumazin P, Chen G, Hata N, Smith AD, Zhang T, Zhang MQ (2005) DWE: discriminating word enumerator. Bioinformatics 21(1):31–38
85.
Thomas M, Chiang C (2006) The general transcription machinery and general cofactors. Crit Rev Biochem Mol Biol 41:105–178
86.
Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl Acids Res 22(22):4673–4680
87.
Tompa M, Li N, Bailey TL, Church GM, De Moor B, Eskin E, Favorov AV, Frith MC, Fu Y, Kent WJ, Makeev VJ, Mironov AA, Noble WS, Pavesi G, Pesole G, Regnier M, Simonis N, Sinha S, Thijs G, van Helden J, Vandenbogaert M, Weng Z, Workman C, Ye C, Zhu Z (2005) Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol 23(1):137–144
88.
Wang K, Li M, Hadley D, Liu R, Glessner J, Grant SF, Hakonarson H, Bucan M (2007) PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res 17(11):1665–1674
89.
Waterman MS (1995) Introduction to computational biology: maps, sequences and genomes. Chapman and Hall, London
90.
Waterman MS, Arratia R, Galas DJ (1984) Pattern recognition in several sequences: consensus and alignment. Bull Math Biol 46:515–527
91.
Woolfe A, Goodson M, Goode DK, Snell P, McEwen GK, Vavouri T, Smith SF, North P, Callaway H, Kelly K, Walter K, Abnizova I, Gilks W, Edwards YJK, Cooke JE, Elgar G (2005) Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol 3(1):e7
92.
Zhang M (1997) Identification of protein coding regions in the human genome by quadratic discriminant analysis. Proc Natl Acad Sci USA 94:565–568
93.
Zhang M (2002) Computational prediction of eukaryotic protein-coding genes. Nat Rev Genet 3:698–709
94.
Zhao X, Xuan Z, Zhang MQ (2006) Boosting with stumps for predicting transcription start sites. Genome Biol 8:R17
95.
Zhou Q, Liu JS (2004) Modeling within-motif dependence for transcription factor binding site predictions. Bioinformatics 20(6):909–916
Metadata
Title: Topics in Computational Genomics
Authors: Michael Q. Zhang, Andrew D. Smith
Copyright year: 2013
Publisher: Springer Berlin Heidelberg
DOI: https://doi.org/10.1007/978-3-642-38951-1_3