2011 | OriginalPaper | Book chapter

3. Statistical Learning and Modeling of TF-DNA Binding

Authors: Bo Jiang, Jun S. Liu

Published in: Handbook of Statistical Bioinformatics

Publisher: Springer Berlin Heidelberg

Abstract

Discovering the binding sites and motifs of specific transcription factors (TFs) is an important first step towards understanding gene regulation circuitry. Computational approaches have been developed to identify transcription factor binding sites from a set of co-regulated genes. Recently, the abundance of gene expression data, ChIP-based TF-binding data (ChIP-array/seq), and high-resolution epigenetic maps has made it possible to capture sequence features relevant to TF-DNA interactions and thereby improve the predictive power of gene regulation modeling. In this chapter, we introduce some statistical models and computational strategies used to predict TF-DNA interactions from DNA sequence information. We also describe a general framework of predictive modeling approaches to the TF-DNA binding problem, which covers both traditional regression methods and statistical learning methods that select relevant sequence features and epigenetic markers.

Metadata
Title
Statistical Learning and Modeling of TF-DNA Binding
Authors
Bo Jiang
Jun S. Liu
Copyright year
2011
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-642-16345-6_3