
2015 | Original Paper | Book Chapter

1. Application of Machine-Learning Methods to Understand Gene Expression Regulation

Authors: Chao Cheng, William P. Worzel

Published in: Genetic Programming Theory and Practice XII

Publisher: Springer International Publishing


Abstract

With the development and application of high-throughput technologies, an enormous amount of biological data has been produced in the past few years. These large-scale datasets make it possible and necessary to implement machine learning techniques for mining biological insights. In this chapter, we describe several examples to show how machine learning approaches are used to elucidate the mechanism of transcriptional regulation mediated by transcription factors and histone modifications. We demonstrate that machine learning provides powerful tools to quantitatively relate gene expression with transcription factor binding and histone modifications, to identify novel regulatory DNA elements in the genomes, and to predict gene functions. We also discuss the advantages and limitations of genetic programming in analyzing and processing biological data.


Metadata
Title
Application of Machine-Learning Methods to Understand Gene Expression Regulation
Authors
Chao Cheng
William P. Worzel
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-16030-6_1