2020 | OriginalPaper | Book chapter

A Clustering Approach to Identify Candidates to Housekeeping Genes Based on RNA-seq Data

Authors: Edian F. Franco, Dener Maués, Ronnie Alves, Luis Guimarães, Vasco Azevedo, Artur Silva, Preetam Ghosh, Jefferson Morais, Rommel T. J. Ramos

Published in: Advances in Bioinformatics and Computational Biology

Publisher: Springer International Publishing

Abstract

Housekeeping genes (HKGs) are essential for gene expression studies performed through reverse transcription quantitative polymerase chain reaction (RT-qPCR). These genes are related to the basic cellular processes that are essential for cell maintenance, survival and function. Thus, HKGs should be expressed in all cells of an organism regardless of the tissue type, cell state or cell condition. High-throughput technologies, including RNA sequencing (RNA-seq), are used to study and identify these types of genes. RNA-seq allows the measurement of gene expression profiles in a target tissue or an isolated cell. Moreover, machine learning methods are routinely applied in different genomics-related areas to enable the interpretation of large datasets, including those related to gene expression. This study reports a new machine learning based approach to identify candidate HKGs in silico from RNA-seq gene expression data. The approach enabled the identification of stable HKG candidates in RNA-seq data from Corynebacterium pseudotuberculosis. These genes showed stable expression under different stress conditions, as well as a low variation index and low fold changes. Furthermore, some of these genes were already reported in the literature as HKGs or HKG candidates for the same or other bacterial organisms, which supports the accuracy of the proposed method. We present a novel approach based on the K-means algorithm, internal metrics and machine learning methods that can identify stable housekeeping genes from gene expression data with high accuracy and efficiency.
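
The general idea in the abstract, grouping genes by how stable their expression is and keeping the most stable group, can be illustrated with a small sketch. This is not the authors' pipeline: the two features (coefficient of variation and maximum log2 fold change across conditions), the fixed k = 2, the farthest-point initialization, and all function names and expression values below are assumptions made for illustration. In practice the number of clusters would be chosen with internal validation metrics, and the features would likely be scaled.

```python
import numpy as np

def stability_features(expr):
    """Per-gene stability features from a genes x conditions matrix.

    Assumed features (not taken from the paper): coefficient of
    variation and the largest log2 fold change between conditions.
    """
    mean = expr.mean(axis=1)
    cv = expr.std(axis=1) / mean
    log_expr = np.log2(expr + 1.0)  # +1 avoids log2(0)
    max_lfc = log_expr.max(axis=1) - log_expr.min(axis=1)
    return np.column_stack([cv, max_lfc])

def kmeans(points, k, n_iter=100):
    """Plain K-means with a deterministic farthest-point initialization."""
    # First center: point closest to the origin; then repeatedly add the
    # point farthest from all centers chosen so far.
    centers = [points[np.linalg.norm(points, axis=1).argmin()]]
    for _ in range(1, k):
        d = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[d.argmax()])
    centers = np.array(centers)

    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def candidate_hkgs(expr, gene_ids, k=2):
    """Return genes in the cluster whose center is closest to the origin,
    i.e. the cluster with the lowest variation and fold change."""
    feats = stability_features(expr)
    labels, centers = kmeans(feats, k)
    stable = np.linalg.norm(centers, axis=1).argmin()
    return [g for g, lab in zip(gene_ids, labels) if lab == stable]

# Hypothetical normalized expression values: 6 genes x 4 conditions.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
expr = np.array([
    [100.0, 102.0, 98.0, 101.0],      # stable
    [50.0, 51.0, 49.0, 50.0],         # stable
    [1000.0, 990.0, 1010.0, 1005.0],  # stable
    [10.0, 200.0, 50.0, 400.0],       # variable
    [5.0, 300.0, 20.0, 150.0],        # variable
    [400.0, 20.0, 100.0, 8.0],        # variable
])
candidates = candidate_hkgs(expr, genes)
print(candidates)
```

On this toy input the three genes with nearly constant expression fall into the low-variation cluster. Real RNA-seq data would first need normalization across samples, which the sketch assumes has already been done.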

Metadata
Title
A Clustering Approach to Identify Candidates to Housekeeping Genes Based on RNA-seq Data
Authors
Edian F. Franco
Dener Maués
Ronnie Alves
Luis Guimarães
Vasco Azevedo
Artur Silva
Preetam Ghosh
Jefferson Morais
Rommel T. J. Ramos
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-46417-2_8