2018 | OriginalPaper | Book chapter

Variable Ranking Feature Selection for the Identification of Nucleosome Related Sequences

Authors: Giosué Lo Bosco, Riccardo Rizzo, Antonino Fiannaca, Massimo La Rosa, Alfonso Urso

Published in: New Trends in Databases and Information Systems

Publisher: Springer International Publishing

Abstract

Several recent works have shown that the \(k\)-mer representation of a DNA sequence can be used to classify or identify sequences related to nucleosome positioning. This representation becomes computationally expensive as \(k\) grows, because the dimension of the feature space increases exponentially with \(k\). This significantly affects any general machine learning algorithm applied to sequence classification. In this paper, we investigate the advantage offered by the so-called Variable Ranking Feature Selection method for selecting the most informative \(k\)-mers associated with a set of DNA sequences, with the final aim of nucleosome/linker classification by a deep learning network. Results computed on three public datasets show the effectiveness of the adopted feature selection method.
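
As a rough illustration of the idea described in the abstract, the following minimal sketch builds a k-mer frequency vector (4^k entries for the alphabet A, C, G, T) and ranks each variable independently by a class-separation score. The function names, the 0/1 labels (nucleosome vs. linker), and the Fisher-like score are assumptions made for illustration only; the abstract does not state which ranking criterion the authors use, and the deep learning classifier is not shown.

```python
from itertools import product
from statistics import mean, pstdev

ALPHABET = "ACGT"


def kmer_features(seq, k):
    """Return the relative frequency of every k-mer (4**k values) in seq."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on short input
    return [counts[m] / total for m in kmers]


def rank_variables(X, y):
    """Score each column on its own and return column indices, best first.

    Assumed score (not taken from the paper):
    |mean_1 - mean_0| / (std_1 + std_0), a Fisher-like criterion.
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        spread = pstdev(pos) + pstdev(neg) + 1e-12  # guard against zero spread
        scores.append(abs(mean(pos) - mean(neg)) / spread)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def select_top(X, ranking, n):
    """Keep only the n highest-ranked columns of X."""
    keep = ranking[:n]
    return [[row[j] for j in keep] for row in X]
```

In such a setup, the reduced matrix returned by `select_top` would be passed to the classifier in place of the full 4^k-dimensional representation, which is where the saving in dimensionality comes from.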

Metadata
Title
Variable Ranking Feature Selection for the Identification of Nucleosome Related Sequences
Authors
Giosué Lo Bosco
Riccardo Rizzo
Antonino Fiannaca
Massimo La Rosa
Alfonso Urso
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-00063-9_30