2016 | OriginalPaper | Book chapter

Distributed Monte Carlo Feature Selection: Extracting Informative Features Out of Multidimensional Problems with Linear Speedup

Abstract

Selecting informative features from the ever-growing results of high-throughput biological experiments requires specialized feature selection algorithms. One such method is Monte Carlo Feature Selection, which is straightforward yet computationally expensive. In this technical paper we present the architecture and performance of a development version of our distributed implementation of this algorithm, designed to run in multiprocessor as well as multihost computing environments and potentially controllable through a web browser by non-IT staff. As a simple enhancement, our method can produce statistically interpretable output by means of permutation testing. Tested on the reference Golub et al. leukemia data, as well as on our own dataset of almost 2 million features, it showed nearly linear speedup as the number of processors increased. Being platform independent and open to extensions, this application could become a valuable tool for researchers facing ill-defined, high-dimensional feature selection problems.
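To make the idea concrete, the following is a minimal Python sketch of Monte Carlo feature selection with permutation testing. It is not the implementation described in the chapter. As a simplifying assumption, a one-split decision stump stands in for the decision trees used in published descriptions of the method, and all function names (`best_stump`, `score_projection`, `mcfs_importance`, `permutation_p_values`) and parameters are hypothetical. Each random feature subset is scored independently of the others, which is the property that lets the work be spread over many processors.

```python
import random
from collections import defaultdict


def best_stump(rows, labels, features):
    """Return (feature, threshold, polarity, training accuracy) of the best one-split rule."""
    best = (None, 0.0, 1, -1.0)
    for f in features:
        for thr in sorted({r[f] for r in rows}):
            # hits: samples where "value > threshold" agrees with label 1
            hits = sum((r[f] > thr) == bool(y) for r, y in zip(rows, labels))
            for polarity, h in ((1, hits), (-1, len(rows) - hits)):
                acc = h / len(rows)
                if acc > best[3]:
                    best = (f, thr, polarity, acc)
    return best


def score_projection(task):
    """Score one random feature subset; tasks do not depend on each other."""
    rows, labels, features, n_splits, seed = task
    rng = random.Random(seed)
    credit = defaultdict(float)
    idx = list(range(len(rows)))
    for _ in range(n_splits):
        rng.shuffle(idx)
        cut = (2 * len(idx)) // 3
        train, test = idx[:cut], idx[cut:]
        f, thr, pol, _ = best_stump([rows[i] for i in train],
                                    [labels[i] for i in train], features)
        correct = sum(((rows[i][f] > thr) == bool(labels[i])) == (pol == 1)
                      for i in test)
        # Assumption: the winning feature is credited with its test accuracy.
        credit[f] += correct / len(test)
    return dict(credit)


def mcfs_importance(rows, labels, n_projections=200, subset_size=3,
                    n_splits=3, seed=0, map_fn=map):
    """Sum the credit each feature earns over many random feature subsets."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    tasks = [(rows, labels, rng.sample(range(n_features), subset_size),
              n_splits, rng.randrange(2**32))
             for _ in range(n_projections)]
    total = [0.0] * n_features
    for credit in map_fn(score_projection, tasks):  # map_fn may be parallel
        for f, value in credit.items():
            total[f] += value
    return total


def permutation_p_values(rows, labels, n_permutations=20, seed=0, **kwargs):
    """Compare observed importances with importances under shuffled labels."""
    observed = mcfs_importance(rows, labels, seed=seed, **kwargs)
    rng = random.Random(seed + 1)
    exceed = [0] * len(observed)
    for _ in range(n_permutations):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        # Same seed, so the null run reuses the same feature subsets.
        null = mcfs_importance(rows, shuffled, seed=seed, **kwargs)
        for f, (obs, nul) in enumerate(zip(observed, null)):
            exceed[f] += nul >= obs
    return observed, [(1 + e) / (1 + n_permutations) for e in exceed]
```

By default the subsets are scored serially with the built-in `map`. Passing the `map` method of a `concurrent.futures.ProcessPoolExecutor` as `map_fn` would distribute the same tasks over local processes, and a multihost setup would need a different dispatcher.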

References
1. Draminski, M., Rada-Iglesias, A., Enroth, S., Wadelius, C., Koronacki, J., Komorowski, J.: Monte Carlo feature selection for supervised classification. Bioinformatics 24, 110–117 (2008)
2. Dramiński, M., Kierczak, M., Koronacki, J., Komorowski, J.: Monte Carlo feature selection and interdependency discovery in supervised classification. In: Koronacki, J., Raś, Z.W., Wierzchoń, S.T., Kacprzyk, J. (eds.) Advances in Machine Learning II. SCI, vol. 263, pp. 371–385. Springer, Heidelberg (2010)
3. Golub, T., et al.: Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286, 531–537 (1999)
4. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: An update. SIGKDD Explor. 11, 10–18 (2009)
5. International HapMap Consortium: The International HapMap Project. Nature 426, 789 (2003)
6. Luque-Baena, R.M., Urda, D., Subirats, J.L., Franco, L., Jerez, J.M.: Application of genetic algorithms and constructive neural networks for the analysis of microarray cancer data. Theor. Biol. Med. Model. 11, 7 (2014)
7. Pearson, K.: On lines and planes of closest fit to systems of points in space. Philos. Mag. Series 6(2), 559–572 (1901)
8. Perneger, T.: What's wrong with Bonferroni adjustments. BMJ 316, 1236–1238 (1998)
9. Quinlan, J.R.: Effective Akka. O’Reilly Media, Inc. ISBN: 1449360076 9781449360078 (2013)
10. Sidak, Z.: Rectangular confidence regions for the means of multivariate normal distributions. J. Am. Stat. Assoc. 62, 626–633 (1967)
11.
12. The: An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012)
Metadata
Title
Distributed Monte Carlo Feature Selection: Extracting Informative Features Out of Multidimensional Problems with Linear Speedup
Author
Lukasz Krol
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-34099-9_35