
22.11.2016 | Methodologies and Application | Issue 22/2017

Soft Computing 22/2017

Multistage feature selection approach for high-dimensional cancer data

Journal:
Soft Computing > Issue 22/2017
Authors:
Alhasan Alkuhlani, Mohammad Nassef, Ibrahim Farag
Important notes
Communicated by V. Loia.

Electronic supplementary material

The online version of this article (doi:10.1007/s00500-016-2439-9) contains supplementary material, which is available to authorized users.

Abstract

Cancer is a serious disease and a major cause of death worldwide. DNA methylation (DNAm) is an epigenetic mechanism that regulates gene expression and is useful for the early detection of cancer. The main challenge with DNA methylation microarray datasets is the very large number of CpG sites relative to the number of samples. Recent studies have attempted to reduce this high dimensionality using various feature selection techniques. This article proposes a multistage feature selection approach that selects the optimal CpG sites from three DNAm cancer datasets (breast, colon and lung). The proposed approach combines three filter feature selection methods: the Fisher criterion, the t-test and the area under the ROC curve (AUC). In addition, as a wrapper feature selection stage, we apply a genetic algorithm (GA) with Support Vector Machine Recursive Feature Elimination (SVM-RFE) as its fitness function and an SVM as its evaluator. Using the Incremental Feature Selection (IFS) strategy, subsets of 24, 13 and 27 optimal CpG sites are selected for the breast, colon and lung cancer datasets, respectively. Under fivefold cross-validation on the training datasets, these subsets achieved classification accuracies of 100, 100 and 97.67%, respectively. Moreover, testing these final subsets on the three independent cancer datasets yielded accuracies of 96.02, 98.81 and 94.51%, respectively. The experimental results demonstrate high classification performance with small optimal feature subsets. Finally, the biological significance of the genes corresponding to these feature subsets is validated using enrichment analysis.
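The abstract names the three filter scores and the IFS strategy but does not give their formulas or the rule used to combine the filters. The sketch below assumes the common two-class definitions (Fisher criterion as squared mean difference over summed class variances, Welch's t statistic, and AUC computed from Mann-Whitney pair counts) and a hypothetical combination rule: keep only CpG sites that rank in the top `k` under all three scores. IFS then grows the ranked list one site at a time. The GA/SVM-RFE wrapper stage is omitted, the `evaluate` callback is a placeholder for a cross-validated classifier, and all function names are illustrative rather than the authors' code.

```python
from math import sqrt
from statistics import mean, variance
from typing import Callable, List, Sequence, Tuple

# X: one row per sample, one column per CpG site (e.g. methylation beta values).
# y: 1 for cancer, 0 for normal. Each class needs at least two samples,
# because statistics.variance raises StatisticsError otherwise.

def _split(X: Sequence[Sequence[float]], y: Sequence[int], j: int):
    pos = [row[j] for row, label in zip(X, y) if label == 1]
    neg = [row[j] for row, label in zip(X, y) if label == 0]
    return pos, neg

def fisher_score(pos: List[float], neg: List[float]) -> float:
    """Fisher criterion: squared mean difference over summed class variances."""
    denom = variance(pos) + variance(neg)
    return (mean(pos) - mean(neg)) ** 2 / denom if denom > 0 else 0.0

def abs_welch_t(pos: List[float], neg: List[float]) -> float:
    """Absolute Welch t statistic (unequal class variances)."""
    se = sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    return abs(mean(pos) - mean(neg)) / se if se > 0 else 0.0

def auc_separation(pos: List[float], neg: List[float]) -> float:
    """AUC from Mann-Whitney pair counts (ties count 0.5), folded so that
    0.5 means no separation and 1.0 means perfect separation in either direction."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return max(auc, 1.0 - auc)

def top_k(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring features (stable order on ties)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def multistage_filter(X, y, k: int) -> List[int]:
    """Hypothetical combination rule: keep features in the top k of all three
    filters, ordered by decreasing Fisher score."""
    per_feature = [_split(X, y, j) for j in range(len(X[0]))]
    fisher = [fisher_score(p, n) for p, n in per_feature]
    tstat = [abs_welch_t(p, n) for p, n in per_feature]
    auc = [auc_separation(p, n) for p, n in per_feature]
    common = set(top_k(fisher, k)) & set(top_k(tstat, k)) & set(top_k(auc, k))
    return sorted(common, key=lambda j: fisher[j], reverse=True)

def incremental_feature_selection(
    ranked: List[int], evaluate: Callable[[List[int]], float]
) -> Tuple[List[int], float]:
    """IFS: score the top-1, top-2, ... prefixes of the ranking, keep the best."""
    best_subset: List[int] = []
    best_score = float("-inf")
    for m in range(1, len(ranked) + 1):
        subset = ranked[:m]
        score = evaluate(subset)
        if score > best_score:  # strict '>' keeps the smaller subset on ties
            best_subset, best_score = subset, score
    return best_subset, best_score
```

In practice `evaluate` would return something like the fivefold cross-validated accuracy of an SVM trained on the given columns, which is how the abstract reports training-set performance. Intersecting three top-`k` lists is only one plausible way to combine filters; averaging the three ranks is a common alternative.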


Supplementary material
Supplementary 1: The accuracy (Ac), sensitivity (Sn), specificity (Sp) of each run of IFS for each of the three training sets using MSFS (NoGA-Mode) (PDF 221 kb).
500_2016_2439_MOESM1_ESM.pdf
Supplementary 2: The accuracy (Ac), sensitivity (Sn), specificity (Sp) of each run of IFS for each of the three training sets using MSFS (GA-Mode) (PDF 206 kb).
500_2016_2439_MOESM2_ESM.pdf
Supplementary 3: The accuracy (Ac), sensitivity (Sn), specificity (Sp) of each run of IFS for each of the three independent sets using MSFS (NoGA-Mode) (PDF 260 kb).
500_2016_2439_MOESM3_ESM.pdf
Supplementary 4: The accuracy (Ac), sensitivity (Sn), specificity (Sp) of each run of IFS for each of the three independent sets using MSFS (GA-Mode) (PDF 261 kb).
500_2016_2439_MOESM4_ESM.pdf
Supplementary 5: GO terms and KEGG pathways of genes corresponding to the selected CpG sites for the three cancer datasets studied (PDF 241 kb).
500_2016_2439_MOESM5_ESM.pdf
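The supplementary tables report accuracy (Ac), sensitivity (Sn) and specificity (Sp). Assuming the standard confusion-matrix definitions with cancer samples as the positive class, Ac = (TP + TN) / (TP + TN + FP + FN), Sn = TP / (TP + FN) and Sp = TN / (TN + FP). A minimal sketch; the function names are illustrative:

```python
from typing import Sequence, Tuple

def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1):
    """Return (TP, TN, FP, FN), treating `positive` (assumed: cancer) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn

def ac_sn_sp(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Tuple[float, float, float]:
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    ac = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn) if tp + fn else float("nan")  # undefined without positives
    sp = tn / (tn + fp) if tn + fp else float("nan")  # undefined without negatives
    return ac, sn, sp
```

For example, four cancer and four normal samples with one missed cancer give Ac = 7/8, Sn = 3/4 and Sp = 1.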