2020 | OriginalPaper | Chapter

Stability of Feature Selection Methods: A Study of Metrics Across Different Gene Expression Datasets

Authors: Zahra Mungloo-Dilmohamud, Yasmina Jaufeerally-Fakim, Carlos Peña-Reyes

Published in: Bioinformatics and Biomedical Engineering

Publisher: Springer International Publishing

Abstract

Analysis of gene-expression data often requires selecting a subset of genes (features), and many feature selection (FS) methods have been devised for this purpose. However, FS methods often generate different lists of features for the same dataset, and users then have to choose which list to use. One approach to support this choice is to apply stability metrics to the generated lists and to select lists on that basis. The aim of this study is to investigate the behavior of stability metrics applied to feature subsets generated by FS methods. The experiments in this work compare several stability metrics across a range of gene expression datasets, FS methods, and expected numbers of features. The stability metrics have been used to compare five feature selection methods (SVM, SAM, ReliefF, RFE + RF and LIMMA) on gene expression datasets from the EBI repository. Results show that the studied stability metrics vary considerably. The reason for this variability is not yet clear and is being investigated further. The final objective of the research, which is to define how to select an FS method, is ongoing work whose partial findings are reported herein.
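To make the idea of a stability metric concrete, the following is a minimal Python sketch that scores how similar several selected gene lists are to one another. The abstract does not name the metrics that were studied, so the two measures used here (the Jaccard index and Kuncheva's consistency index, averaged over all pairs of lists) are assumptions chosen for illustration, as are the function names, the gene identifiers, and the total feature count.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def kuncheva(a, b, n_features):
    """Kuncheva's consistency index for two subsets of equal size k.

    Corrects the raw overlap r for the overlap expected by chance
    (k*k / n_features) when k features are drawn from n_features.
    """
    a, b = set(a), set(b)
    k = len(a)
    if len(b) != k:
        raise ValueError("both subsets must have the same size")
    if k == 0 or k == n_features:
        raise ValueError("index is undefined for k = 0 or k = n_features")
    r = len(a & b)
    expected = k * k / n_features
    return (r - expected) / (k - expected)


def mean_pairwise(subsets, metric):
    """Average a two-subset metric over every pair of subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(metric(a, b) for a, b in pairs) / len(pairs)


# Hypothetical gene lists, e.g. from one FS method run on three resamples.
example_subsets = [
    ["g1", "g2", "g3"],
    ["g1", "g2", "g4"],
    ["g1", "g5", "g6"],
]
n_features = 10  # hypothetical total number of genes in the dataset

jaccard_stability = mean_pairwise(example_subsets, jaccard)
kuncheva_stability = mean_pairwise(
    example_subsets, lambda a, b: kuncheva(a, b, n_features)
)
print(f"mean Jaccard:  {jaccard_stability:.3f}")
print(f"mean Kuncheva: {kuncheva_stability:.3f}")
```

The two scores generally differ for the same lists because Kuncheva's index subtracts the overlap expected by chance while the Jaccard index does not. This is one simple way in which different metrics can rank the same FS outputs differently.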

Metadata
Title
Stability of Feature Selection Methods: A Study of Metrics Across Different Gene Expression Datasets
Authors
Zahra Mungloo-Dilmohamud
Yasmina Jaufeerally-Fakim
Carlos Peña-Reyes
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-45385-5_59
