
14.08.2023 | Theoretical Advances

CInf-FS\(_S\): an efficient infinite feature selection method using K-means clustering to partition large feature spaces

Authors: Seyyedeh Faezeh Hassani Ziabari, Sadegh Eskandari, Maziar Salahi

Published in: Pattern Analysis and Applications | Issue 4/2023


Abstract

In this paper, we present a new feature selection algorithm for supervised problems. We build our algorithm upon the recently proposed infinite feature selection (Inf-FS) method, in which features are ranked based on path integrals and the centrality concept on a feature adjacency graph. The proposed algorithm first clusters the feature space into a predefined number of subspaces, then ranks the features in each subspace using the Inf-FS method, and finally merges the resulting subrankings using a combined measure of information theory and cluster size. We extensively evaluate our algorithm on six benchmark datasets and show that our method outperforms Inf-FS in terms of accuracy, running time, and memory consumption. The code is available at https://github.com/Sadegh28/CInf-FS.
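For readers who want a concrete picture of the three-stage pipeline described above (K-means partitioning of the feature space, per-subspace Inf-FS-style ranking, and a merge step weighted by an information measure and cluster size), the following Python snippet sketches one possible realization. It is a minimal sketch, not the authors' implementation (see the linked repository for the reference code): the scoring function, the mixing coefficients `alpha` and `gamma`, and the mutual-information-times-cluster-size merge rule are illustrative assumptions.

```python
# Sketch of a clustered Inf-FS pipeline as described in the abstract.
# NOT the authors' implementation; scoring and merge details are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_selection import mutual_info_classif


def inf_fs_scores(X, alpha=0.5, gamma=0.9):
    """Inf-FS-style scores: build a feature adjacency matrix from pairwise
    correlation and per-feature dispersion, sum the geometric series of its
    powers (paths of all lengths), and aggregate per feature."""
    std = X.std(axis=0)
    corr = np.nan_to_num(np.abs(np.corrcoef(X, rowvar=False)))
    sigma = np.maximum.outer(std, std) / (std.max() + 1e-12)
    A = alpha * sigma + (1 - alpha) * (1 - corr)            # feature adjacency
    A = gamma * A / (np.abs(np.linalg.eigvals(A)).max() + 1e-12)  # convergence
    S = np.linalg.inv(np.eye(A.shape[0]) - A) - np.eye(A.shape[0])  # sum_{l>=1} A^l
    return S.sum(axis=1)                                     # energy per feature


def cinf_fs(X, y, n_clusters=10):
    """Cluster the features with K-means, rank each cluster with Inf-FS-style
    scores, and merge the per-cluster rankings with a weight that combines
    mutual information with the labels and the relative cluster size
    (an assumed merge rule)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X.T)
    mi = mutual_info_classif(X, y, random_state=0)
    merged = []
    for c in range(n_clusters):
        idx = np.flatnonzero(km.labels_ == c)
        scores = inf_fs_scores(X[:, idx])
        weight = mi[idx].mean() * (len(idx) / X.shape[1])    # info + cluster size
        merged.extend(zip(idx, weight * scores))
    merged.sort(key=lambda t: t[1], reverse=True)
    return [i for i, _ in merged]                            # global feature ranking
```

With a data matrix `X` (samples by features) and labels `y`, `cinf_fs(X, y, n_clusters=10)` would return feature indices ordered from most to least relevant; the `n_clusters` argument plays the role of the paper's "predefined number of subspaces".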


Metadata
Title
CInf-FS: an efficient infinite feature selection method using K-means clustering to partition large feature spaces
Authors
Seyyedeh Faezeh Hassani Ziabari
Sadegh Eskandari
Maziar Salahi
Publication date
14.08.2023
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 4/2023
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-023-01189-1
