
14-08-2023 | Theoretical Advances

CInf-FS: an efficient infinite feature selection method using K-means clustering to partition large feature spaces

Authors: Seyyedeh Faezeh Hassani Ziabari, Sadegh Eskandari, Maziar Salahi

Published in: Pattern Analysis and Applications | Issue 4/2023

Abstract

In this paper, we present a new feature selection algorithm for supervised problems. We build our algorithm upon the recently proposed infinite feature selection (Inf-FS) method, in which features are ranked by path integrals and the centrality concept on a feature adjacency graph. The proposed algorithm first clusters the feature space into a predefined number of subspaces, then ranks the features in each subspace using the Inf-FS method, and finally merges the resulting subranks using a combined measure of information theory and cluster size. We extensively evaluate our algorithm on six benchmark datasets and show that it outperforms Inf-FS in terms of accuracy, running time, and memory consumption. The code is available at https://github.com/Sadegh28/CInf-FS.
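The abstract outlines a three-step pipeline: partition the feature space with K-means, rank each partition with an Inf-FS-style score, and merge the per-cluster rankings. The Python sketch below illustrates one way such a pipeline could look. It is not the authors' implementation (see the linked repository for that); the adjacency construction, the convergence scaling, and the merging weight (mean mutual information with the label times cluster size) are assumptions made purely for illustration.

```python
# Illustrative sketch only, not the published CInf-FS code.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_selection import mutual_info_classif

def inf_fs_scores(X_sub, alpha=0.5, gamma=0.9):
    """Inf-FS-style centrality scores for the features (columns) of X_sub."""
    # Adjacency mixes pairwise correlation and per-feature spread,
    # roughly following Roffo et al. (2015); details are simplified here.
    std = X_sub.std(axis=0)
    corr = np.nan_to_num(np.abs(np.corrcoef(X_sub, rowvar=False)))
    A = alpha * np.maximum.outer(std, std) + (1 - alpha) * corr
    # Geometric series sum_{l>=1} (gamma*A)^l = (I - gamma*A)^{-1} - I;
    # rescale A so its spectral radius is gamma < 1 and the series converges.
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    A = A * (gamma / max(rho, 1e-12))
    S = np.linalg.inv(np.eye(A.shape[0]) - A) - np.eye(A.shape[0])
    return S.sum(axis=1)          # row sums act as per-feature centrality

def cinf_fs(X, y, n_clusters=10, n_selected=100, random_state=0):
    # 1) Partition the feature space: cluster the *columns* of X with K-means.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = km.fit_predict(X.T)
    ranked = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        if idx.size == 0:
            continue
        # 2) Rank the features inside this subspace.
        if idx.size == 1:
            order = idx
        else:
            order = idx[np.argsort(-inf_fs_scores(X[:, idx]))]
        # 3) Weight the whole subrank by label relevance and cluster size
        #    (one plausible reading of "information theory and cluster size").
        weight = mutual_info_classif(X[:, idx], y).mean() * idx.size
        ranked.append((weight, order))
    # Merge: emit features from clusters in decreasing weight order.
    ranked.sort(key=lambda t: -t[0])
    merged = np.concatenate([order for _, order in ranked])
    return merged[:n_selected]
```

The geometric series (I - gamma*A)^{-1} - I is the "path integral" at the heart of Inf-FS: its (i, j) entry accumulates the weights of all paths between features i and j, so row sums serve as a centrality score. Clustering first keeps each matrix inversion small, which is presumably where the reported running-time and memory savings come from.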


Metadata
Title
CInf-FS: an efficient infinite feature selection method using K-means clustering to partition large feature spaces
Authors
Seyyedeh Faezeh Hassani Ziabari
Sadegh Eskandari
Maziar Salahi
Publication date
14-08-2023
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 4/2023
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-023-01189-1
