
05-10-2022 | Original Article

TSFNFS: two-stage-fuzzy-neighborhood feature selection with binary whale optimization algorithm

Authors: Lin Sun, Xinya Wang, Weiping Ding, Jiucheng Xu, Huili Meng

Published in: International Journal of Machine Learning and Cybernetics | Issue 2/2023


Abstract

The optimal global feature subset cannot be found easily due to the high computational cost, and most swarm-intelligence-based feature selection methods are inefficient on high-dimensional data. In this study, a two-stage feature selection model based on fuzzy neighborhood rough sets (FNRS) and the binary whale optimization algorithm (BWOA) is developed. First, to describe the fuzziness of samples in mixed data with symbolic and numerical features, a fuzzy neighborhood similarity is presented to construct the similarity matrix and fuzzy membership degree, and the lower and upper approximations are developed to form a new FNRS model. Fuzzy neighborhood-based uncertainty measures, such as the dependence degree, knowledge granularity, and entropy measures, are studied. From the algebraic and information-theoretic viewpoints, a fuzzy knowledge granularity conditional entropy is presented to obtain a preselected feature reduction set in the first stage. Second, a cosine curve is incorporated to design a new control factor, which slows the convergence of BWOA in early iterations to fully explore the global search space and accelerates convergence in late iterations. Integrating the dependence degree with the fuzzy knowledge granularity conditional entropy, a new fitness function is designed for selecting an optimal feature subset in the second stage. Two strategies are fused to prevent BWOA from falling into local optima: a population partition strategy with an adaptive neighborhood search radius to divide the whale population, and a local interference strategy for the elite subgroup to adjust the whale position update. Finally, a two-stage feature selection algorithm is designed in which the Fisher score algorithm is employed to preliminarily remove redundant features from high-dimensional datasets. Experiments on six UCI datasets and five gene expression datasets show that the proposed algorithm is effective compared with related algorithms.
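
As a concrete illustration of the cosine-based control factor described in the abstract, the sketch below shows how such a factor could modulate a generic binary WOA position update, with a sigmoid transfer function mapping continuous whale positions to 0/1 feature masks. This is a minimal sketch under stated assumptions: the exact control-factor formula, the transfer function, and all function names (`cosine_control_factor`, `binarize`, `bwoa_step`) are illustrative, not the authors' published implementation.

```python
import numpy as np

def cosine_control_factor(t, T, a_max=2.0):
    # Assumed cosine-shaped decay: stays near a_max early in the run
    # (slow convergence, wide exploration) and drops off quickly late
    # (fast convergence, exploitation), as the abstract describes.
    return a_max * 0.5 * (1.0 + np.cos(np.pi * t / T))

def binarize(position, rng):
    # Sigmoid transfer mapping a continuous whale position to a 0/1 feature mask.
    prob = 1.0 / (1.0 + np.exp(-position))
    return (rng.random(position.shape) < prob).astype(int)

def bwoa_step(positions, best, t, T, rng):
    # One generic whale position update (Mirjalili & Lewis, 2016) driven by the
    # cosine control factor; `positions` has shape (n_whales, n_features).
    a = cosine_control_factor(t, T)
    n, d = positions.shape
    updated = np.empty_like(positions)
    for i in range(n):
        A = 2.0 * a * rng.random() - a          # exploration/exploitation coefficient
        C = 2.0 * rng.random()
        if rng.random() < 0.5:
            if abs(A) < 1.0:                    # encircle the current best whale
                D = np.abs(C * best - positions[i])
                updated[i] = best - A * D
            else:                               # search around a random whale
                rand_whale = positions[rng.integers(n)]
                D = np.abs(C * rand_whale - positions[i])
                updated[i] = rand_whale - A * D
        else:                                   # spiral bubble-net movement toward the best whale
            D = np.abs(best - positions[i])
            l = rng.uniform(-1.0, 1.0)
            updated[i] = D * np.exp(l) * np.cos(2.0 * np.pi * l) + best
    return updated

# Minimal usage: 20 whales over 50 candidate features.
rng = np.random.default_rng(0)
positions = rng.normal(size=(20, 50))
best = positions[0].copy()                      # placeholder; normally the fittest whale so far
for t in range(100):
    positions = bwoa_step(positions, best, t, T=100, rng=rng)
    masks = np.array([binarize(p, rng) for p in positions])
    # ... score each mask with the fitness function (in the paper, dependence degree
    # combined with fuzzy knowledge granularity conditional entropy) and update `best` ...
```

In the full model, the fitness of each binary mask would be computed from the FNRS-based measures, and the population partition and elite local interference strategies would further adjust this basic position update.
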

Metadata
Title
TSFNFS: two-stage-fuzzy-neighborhood feature selection with binary whale optimization algorithm
Authors
Lin Sun
Xinya Wang
Weiping Ding
Jiucheng Xu
Huili Meng
Publication date
05-10-2022
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 2/2023
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-022-01653-0
