Published in: International Journal of Machine Learning and Cybernetics 5/2022

01-11-2021 | Original Article

A new two-stage hybrid feature selection algorithm and its application in Chinese medicine

Authors: Zhiqin Li, Jianqiang Du, Bin Nie, Wangping Xiong, Guoliang Xu, Jigen Luo


Abstract

High-dimensional, small-sample data are prone to the curse of dimensionality and to overfitting, and typically contain many irrelevant and redundant features. To address these feature selection problems, a new Two-stage Hybrid Feature Selection Algorithm (Ts-HFSA) is proposed. The first stage combines a Filter method with a Wrapper method to adaptively remove irrelevant features. The second stage applies a De-redundancy Algorithm that fuses the Approximate Markov Blanket with an L1 regularization term (DA2MBL1), addressing both the information loss that the approximate Markov blanket (AMB) incurs when deleting redundant features and the residual redundancy in the feature subset that AMB produces. Experimental results on several UCI public datasets and on datasets from the material foundation of Chinese medicine showed that Ts-HFSA removed irrelevant and redundant features more effectively, found smaller and higher-quality feature subsets, and improved stability, outperforming AMB, FCBF, RF, GBDT, XGBoost, Lasso, and CI_AMB. Moreover, on material-foundation data of Chinese medicine, which have higher feature dimensions and smaller sample sizes, Ts-HFSA performed even better, improving model precision while greatly reducing the dimensionality. These results indicate that Ts-HFSA is an effective feature selection method for high-dimensional small-sample data and a promising research tool for the material foundation of Chinese medicine.
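Since the abstract describes the two stages only at a high level, a compact, hedged illustration of their general shape may help. In the Python sketch below, mutual information stands in for the paper's relevance measure, greedy forward selection with logistic regression stands in for the unspecified Wrapper, and an FCBF-style approximate-Markov-blanket rule followed by LassoCV stands in for DA2MBL1. The toy data, thresholds, and every concrete function choice are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a two-stage hybrid feature selection pipeline in the
# spirit of Ts-HFSA as described in the abstract. All concrete choices
# (mutual information as relevance, logistic regression as wrapper model,
# LassoCV for the L1 term) are assumptions for illustration only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression
from sklearn.linear_model import LogisticRegression, LassoCV
from sklearn.model_selection import cross_val_score

# High-dimensional, small-sample toy data: 200 features, 60 samples.
X, y = make_classification(n_samples=60, n_features=200, n_informative=10,
                           random_state=0)

# ---- Stage 1: Filter + Wrapper removes irrelevant features ----
relevance = mutual_info_classif(X, y, random_state=0)  # filter: feature-class relevance
candidates = np.argsort(relevance)[::-1][:30]          # keep the 30 most relevant

kept, best = [], 0.0
for idx in candidates:                                 # wrapper: greedy forward selection;
    trial = kept + [int(idx)]                          # a feature stays only if it improves
    score = cross_val_score(LogisticRegression(max_iter=1000),
                            X[:, trial], y, cv=5).mean()
    if score > best:
        kept, best = trial, score

# ---- Stage 2: AMB-style rule followed by an L1 term ----
def dep(a, b):
    """Mutual information between two 1-D arrays (feature-feature redundancy)."""
    return mutual_info_regression(a.reshape(-1, 1), b, random_state=0)[0]

# FCBF-style approximate Markov blanket rule: drop f_i if an already accepted,
# more relevant f_j carries at least as much information about f_i as y does.
kept.sort(key=lambda i: relevance[i], reverse=True)
non_redundant = []
for i in kept:
    if not any(dep(X[:, j], X[:, i]) >= relevance[i] for j in non_redundant):
        non_redundant.append(i)

# L1 regularization as a second vote: keep only features whose Lasso
# coefficients survive shrinkage, removing residual redundancy.
lasso = LassoCV(cv=5, random_state=0).fit(X[:, non_redundant], y.astype(float))
final = [f for f, c in zip(non_redundant, lasso.coef_) if abs(c) > 1e-8]
print(f"stage 1 kept {len(kept)}, stage 2 kept {len(final)} of {X.shape[1]} features")
```

How DA2MBL1 actually fuses the approximate Markov blanket with the L1 term is not detailed in the abstract; the sketch simply applies the two criteria in sequence to convey the intent of removing both the redundancy AMB catches and the residual redundancy it leaves behind.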

Metadata
Title
A new two-stage hybrid feature selection algorithm and its application in Chinese medicine
Authors
Zhiqin Li
Jianqiang Du
Bin Nie
Wangping Xiong
Guoliang Xu
Jigen Luo
Publication date
01-11-2021
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 5/2022
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-021-01445-y
