A hybrid feature selection scheme for mixed attributes data

Computational and Applied Mathematics

Abstract

Feature selection aims to reduce the number of features in many applications. Existing feature selection approaches mainly deal with classification problems with continuous or discrete attributes. However, data usually come with mixed attributes in real-world applications. In this paper, a hybrid feature selection (HFS) scheme is proposed to deal with mixed attributes data. First, a new correlation measure between mixed attributes is defined by giving a model for calculating mutual information between continuous and discrete attributes. Second, the features are evaluated by a filter model that uses the new correlation measure. Finally, feature selection is performed by optimizing the parameter of the filter model under an estimation accuracy criterion. Experimental results show that HFS achieves better stability and estimation accuracy.
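As a rough illustration of the first step, the sketch below estimates mutual information between one continuous and one discrete attribute. The abstract does not give the paper's actual model, so this example substitutes a common stand-in: equal-width binning of the continuous attribute followed by a plug-in estimate. The function names and the `n_bins` parameter are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no information; use a single bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts.
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def mixed_mutual_information(continuous, discrete, n_bins=4):
    """Assumed stand-in for a mixed-attribute measure: bin, then estimate."""
    return mutual_information(equal_width_bins(continuous, n_bins), discrete)


if __name__ == "__main__":
    # Toy data: the continuous attribute separates the two labels cleanly.
    continuous = [0.1, 0.2, 0.3, 0.4, 5.1, 5.2, 5.3, 5.4]
    discrete = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(mixed_mutual_information(continuous, discrete, n_bins=2))
```

In a filter model, a score of this kind could be computed for each candidate feature against the target and used for ranking. Binning is only one of several possible estimators, and the result depends on the choice of `n_bins`.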

Author information

Corresponding author

Correspondence to Haitao Liu.

Additional information

Communicated by José Mario Martinez.

About this article

Cite this article

Liu, H., Wei, R. & Jiang, G. A hybrid feature selection scheme for mixed attributes data. Comp. Appl. Math. 32, 145–161 (2013). https://doi.org/10.1007/s40314-013-0019-5
