Abstract
Feature selection aims to reduce the number of features used in many applications. Existing feature selection approaches mainly deal with classification problems whose attributes are either all continuous or all discrete. In real-world applications, however, data usually come with mixed attributes. In this paper, a hybrid feature selection (HFS) scheme is proposed to handle data with mixed attributes. First, a new correlation measure between mixed attributes is defined by giving a model for calculating the mutual information between continuous and discrete attributes. Second, the features are evaluated by a filter model that uses the new correlation measure. Finally, feature selection is performed by optimizing the parameter of the filter model under an estimation accuracy criterion. Experimental results show that HFS achieves better stability and estimation accuracy.
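The abstract does not spell out the paper's mutual-information model or its filter, so the following is only a minimal sketch of a comparable three-step pipeline under stated assumptions, not the HFS method itself. It swaps in two standard techniques. Continuous attributes are discretized into equal-width bins, so that one plug-in mutual-information estimate covers continuous-discrete, continuous-continuous and discrete-discrete pairs alike. Features are then ranked with Battiti's MIFS greedy criterion, I(f;y) - β Σ_s I(f;s), whose redundancy weight β plays the role of the filter parameter. All function names (`discretize`, `to_symbols`, `mutual_information`, `mifs_select`, `tune_beta`) are hypothetical, and discrete attributes are assumed to be string-coded. The `accuracy` callable passed to `tune_beta` stands in for whatever estimation accuracy criterion is used (for example, cross-validated prediction accuracy).

```python
import math
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence, Tuple, Union

Value = Union[float, int, str]


def discretize(values: Sequence[float], bins: int = 5) -> List[int]:
    """Map a continuous attribute to equal-width bin indices (assumed stand-in)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    # min(...) keeps the maximum value in the last bin instead of bin `bins`
    return [min(int((v - lo) / width), bins - 1) for v in values]


def to_symbols(column: Sequence[Value], bins: int = 5) -> list:
    """Discretize numeric columns; pass discrete (string-coded) columns through."""
    if all(isinstance(v, (int, float)) for v in column):
        return discretize([float(v) for v in column], bins)
    return list(column)


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X;Y) in nats from the empirical joint distribution."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mifs_select(columns: Dict[str, Sequence[Value]], target: Sequence[Value],
                k: int, beta: float, bins: int = 5) -> List[str]:
    """Greedy MIFS: pick f maximizing I(f;y) - beta * sum_{s selected} I(f;s)."""
    sym = {name: to_symbols(col, bins) for name, col in columns.items()}
    y = to_symbols(target, bins)
    relevance = {name: mutual_information(s, y) for name, s in sym.items()}
    selected: List[str] = []
    remaining = set(sym)
    while remaining and len(selected) < k:
        def score(name: str) -> float:
            redundancy = sum(mutual_information(sym[name], sym[s]) for s in selected)
            return relevance[name] - beta * redundancy
        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def tune_beta(columns: Dict[str, Sequence[Value]], target: Sequence[Value], k: int,
              betas: Sequence[float], accuracy: Callable[[List[str]], float],
              bins: int = 5) -> Tuple[float, List[str]]:
    """Grid-search beta; `accuracy` (hypothetical) scores each selected subset."""
    best = max(betas, key=lambda b: accuracy(mifs_select(columns, target, k, b, bins)))
    return best, mifs_select(columns, target, k, best, bins)


if __name__ == "__main__":
    columns = {
        "size": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],       # continuous
        "size_copy": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],  # redundant duplicate
        "team": ["a", "b", "a", "b", "a", "b", "a", "b"],        # discrete
    }
    target = ["lo", "lo", "lo", "lo", "hi", "hi", "hi", "lo"]
    print(mifs_select(columns, target, k=2, beta=0.0, bins=2))  # ['size', 'size_copy']
    print(mifs_select(columns, target, k=2, beta=1.0, bins=2))  # ['size', 'team']
```

With β = 0 only relevance counts, so the duplicate column is chosen second. With β = 1 its redundancy with `size` outweighs its relevance, and the weakly relevant but non-redundant `team` attribute is chosen instead. The bin count changes the estimates noticeably, so this sketch should not be read as a substitute for a dedicated continuous-discrete estimator.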
Additional information
Communicated by José Mario Martinez.
About this article
Cite this article
Liu, H., Wei, R. & Jiang, G. A hybrid feature selection scheme for mixed attributes data. Comp. Appl. Math. 32, 145–161 (2013). https://doi.org/10.1007/s40314-013-0019-5
DOI: https://doi.org/10.1007/s40314-013-0019-5