Skip to main content
Top
Published in: Neural Computing and Applications 10/2019

02-04-2018 | Original Article

Feature selection based on correlation deflation

Authors: Si-Bao Chen, Chris H. Q. Ding, Zhi-Li Zhou, Bin Luo

Published in: Neural Computing and Applications | Issue 10/2019

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Feature selection is very important in many machine learning and data mining applications. In this paper, a simple and effective correlation-deflation-based feature selection method is proposed. The objective function of residual minimization constrained by \(L_{2,0}\)-norm is proved to be equivalent to maximizing sum of square of correlations between class labels and features. Then the whole procedure of correlation-deflation-based feature selection turns into selecting features out one-by-one by deflating correlations. Experiments on several public benchmark data sets show that the proposed method has better residual reduction and classification performance than many state-of-the-art feature selection methods.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Alon U, Barkai N, Notterman D et al (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci 96(12):6745–6750CrossRef Alon U, Barkai N, Notterman D et al (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci 96(12):6745–6750CrossRef
2.
go back to reference Backer E, Schipper JAD (1977) On the max–min approach for feature ordering and selection. In: The seminar on pattern recognition, Liege Univ, Liege, Belgium Backer E, Schipper JAD (1977) On the max–min approach for feature ordering and selection. In: The seminar on pattern recognition, Liege Univ, Liege, Belgium
3.
go back to reference Battiti R (1994) Using mutual information for selecting features in supervised neural net learning. IEEE Trans Neural Netw 5:537–550CrossRef Battiti R (1994) Using mutual information for selecting features in supervised neural net learning. IEEE Trans Neural Netw 5:537–550CrossRef
4.
go back to reference Bhattacharjee A, Richards W, Staunton J et al (2001) Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci 98(24):13790–13795CrossRef Bhattacharjee A, Richards W, Staunton J et al (2001) Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci 98(24):13790–13795CrossRef
5.
go back to reference Devijver PA, Kittler J (1982) Pattern recognition: a statistical approach. Prentice-Hall, LondonMATH Devijver PA, Kittler J (1982) Pattern recognition: a statistical approach. Prentice-Hall, LondonMATH
6.
go back to reference Ding CHQ, Peng H (2005) Minimum redundancy feature selection from microarray gene expression data. J Bioinform Computat Biol 3(2):185–206CrossRef Ding CHQ, Peng H (2005) Minimum redundancy feature selection from microarray gene expression data. J Bioinform Computat Biol 3(2):185–206CrossRef
7.
go back to reference Ding CHQ, Zhou D, He X, Zha H (2006) R1-PCA: rotational invariant L1-norm principal component analysis for robust subspace factorization. In: ICML, Pittsburgh, PA, USA, pp 281–288 Ding CHQ, Zhou D, He X, Zha H (2006) R1-PCA: rotational invariant L1-norm principal component analysis for robust subspace factorization. In: ICML, Pittsburgh, PA, USA, pp 281–288
8.
go back to reference Dudoit S, Fridlyand J, Speed T (2002) Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc 97(457):77–87MathSciNetCrossRef Dudoit S, Fridlyand J, Speed T (2002) Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc 97(457):77–87MathSciNetCrossRef
9.
go back to reference Fang X, Xu Y, Li X, Fan Z, Liu H, Chen Y (2014) Locality and similarity preserving embedding for feature selection. Neurocomputing 128:304–315CrossRef Fang X, Xu Y, Li X, Fan Z, Liu H, Chen Y (2014) Locality and similarity preserving embedding for feature selection. Neurocomputing 128:304–315CrossRef
10.
go back to reference Golub T, Slonim D, Tamayo P et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439):531–537CrossRef Golub T, Slonim D, Tamayo P et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439):531–537CrossRef
12.
go back to reference Gu B, Sheng VS, Tay KY, Romano W, Li S (2015a) Incremental support vector learning for ordinal regression. IEEE Trans Neural Netw Learn Syst 26(7):1403–1416MathSciNetCrossRef Gu B, Sheng VS, Tay KY, Romano W, Li S (2015a) Incremental support vector learning for ordinal regression. IEEE Trans Neural Netw Learn Syst 26(7):1403–1416MathSciNetCrossRef
13.
go back to reference Gu B, Sheng VS, Wang Z, Ho D, Osman S, Li S (2015b) Incremental learning for v-support vector regression. Neural Netw 67:140–150CrossRef Gu B, Sheng VS, Wang Z, Ho D, Osman S, Li S (2015b) Incremental learning for v-support vector regression. Neural Netw 67:140–150CrossRef
15.
go back to reference Guyon I (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182MATH Guyon I (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182MATH
16.
go back to reference Huang D, Chow TW (2005) Effective feature selection scheme using mutual information. Neurocomputing 63:325–343CrossRef Huang D, Chow TW (2005) Effective feature selection scheme using mutual information. Neurocomputing 63:325–343CrossRef
17.
go back to reference Jain A, Zongker D (1997) Feature selection: evaluation, application and small sample performance. IEEE Trans Pattern Anal Machine Intell 19(2):153–158CrossRef Jain A, Zongker D (1997) Feature selection: evaluation, application and small sample performance. IEEE Trans Pattern Anal Machine Intell 19(2):153–158CrossRef
18.
go back to reference Khan J, Wei JS, Ringner M et al (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 7(6):673–679CrossRef Khan J, Wei JS, Ringner M et al (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 7(6):673–679CrossRef
19.
go back to reference Kira K, Rendell LA (1992) A practical approach to feature selection. In: Proceedings of the 9th international workshop on machine learning, ML92, pp 249–256 Kira K, Rendell LA (1992) A practical approach to feature selection. In: Proceedings of the 9th international workshop on machine learning, ML92, pp 249–256
20.
go back to reference Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324CrossRef Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324CrossRef
21.
go back to reference Kononenko I (1994) Estimating attributes: analysis and extensions of RELIEF. In: European conference on machine learning, pp 171–182 Kononenko I (1994) Estimating attributes: analysis and extensions of RELIEF. In: European conference on machine learning, pp 171–182
22.
go back to reference Langley P (1994) Selection of relevant features in machine learning. In: Proceedings of the AAAI Fall symposium on relevance, pp 140–144 Langley P (1994) Selection of relevant features in machine learning. In: Proceedings of the AAAI Fall symposium on relevance, pp 140–144
23.
go back to reference Li Q, Xie B, You J, Bian W, Tao D (2016) Correlated logistic model with elastic net regularization for multilabel image classification. IEEE Trans Image Process 25(8):3801–3813MathSciNetCrossRef Li Q, Xie B, You J, Bian W, Tao D (2016) Correlated logistic model with elastic net regularization for multilabel image classification. IEEE Trans Image Process 25(8):3801–3813MathSciNetCrossRef
24.
go back to reference Liu H, Motoda H (1998) Feature selection for knowledge discovery and data mining. Kluwer, NorwellCrossRef Liu H, Motoda H (1998) Feature selection for knowledge discovery and data mining. Kluwer, NorwellCrossRef
25.
go back to reference Liu H, Liu L, Zhang H (2009) Boosting feature selection using information metric for classification. Neurocomputing 73(1–3):295–303CrossRef Liu H, Liu L, Zhang H (2009) Boosting feature selection using information metric for classification. Neurocomputing 73(1–3):295–303CrossRef
26.
go back to reference Ma S, Song X, Huang J (2007) Supervised group lasso with applications to microarray data analysis. BMC Bioinform 8:60CrossRef Ma S, Song X, Huang J (2007) Supervised group lasso with applications to microarray data analysis. BMC Bioinform 8:60CrossRef
27.
go back to reference Mao KZ (2002) Fast orthogonal forward selection algorithm for feature subset selection. IEEE Trans Neural Netw 13(5):1218–1224CrossRef Mao KZ (2002) Fast orthogonal forward selection algorithm for feature subset selection. IEEE Trans Neural Netw 13(5):1218–1224CrossRef
28.
go back to reference Mao KZ (2004) Orthogonal forward selection and backward elimination algorithms for feature subset selection. IEEE Trans Syst Man Cybern Part B 34(1):629–634CrossRef Mao KZ (2004) Orthogonal forward selection and backward elimination algorithms for feature subset selection. IEEE Trans Syst Man Cybern Part B 34(1):629–634CrossRef
29.
go back to reference Ng AY (2004) Feature selection, \(l_1\) vs. \(l_2\) regularization, and rotational invariance. In: ICML Ng AY (2004) Feature selection, \(l_1\) vs. \(l_2\) regularization, and rotational invariance. In: ICML
30.
go back to reference Nie F, Huang H, Cai X, Ding CHQ (2010) Efficient and robust feature selection via joint \(l_{2,1}\)-norms minimization. In: Advances in neural information processing systems, pp 1813–1821 Nie F, Huang H, Cai X, Ding CHQ (2010) Efficient and robust feature selection via joint \(l_{2,1}\)-norms minimization. In: Advances in neural information processing systems, pp 1813–1821
31.
go back to reference Nutt C, Mani D, Betensky R et al (2003) Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Res 63(7):1602–1607 Nutt C, Mani D, Betensky R et al (2003) Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Res 63(7):1602–1607
32.
go back to reference Pan Z, Jin P, Lei J et al (2016) Fast reference frame selection based on content similarity for low complexity HEVC encoder. J Vis Commun Image Represent 40(Part B):516–524CrossRef Pan Z, Jin P, Lei J et al (2016) Fast reference frame selection based on content similarity for low complexity HEVC encoder. J Vis Commun Image Represent 40(Part B):516–524CrossRef
33.
go back to reference Pan Z, Zhang Y, Kwong S (2015) Efficient motion and disparity estimation optimization for low complexity multiview video coding. IEEE Trans Broadcast 61(2):166–176CrossRef Pan Z, Zhang Y, Kwong S (2015) Efficient motion and disparity estimation optimization for low complexity multiview video coding. IEEE Trans Broadcast 61(2):166–176CrossRef
34.
go back to reference Pan Z, Lei J, Zhang Y, Sun X, Kwong S (2016) Fast motion estimation based on content property for low-complexity H265 HEVC encoder. IEEE Trans Broadcast 62(3):675–684CrossRef Pan Z, Lei J, Zhang Y, Sun X, Kwong S (2016) Fast motion estimation based on content property for low-complexity H265 HEVC encoder. IEEE Trans Broadcast 62(3):675–684CrossRef
35.
go back to reference Peng H, Long F, Ding C (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238CrossRef Peng H, Long F, Ding C (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238CrossRef
36.
go back to reference Pudil P, Novovicova J, Kittler J (1994) Floating search methods in feature selection. Pattern Recognit Lett 15(11):1119–1125CrossRef Pudil P, Novovicova J, Kittler J (1994) Floating search methods in feature selection. Pattern Recognit Lett 15(11):1119–1125CrossRef
37.
go back to reference Raileanu LE, Stoffel K (2004) Theoretical comparison between the Gini index and information gain criteria. Ann Math Artif Intell 41(1):77–93MathSciNetCrossRef Raileanu LE, Stoffel K (2004) Theoretical comparison between the Gini index and information gain criteria. Ann Math Artif Intell 41(1):77–93MathSciNetCrossRef
38.
go back to reference Skalak DB (1994) Prototype and feature selection by sampling and random mutation hill climbing algorithms. In: ICML, NJ, USA, pp 293–301 Skalak DB (1994) Prototype and feature selection by sampling and random mutation hill climbing algorithms. In: ICML, NJ, USA, pp 293–301
39.
go back to reference Su A, Welsh J, Sapinoso L et al (2001) Molecular classification of human carcinomas by use of gene expression signatures. Cancer Res 61(20):7388–7393 Su A, Welsh J, Sapinoso L et al (2001) Molecular classification of human carcinomas by use of gene expression signatures. Cancer Res 61(20):7388–7393
40.
go back to reference Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol) 58(1):267–288MathSciNetMATH Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol) 58(1):267–288MathSciNetMATH
41.
go back to reference Wei D, Li S, Tan M (2012) Graph embedding based feature selection. Neurocomputing 93:115–125CrossRef Wei D, Li S, Tan M (2012) Graph embedding based feature selection. Neurocomputing 93:115–125CrossRef
42.
go back to reference Wei H, Billings S (2007) Feature subset selection and ranking for data dimensionality reduction. IEEE Trans Pattern Anal Mach Intell 29(1):162–166CrossRef Wei H, Billings S (2007) Feature subset selection and ranking for data dimensionality reduction. IEEE Trans Pattern Anal Mach Intell 29(1):162–166CrossRef
43.
go back to reference Xia Z, Wang X, Sun X, Liu Q, Xiong N (2016) Steganalysis of LSB matching using differences between nonadjacent pixels. Multimed Tools Appl 75(4):1947–1962CrossRef Xia Z, Wang X, Sun X, Liu Q, Xiong N (2016) Steganalysis of LSB matching using differences between nonadjacent pixels. Multimed Tools Appl 75(4):1947–1962CrossRef
44.
go back to reference Xuan P, Guo MZ, Wang J, Liu XY, Liu Y (2011) Genetic algorithm-based efficient feature selection for classification of pre-mirnas. Genet Mol Res 10(2):588–603CrossRef Xuan P, Guo MZ, Wang J, Liu XY, Liu Y (2011) Genetic algorithm-based efficient feature selection for classification of pre-mirnas. Genet Mol Res 10(2):588–603CrossRef
46.
go back to reference Yang K, Cai Z, Li J, Lin G (2006) A stable gene selection in microarray data analysis. BMC Bioinform 7:228CrossRef Yang K, Cai Z, Li J, Lin G (2006) A stable gene selection in microarray data analysis. BMC Bioinform 7:228CrossRef
47.
go back to reference Yuan C, Sun X, R LV (2016) Fingerprint liveness detection based on multi-scale LPQ and PCA. China Commun 13(7):60–65CrossRef Yuan C, Sun X, R LV (2016) Fingerprint liveness detection based on multi-scale LPQ and PCA. China Commun 13(7):60–65CrossRef
48.
go back to reference Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B 68(1):49–67MathSciNetCrossRef Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B 68(1):49–67MathSciNetCrossRef
49.
go back to reference Zhang J, Yu J, Wan J, Zeng Z (2015) L2,1-norm regularized fisher criterion for optimal feature selection. Neurocomputing 166:455–463CrossRef Zhang J, Yu J, Wan J, Zeng Z (2015) L2,1-norm regularized fisher criterion for optimal feature selection. Neurocomputing 166:455–463CrossRef
50.
go back to reference Zhang M, Ding CHQ, Zhang Y, Nie F (2014) Feature selection at the discrete limit. In: Proceedings of the 28th AAAI, Québec, Canada, pp 1355–1361 Zhang M, Ding CHQ, Zhang Y, Nie F (2014) Feature selection at the discrete limit. In: Proceedings of the 28th AAAI, Québec, Canada, pp 1355–1361
51.
go back to reference Zhao G, Wu Y, Chen F, Zhang J, Bai J (2015) Effective feature selection using feature vector graph for classification. Neurocomputing 151:376–389CrossRef Zhao G, Wu Y, Chen F, Zhang J, Bai J (2015) Effective feature selection using feature vector graph for classification. Neurocomputing 151:376–389CrossRef
52.
Metadata
Title
Feature selection based on correlation deflation
Authors
Si-Bao Chen
Chris H. Q. Ding
Zhi-Li Zhou
Bin Luo
Publication date
02-04-2018
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 10/2019
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-018-3467-4

Other articles of this Issue 10/2019

Neural Computing and Applications 10/2019 Go to the issue

Premium Partner