
01-06-2014 | Original Article

Comparative analysis on margin based feature selection algorithms

Authors: Pan Wei, Peijun Ma, Qinghua Hu, Xiaohong Su, Chaoqi Ma

Published in: International Journal of Machine Learning and Cybernetics | Issue 3/2014


Abstract

Feature evaluation and selection is an important preprocessing step in classification and regression learning. As large quantities of irrelevant information are gathered, selecting the most informative features helps users understand the task and enhances the performance of the learned models. In recent years, margin has been widely accepted as a criterion for evaluating feature quality, and a collection of feature selection algorithms has been developed using margin based loss functions and various search strategies. However, no comparative research has been conducted to study the effectiveness of these algorithms. In this work, we compare 14 margin based feature selection algorithms from the viewpoints of reduction capability, classification performance on the reduced data, and robustness, considering four margin based loss functions and three search strategies. Moreover, we also compare these techniques with two well-known margin based feature selection algorithms, ReliefF and Simba. The derived conclusions give some guidelines for selecting features in practical applications.
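To make the central notion concrete, below is a minimal sketch (not the authors' implementation) of the hypothesis margin underlying Relief-style algorithms such as ReliefF and Simba: a sample's margin is half the difference between its distance to the nearest sample of a different class (the nearmiss) and its distance to the nearest sample of the same class (the nearhit). The function names and the per-feature scoring loop are illustrative assumptions, written for Euclidean distance and a single pass over all samples.

```python
# A minimal sketch of hypothesis-margin feature scoring, assuming Euclidean
# distance. Names (hypothesis_margin, relief_scores) are illustrative and
# not taken from the paper's implementation.
import numpy as np

def hypothesis_margin(X, y, i):
    """Hypothesis margin of sample i: half the gap between its nearest
    miss (closest other-class sample) and nearest hit (closest
    same-class sample)."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf                        # exclude the sample itself
    near_hit = d[y == y[i]].min()
    near_miss = d[y != y[i]].min()
    return 0.5 * (near_miss - near_hit)

def relief_scores(X, y):
    """Relief-style per-feature scores: features that push misses far
    away while keeping hits close receive large positive weights."""
    n, m = X.shape
    w = np.zeros(m)
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf
        hit = np.where(y == y[i], d, np.inf).argmin()    # nearest hit index
        miss = np.where(y != y[i], d, np.inf).argmin()   # nearest miss index
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n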

Metadata
Title: Comparative analysis on margin based feature selection algorithms
Authors: Pan Wei, Peijun Ma, Qinghua Hu, Xiaohong Su, Chaoqi Ma
Publication date: 01-06-2014
Publisher: Springer Berlin Heidelberg
Published in: International Journal of Machine Learning and Cybernetics / Issue 3/2014
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI: https://doi.org/10.1007/s13042-013-0164-6
