Published in: Neural Computing and Applications 10/2019

26-04-2018 | Original Article

Feature selection with MCP\(^2\) regularization

Authors: Yong Shi, Jianyu Miao, Lingfeng Niu


Abstract

Feature selection, a fundamental component of building robust models, plays an important role in many machine learning and data mining tasks. With the recent development of sparsity research, both theoretical and empirical studies have suggested that sparsity is an intrinsic property of real-world data, and sparsity regularization has been applied successfully to feature selection models. Motivated by the remarkable performance of non-convex regularization, in this paper we propose a novel non-convex yet Lipschitz continuous sparsity regularization term, named MCP\(^2\), and apply it to feature selection. To solve the resulting non-convex model, we also give a new algorithm within the framework of the ConCave–Convex Procedure (CCCP). Experimental results on benchmark datasets demonstrate the effectiveness of the proposed method.
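The abstract names the two ingredients — an MCP-style non-convex penalty and the ConCave–Convex Procedure — but does not spell out the exact MCP\(^2\) form. The sketch below is therefore only illustrative: it uses the classical scalar minimax concave penalty (MCP) on a least-squares model and a generic CCCP loop, where each outer step linearizes the concave part of the DC decomposition MCP\((t) = \lambda|t| - h(t)\) and solves the resulting lasso-type subproblem with proximal gradient (ISTA) steps. All function names and parameter values are assumptions, not the paper's method.

```python
import numpy as np

def mcp(t, lam, gamma):
    """Standard scalar minimax concave penalty (MCP), elementwise.
    NOTE: a stand-in for the paper's MCP^2 term, whose exact form
    is not given in the abstract."""
    a = np.abs(t)
    return np.where(a <= gamma * lam,
                    lam * a - a ** 2 / (2 * gamma),
                    gamma * lam ** 2 / 2)

def h_grad(w, lam, gamma):
    """Gradient of the convex part h(w) = lam*|w| - MCP(w) in the
    DC split MCP = lam*|.| - h; h' is continuous at |w| = gamma*lam."""
    return np.where(np.abs(w) <= gamma * lam,
                    w / gamma,
                    lam * np.sign(w))

def soft_threshold(v, tau):
    """Proximal operator of tau*|.|_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def cccp_mcp_lsq(X, y, lam=0.3, gamma=3.0, outer=10, inner=200):
    """CCCP for  min_w (1/2n)||Xw - y||^2 + sum_j MCP(w_j).
    Outer loop: linearize -h at the current iterate; inner loop:
    solve the convex lasso-type subproblem with ISTA."""
    n, d = X.shape
    w = np.zeros(d)
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)  # 1/Lipschitz const.
    for _ in range(outer):
        g_lin = h_grad(w, lam, gamma)  # fixed linearization point
        for _ in range(inner):
            grad = X.T @ (X @ w - y) / n - g_lin
            w = soft_threshold(w - step * grad, step * lam)
    return w
```

Because MCP is constant for large \(|t|\), CCCP iterations progressively remove the shrinkage bias on large coefficients while the \(\ell_1\)-type inner problem keeps irrelevant ones at exactly zero — the behavior that motivates non-convex penalties for feature selection.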


Metadata
Title
Feature selection with MCP\(^2\) regularization
Authors
Yong Shi
Jianyu Miao
Lingfeng Niu
Publication date
26-04-2018
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 10/2019
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-018-3500-7
