Published in: Knowledge and Information Systems, Issue 3/2017

5 May 2017 | Survey Paper

Recent advances in feature selection and its applications

Authors: Yun Li, Tao Li, Huan Liu

Abstract

Feature selection is one of the key problems in machine learning and data mining. This review gives a brief historical background of the field, followed by a selection of challenges of particular current interest, such as feature selection for high-dimensional, small-sample-size data, feature selection for large-scale data, and secure feature selection. Along with these challenges, several hot topics in feature selection have emerged, e.g., stable feature selection, multi-view feature selection, distributed feature selection, multi-label feature selection, online feature selection, and adversarial feature selection. The paper surveys recent advances on these topics. For each topic, the existing problems are analyzed, and current solutions to them are presented and discussed. In addition, some representative applications of feature selection are introduced, such as applications in bioinformatics, social media, and multimedia retrieval.

Literatur
1.
Zurück zum Zitat Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 31:1157–1182MATH Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 31:1157–1182MATH
2.
Zurück zum Zitat Liu H, Yu L (2005) Toward integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng 17:494–502CrossRef Liu H, Yu L (2005) Toward integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng 17:494–502CrossRef
3.
Zurück zum Zitat Hughes GF (1968) On the mean accuracy of statistical pattern recognizers. IEEE Trans Inf Theory 14:55–63CrossRef Hughes GF (1968) On the mean accuracy of statistical pattern recognizers. IEEE Trans Inf Theory 14:55–63CrossRef
4.
5.
Zurück zum Zitat Blum A, Langle P (1997) Selection of relevant features and examples in machine learning. Artif Intell 97:245–271 Blum A, Langle P (1997) Selection of relevant features and examples in machine learning. Artif Intell 97:245–271
6.
Zurück zum Zitat Kohavi R, John G (1997) Wrappers for feature subset selection. Artif Intell 97:273–324CrossRefMATH Kohavi R, John G (1997) Wrappers for feature subset selection. Artif Intell 97:273–324CrossRefMATH
7.
Zurück zum Zitat Inza I, Larranaga P, Blanco R, Cerrolaza AJ (2004) Filter versus wrapper gene selection approaches in DNA microarray domains. Artif Intell Med 31:91–103CrossRef Inza I, Larranaga P, Blanco R, Cerrolaza AJ (2004) Filter versus wrapper gene selection approaches in DNA microarray domains. Artif Intell Med 31:91–103CrossRef
8.
Zurück zum Zitat Forman G (2003) An extensive empirical study of feature selection metrics for text classification. J Mach Learn Res 3:1289–1305MATH Forman G (2003) An extensive empirical study of feature selection metrics for text classification. J Mach Learn Res 3:1289–1305MATH
9.
Zurück zum Zitat Blum AL, Rivest RL (1992) Training a 3-node neural networks is NP-complete. Neural Netw 5:117–127CrossRef Blum AL, Rivest RL (1992) Training a 3-node neural networks is NP-complete. Neural Netw 5:117–127CrossRef
11.
Zurück zum Zitat Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537CrossRef Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537CrossRef
12.
Zurück zum Zitat Singh D, Febbo PG, Ross K (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 2:203–209CrossRef Singh D, Febbo PG, Ross K (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 2:203–209CrossRef
13.
Zurück zum Zitat Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S (2001) Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci USA 98:13790–13795CrossRef Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S (2001) Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci USA 98:13790–13795CrossRef
14.
Zurück zum Zitat Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon cancer tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 96:6745–6750 Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon cancer tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 96:6745–6750
15.
Zurück zum Zitat Zhao Z (2010) Spectral feature selection for mining ultrahigh dimensional data, Ph.D. thesis. Arizona State University Zhao Z (2010) Spectral feature selection for mining ultrahigh dimensional data, Ph.D. thesis. Arizona State University
16.
Zurück zum Zitat Guyon I, Gunn S, Nikravesh M, Zadeh L (2006) Feature extraction, foundations and applications. Springer, Physica-Verlag, New YorkCrossRefMATH Guyon I, Gunn S, Nikravesh M, Zadeh L (2006) Feature extraction, foundations and applications. Springer, Physica-Verlag, New YorkCrossRefMATH
17.
Zurück zum Zitat Dash M, Liu H (1997) Feature selection for classification. Intell Data Anal 1:131–156CrossRef Dash M, Liu H (1997) Feature selection for classification. Intell Data Anal 1:131–156CrossRef
18.
Zurück zum Zitat Chandrashekar G, Sahin F (2014) A survey on feature selection methods. Comput Electr Eng 40:16–28CrossRef Chandrashekar G, Sahin F (2014) A survey on feature selection methods. Comput Electr Eng 40:16–28CrossRef
19.
Zurück zum Zitat Tang JL, Alelyani S, Liu H (2014) Feature selection for classification—a review. In: Aggarwal C (ed) Data classification: algorithms and applications. CRC Press, Boca Raton Tang JL, Alelyani S, Liu H (2014) Feature selection for classification—a review. In: Aggarwal C (ed) Data classification: algorithms and applications. CRC Press, Boca Raton
20.
Zurück zum Zitat Li JD, Cheng KW, Wang SH, Morstatter F, Trevino RP, Tang JL, Liu H (2016) Feature selection: a data perspective, vol 3, pp 1–73. arXiv:1601.07996 Li JD, Cheng KW, Wang SH, Morstatter F, Trevino RP, Tang JL, Liu H (2016) Feature selection: a data perspective, vol 3, pp 1–73. arXiv:​1601.​07996
21.
Zurück zum Zitat Peng HC, Long FH, Ding C (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27:1226–1238CrossRef Peng HC, Long FH, Ding C (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27:1226–1238CrossRef
22.
Zurück zum Zitat Mitra P, Murthy CA, Pal SK (2002) Unsupervised feature selection using feature similarity. IEEE Trans Pattern Anal Mach Intell 24:301–312CrossRef Mitra P, Murthy CA, Pal SK (2002) Unsupervised feature selection using feature similarity. IEEE Trans Pattern Anal Mach Intell 24:301–312CrossRef
23.
Zurück zum Zitat Hall MA (2000) Correlation-based feature selection for discrete and numeric class machine learning. In: Proceedings of international conference on machine learning, pp 359–366 Hall MA (2000) Correlation-based feature selection for discrete and numeric class machine learning. In: Proceedings of international conference on machine learning, pp 359–366
24.
Zurück zum Zitat Yu L, Liu H (2003) Feature selection for high-dimensional data: a fast correlation-based filter solution. In: Proceedings of international conference on machine learning, pp 856–863 Yu L, Liu H (2003) Feature selection for high-dimensional data: a fast correlation-based filter solution. In: Proceedings of international conference on machine learning, pp 856–863
25.
Zurück zum Zitat Yu L, Liu H (2004) Efficient feature selection via analysis of relevance and redundancy. J Mach Learn Res 5:1205–1224MathSciNetMATH Yu L, Liu H (2004) Efficient feature selection via analysis of relevance and redundancy. J Mach Learn Res 5:1205–1224MathSciNetMATH
26.
Zurück zum Zitat Saeys Y, Abeel T, de Peer YV (2008) Robust feature selection using ensemble feature selection techniques. In: Proceedings of the 25th European conference on machine learning and knowledge discovery in databases, Banff, pp 313–325 Saeys Y, Abeel T, de Peer YV (2008) Robust feature selection using ensemble feature selection techniques. In: Proceedings of the 25th European conference on machine learning and knowledge discovery in databases, Banff, pp 313–325
27.
Zurück zum Zitat Han Y, Yu L (2010) A variance reduction framework for stable feature selection. In: Proceedings of the international conference on data mining, pp 206–215 Han Y, Yu L (2010) A variance reduction framework for stable feature selection. In: Proceedings of the international conference on data mining, pp 206–215
28.
Zurück zum Zitat Loscalzo S, Yu L, Ding C (2009) Consensus group stable feature selection. In: Proceedings of ACM SIGKDD conference on knowledge discovery and data mining, pp 567–575 Loscalzo S, Yu L, Ding C (2009) Consensus group stable feature selection. In: Proceedings of ACM SIGKDD conference on knowledge discovery and data mining, pp 567–575
29.
Zurück zum Zitat Abeel T, Helleputte T, de Peer YV, Dupont P, Saeys Y (2010) Robust biomarker identification for cancer diagnosis with ensemble feature selection methods. Bioinformatics 26:392–398CrossRef Abeel T, Helleputte T, de Peer YV, Dupont P, Saeys Y (2010) Robust biomarker identification for cancer diagnosis with ensemble feature selection methods. Bioinformatics 26:392–398CrossRef
30.
Zurück zum Zitat Li Y, Gao SY, Chen SC (2012) Ensemble feature weighting based on local learning and diversity. In: AAAI Conference on artificial intelligence, pp 1019–1025 Li Y, Gao SY, Chen SC (2012) Ensemble feature weighting based on local learning and diversity. In: AAAI Conference on artificial intelligence, pp 1019–1025
31.
Zurück zum Zitat Woznica A, Nguyen P, Kalousis A (2012) Model mining for robust feature selection. In: Proceedings of ACM SIGKDD conference on knowledge discovery and data mining, pp 913–921 Woznica A, Nguyen P, Kalousis A (2012) Model mining for robust feature selection. In: Proceedings of ACM SIGKDD conference on knowledge discovery and data mining, pp 913–921
32.
Zurück zum Zitat Yu L, Han Y, Berens ME (2012) Stable gene selection from microarray data via sample weighting. IEEE/ACM Trans Comput Biol Bioinform 9:262–272CrossRef Yu L, Han Y, Berens ME (2012) Stable gene selection from microarray data via sample weighting. IEEE/ACM Trans Comput Biol Bioinform 9:262–272CrossRef
33.
Zurück zum Zitat Yu L, Ding C, Loscalzo S (2008) Stable feature selection via dense feature groups. In: Proceedings of ACM SIGKDD conference on knowledge discovery and data mining, pp 803–811 Yu L, Ding C, Loscalzo S (2008) Stable feature selection via dense feature groups. In: Proceedings of ACM SIGKDD conference on knowledge discovery and data mining, pp 803–811
34.
Zurück zum Zitat He ZY, Yu WC (2010) Stable feature selection for biomarker discovery. Comput Biol Chem 34:215–225CrossRef He ZY, Yu WC (2010) Stable feature selection for biomarker discovery. Comput Biol Chem 34:215–225CrossRef
35.
Zurück zum Zitat Li Y, Huang SS, Chen SC, Si J (2013) Stable l2-regularized ensemble feature weighting. In: Proceedings of the 11th international workshop on multiple classifier systems, pp 167–178 Li Y, Huang SS, Chen SC, Si J (2013) Stable l2-regularized ensemble feature weighting. In: Proceedings of the 11th international workshop on multiple classifier systems, pp 167–178
36.
Zurück zum Zitat Li Y, Si J, Zhou GJ, Huang SS, Chen SC (2015) Frel: a stable feature selection algorithm. IEEE Trans Neural Netw Learn Syst 26:1388–1402MathSciNetCrossRef Li Y, Si J, Zhou GJ, Huang SS, Chen SC (2015) Frel: a stable feature selection algorithm. IEEE Trans Neural Netw Learn Syst 26:1388–1402MathSciNetCrossRef
37.
Zurück zum Zitat Crammer K, Bachrach RG, Navot A, Tishby N (2002) Margin analysis of the LVQ algorithm. In: Proceedings of advances in neural information processing systems, pp 462–469 Crammer K, Bachrach RG, Navot A, Tishby N (2002) Margin analysis of the LVQ algorithm. In: Proceedings of advances in neural information processing systems, pp 462–469
38.
Zurück zum Zitat Strehl A, Ghosh J (2002) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617MathSciNetMATH Strehl A, Ghosh J (2002) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617MathSciNetMATH
39.
Zurück zum Zitat Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Stat Methodol) 58:267–288MathSciNetMATH Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Stat Methodol) 58:267–288MathSciNetMATH
40.
Zurück zum Zitat Ng AY (2004) Feature selection, l1 vs. l2 regularization, and rotational invariance. In: Proceedings of international conference on machine learning, pp 78–85 Ng AY (2004) Feature selection, l1 vs. l2 regularization, and rotational invariance. In: Proceedings of international conference on machine learning, pp 78–85
41.
Zurück zum Zitat Jenatton R, Obozinski G, Bach F (2010) Structured sparse principal component analysis. In: Proceedings of international conference on artificial intelligence and statistics Jenatton R, Obozinski G, Bach F (2010) Structured sparse principal component analysis. In: Proceedings of international conference on artificial intelligence and statistics
42.
Zurück zum Zitat Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B (Stat Methodol) 68:49–67MathSciNetCrossRefMATH Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc Ser B (Stat Methodol) 68:49–67MathSciNetCrossRefMATH
43.
Zurück zum Zitat Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B (Stat Methodol) 67:301–320MathSciNetCrossRefMATH Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc Ser B (Stat Methodol) 67:301–320MathSciNetCrossRefMATH
44.
Zurück zum Zitat Kim S, Xing EP (2010) Tree-guided group lasso for multi-task regression with structured sparsity. In: Proceedings of the 27th international conference on machine learning Kim S, Xing EP (2010) Tree-guided group lasso for multi-task regression with structured sparsity. In: Proceedings of the 27th international conference on machine learning
45.
Zurück zum Zitat Wang J, Zhou JY, Liu J, Wonka P, Ye JP (2014) A safe screening rule for sparse logistic regression. In: Proceedings of advances in neural information processing systems, pp 1053–1061 Wang J, Zhou JY, Liu J, Wonka P, Ye JP (2014) A safe screening rule for sparse logistic regression. In: Proceedings of advances in neural information processing systems, pp 1053–1061
46.
Zurück zum Zitat Wang J, Ye JP (2015) Safe screening for multi-task feature learning with multiple data matrices. In: Proceedings of the 32nd international conference on machine learning Wang J, Ye JP (2015) Safe screening for multi-task feature learning with multiple data matrices. In: Proceedings of the 32nd international conference on machine learning
47.
Zurück zum Zitat Zhao Z, Wang JX, Sharma S, Agarwal N, Liu H, Chang Y (2010) An integrative approach to identifying biologically relevant genes. In: Proceedings of SIAM International conference on data mining Zhao Z, Wang JX, Sharma S, Agarwal N, Liu H, Chang Y (2010) An integrative approach to identifying biologically relevant genes. In: Proceedings of SIAM International conference on data mining
48.
Zurück zum Zitat Weinberger K, Dasgupta A, Langford J, Smola A, Attenberg J (2009) Feature hashing for large scale multitask learning. In: Proceedings of international conference on machine learning Weinberger K, Dasgupta A, Langford J, Smola A, Attenberg J (2009) Feature hashing for large scale multitask learning. In: Proceedings of international conference on machine learning
49.
Zurück zum Zitat Chu CT, Kim SK, Lin YA, Yu YY, Bradski G, Ng A, Olukotun K (2007) Map-reduce for machine learning on multicore. In: Proceedings of advances in neural information processing systems Chu CT, Kim SK, Lin YA, Yu YY, Bradski G, Ng A, Olukotun K (2007) Map-reduce for machine learning on multicore. In: Proceedings of advances in neural information processing systems
50.
Zurück zum Zitat Snir M, Otto S, Lederman SH, Walker D, Dongarra J (1995) MPI: the complete reference, 1st edn. MIT Press, Cambridge Snir M, Otto S, Lederman SH, Walker D, Dongarra J (1995) MPI: the complete reference, 1st edn. MIT Press, Cambridge
51.
Zurück zum Zitat Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51:107–113CrossRef Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51:107–113CrossRef
52.
Zurück zum Zitat Zhao ZA, Liu H (2012) Spectral feature selection for data mining. Taylor and Francis Group, London Zhao ZA, Liu H (2012) Spectral feature selection for data mining. Taylor and Francis Group, London
53.
Zurück zum Zitat Zhao Z, Zhang RW, Cox J, Duling D, Sarle W (2013) Massively parallel feature selection: an approach based on variance preservation. Mach Learn 92:195–220MathSciNetCrossRefMATH Zhao Z, Zhang RW, Cox J, Duling D, Sarle W (2013) Massively parallel feature selection: an approach based on variance preservation. Mach Learn 92:195–220MathSciNetCrossRefMATH
54.
Zurück zum Zitat Das K, Bhaduri K (2010) H. Kargupta: A local asynchronous distributed privacy preserving feature selection algorithm for large peer-to-peer networks. Knowl. Inf Syst 24:341–367 Das K, Bhaduri K (2010) H. Kargupta: A local asynchronous distributed privacy preserving feature selection algorithm for large peer-to-peer networks. Knowl. Inf Syst 24:341–367
55.
Zurück zum Zitat Wu X, Zhu X, Wu GQ, Ding W (2014) Data mining with big data. IEEE Trans Knowl Data Eng 26:97–107CrossRef Wu X, Zhu X, Wu GQ, Ding W (2014) Data mining with big data. IEEE Trans Knowl Data Eng 26:97–107CrossRef
56.
Zurück zum Zitat Cao B, He LF, Kong XN, Yu PS, Hao ZF, Ragin AB (2014) Tensor-based multi-view feature selection with applications to brain diseases. In: Proceedings of the 2014 international conference on data mining, pp 40–49 Cao B, He LF, Kong XN, Yu PS, Hao ZF, Ragin AB (2014) Tensor-based multi-view feature selection with applications to brain diseases. In: Proceedings of the 2014 international conference on data mining, pp 40–49
57.
Zurück zum Zitat Smalter A, Huan J, Lushington G (2009) Feature selection in the tensor product feature space. In: Proceedings of the 2009 international conference on data mining, pp 1004–1009 Smalter A, Huan J, Lushington G (2009) Feature selection in the tensor product feature space. In: Proceedings of the 2009 international conference on data mining, pp 1004–1009
58.
Zurück zum Zitat Tang JL, Hu X, Gao HJ, Liu H (2013) Unsupervised feature selection for multi-view data in social media. In: Proceedings of the 2013 SIAM conference on data mining Tang JL, Hu X, Gao HJ, Liu H (2013) Unsupervised feature selection for multi-view data in social media. In: Proceedings of the 2013 SIAM conference on data mining
59.
Zurück zum Zitat Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46:389–422CrossRefMATH Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46:389–422CrossRefMATH
60.
Zurück zum Zitat Fang Z, Zhang ZM (2013) Discriminative feature selection for multi-view cross-domain learning. In: Proceedings of ACM international conference of information and knowledge management, pp 1321–1330 Fang Z, Zhang ZM (2013) Discriminative feature selection for multi-view cross-domain learning. In: Proceedings of ACM international conference of information and knowledge management, pp 1321–1330
61.
Zurück zum Zitat Chen WZ, Yan J, Zhang BY, Chen Z, Yang Q (2007) Document transformation for multi-label feature selection in text categorization. In: Proceedings of the 7th IEEE conference on data mining, pp 451–456 Chen WZ, Yan J, Zhang BY, Chen Z, Yang Q (2007) Document transformation for multi-label feature selection in text categorization. In: Proceedings of the 7th IEEE conference on data mining, pp 451–456
62.
Zurück zum Zitat Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106 Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106
63.
Zurück zum Zitat Kass GV (1980) An exploratory technique for investigating large quantities of categorical data. Appl Stat 119–127 Kass GV (1980) An exploratory technique for investigating large quantities of categorical data. Appl Stat 119–127
64.
Zurück zum Zitat Yan J, Liu N, Zhang B, Yan S, Chen Z, Cheng Q, Fan W, Ma WY (2005) OCFS: optimal orthogonal centroid feature selection for text categorization. In: Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval, pp 122–129 Yan J, Liu N, Zhang B, Yan S, Chen Z, Cheng Q, Fan W, Ma WY (2005) OCFS: optimal orthogonal centroid feature selection for text categorization. In: Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval, pp 122–129
65.
Zurück zum Zitat Lastra G, Luaces O, Quevedo JR, Bahamonde A (2011) Graphical feature selection for multilabel classification tasks. In: Proceedings of the 10th international conference on advances in intelligent data analysis, pp 281–305 Lastra G, Luaces O, Quevedo JR, Bahamonde A (2011) Graphical feature selection for multilabel classification tasks. In: Proceedings of the 10th international conference on advances in intelligent data analysis, pp 281–305
66.
Zurück zum Zitat Kong X, Yu PS (2012) gMLC: a multi-label feature selection framework for graph classification. Knowl Inf Syst 31:281–305CrossRef Kong X, Yu PS (2012) gMLC: a multi-label feature selection framework for graph classification. Knowl Inf Syst 31:281–305CrossRef
67.
Zurück zum Zitat Gu QQ, Li ZH, Han JW (2011) Correlated multi-label feature selection. In: Proceedings of the 20th ACM international conference on information and knowledge management, pp 1087–1096 Gu QQ, Li ZH, Han JW (2011) Correlated multi-label feature selection. In: Proceedings of the 20th ACM international conference on information and knowledge management, pp 1087–1096
68.
Zurück zum Zitat Elisseeff A, Weston J (2001) A kernel method for multi-labelled classification. In: Advances in neural information processing systems, pp 681–687 Elisseeff A, Weston J (2001) A kernel method for multi-labelled classification. In: Advances in neural information processing systems, pp 681–687
69.
Zurück zum Zitat Yan P, Li Y (2016) Graph-margin based multi-label feature selection. In: European conference on machine learning, pp 540–555 Yan P, Li Y (2016) Graph-margin based multi-label feature selection. In: European conference on machine learning, pp 540–555
70.
Zurück zum Zitat Perkins S, Theiler J (2003) Online feature selection using grafting. In: Proceedings of international conference on machine learning, pp 592–599 Perkins S, Theiler J (2003) Online feature selection using grafting. In: Proceedings of international conference on machine learning, pp 592–599
71.
Zurück zum Zitat Wu X, Yu K, Wang H, Ding W (2010) Online streaming feature selection. In: Proceedings of international conference on machine learning, pp 1159–1166 Wu X, Yu K, Wang H, Ding W (2010) Online streaming feature selection. In: Proceedings of international conference on machine learning, pp 1159–1166
72.
Zurück zum Zitat Zhou D, Huang J, Scholkopf B (2005) Learning from labeled and unlabeled data on a directed graph. In: Proceedings of international conference on machine learning, pp 1036–1043 Zhou D, Huang J, Scholkopf B (2005) Learning from labeled and unlabeled data on a directed graph. In: Proceedings of international conference on machine learning, pp 1036–1043
73.
Zurück zum Zitat Yu K, Wu XD, Ding W, Pei J (2014) Towards scalable and accurate online feature selection for big data. In: Proceedings of IEEE conference on data mining, pp 660–669 Yu K, Wu XD, Ding W, Pei J (2014) Towards scalable and accurate online feature selection for big data. In: Proceedings of IEEE conference on data mining, pp 660–669
74.
Zurück zum Zitat Sengupta D, Bandyopadhyay S, Sinha D (2017) A scoring scheme for online feature selection: simulating model performance without retraining. IEEE Trans Neural Netw Learn Syst 28:405–414CrossRef Sengupta D, Bandyopadhyay S, Sinha D (2017) A scoring scheme for online feature selection: simulating model performance without retraining. IEEE Trans Neural Netw Learn Syst 28:405–414CrossRef
75.
Zurück zum Zitat Wang J, Zhao ZQ, Hu XG, Cheung YM, Wang M, Wu XD (2013) Online group feature selection. In: Proceedings of international joint conference on artificial intelligence Wang J, Zhao ZQ, Hu XG, Cheung YM, Wang M, Wu XD (2013) Online group feature selection. In: Proceedings of international joint conference on artificial intelligence
76.
Zurück zum Zitat Wang J, Zhao P, Hoi S, Jin R (2014) Online feature selection and its applications. IEEE Trans Knowl Data Eng 26:698–710 Wang J, Zhao P, Hoi S, Jin R (2014) Online feature selection and its applications. IEEE Trans Knowl Data Eng 26:698–710
77.
Zurück zum Zitat Zhang Q, Zhang P, Long G, Ding W, Zhang C, Wu X (2015) Towards mining trapezoidal data streams. In: Proceedings of IEEE international conference on data mining, pp 1111–1116 Zhang Q, Zhang P, Long G, Ding W, Zhang C, Wu X (2015) Towards mining trapezoidal data streams. In: Proceedings of IEEE international conference on data mining, pp 1111–1116
78.
Zurück zum Zitat Avidan S, Butman M (2006) Efficient methods for privacy preserving face detection. In: Advances in neural information processing systems, pp 57–64 Avidan S, Butman M (2006) Efficient methods for privacy preserving face detection. In: Advances in neural information processing systems, pp 57–64
79.
Zurück zum Zitat Friedman J, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting. Ann Stat 28:337–407 Friedman J, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting. Ann Stat 28:337–407
80.
Zurück zum Zitat Zhou Q, Zhou H, Li T (2016) Cost-sensitive feature selection using random forest: selecting low-cost subsets of informative features. Knowl-Based Syst 95:1–11CrossRef Zhou Q, Zhou H, Li T (2016) Cost-sensitive feature selection using random forest: selecting low-cost subsets of informative features. Knowl-Based Syst 95:1–11CrossRef
81.
Zurück zum Zitat Dwork C (2006) Differential privacy. In: Proceedings of international colloquium on automata, languages and programming, pp 1–12 Dwork C (2006) Differential privacy. In: Proceedings of international colloquium on automata, languages and programming, pp 1–12
82.
Zurück zum Zitat Yang J, Li Y (2014) Differential privacy feature selection. In: Proceedings of international joint conference on neural networks, pp 4182–4189 Yang J, Li Y (2014) Differential privacy feature selection. In: Proceedings of international joint conference on neural networks, pp 4182–4189
83.
Zurück zum Zitat Li Y, Yang J, Ji W (2016) Local learning-based feature weighting with privacy preservation. Neurocomputing 174:1107–1115CrossRef Li Y, Yang J, Ji W (2016) Local learning-based feature weighting with privacy preservation. Neurocomputing 174:1107–1115CrossRef
84.
Zurück zum Zitat Sun YJ, Todorovic S, Goodison S (2010) Local learning based feature selection for high dimensional data analysis. IEEE Trans Pattern Anal Mach Intell 32:1–18CrossRef Sun YJ, Todorovic S, Goodison S (2010) Local learning based feature selection for high dimensional data analysis. IEEE Trans Pattern Anal Mach Intell 32:1–18CrossRef
85.
86.
Zurück zum Zitat Huang L, Joseph AD, Nelson B, Rubinstein BIP, Tygar JD (2011) Adversarial machine learning. In: Proceedings of 4th ACM workshop on artificial intelligence and security, pp 43–58 Huang L, Joseph AD, Nelson B, Rubinstein BIP, Tygar JD (2011) Adversarial machine learning. In: Proceedings of 4th ACM workshop on artificial intelligence and security, pp 43–58
87.
Zurück zum Zitat Biggio B, Fumera G, Roli F (2014) Security evaluation of pattern classifiers under attack. IEEE Trans Knowl Data Eng 26:984–996CrossRef Biggio B, Fumera G, Roli F (2014) Security evaluation of pattern classifiers under attack. IEEE Trans Knowl Data Eng 26:984–996CrossRef
88.
Zurück zum Zitat Li B, Vorobeychik Y (2014) Feature cross-substitution in adversarial classification. In: Proceedings of advances in neural information processing systems, pp 2087–2095 Li B, Vorobeychik Y (2014) Feature cross-substitution in adversarial classification. In: Proceedings of advances in neural information processing systems, pp 2087–2095
89.
Zurück zum Zitat Xiao H, Biggio B, Brown G, Fumera G, Eckert C, Roli F (2015) Is feature selection secure against training data poisoning? In: Proceedings of the 32th international conference on machine learning Xiao H, Biggio B, Brown G, Fumera G, Eckert C, Roli F (2015) Is feature selection secure against training data poisoning? In: Proceedings of the 32th international conference on machine learning
90.
Zurück zum Zitat Zhang F, Chan PPK, Biggio B, Yeung DS, Roli F (2015) Adversarial feature selection against evasion attacks. IEEE Trans Cybern 46:766–777 Zhang F, Chan PPK, Biggio B, Yeung DS, Roli F (2015) Adversarial feature selection against evasion attacks. IEEE Trans Cybern 46:766–777
91.
Zurück zum Zitat Saeys Y, Inza I, Larranaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23:2507–2517CrossRef Saeys Y, Inza I, Larranaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23:2507–2517CrossRef
92.
Bolon-Canedo V, Sanchez-Marono N, Alonso-Betanzos A, Benitez JM, Herrera F (2014) A review of microarray datasets and applied feature selection methods. Inf Sci 282:111–135
93.
Nie FP, Huang H, Cai X, Ding C (2010) Efficient and robust feature selection via joint ℓ2,1-norms minimization. Adv Neural Inf Process Syst 23:1813–1821
94.
Tang JL, Liu H (2012) Feature selection with linked data in social media. In: SIAM international conference on data mining
95.
Tang JL, Liu H (2012) Unsupervised feature selection for linked social media data. In: Eighteenth ACM SIGKDD international conference on knowledge discovery and data mining
96.
Tang JL, Liu H (2014) Feature selection for social media data. ACM Trans Knowl Discov Data 8:1–27
97.
Tang JL, Liu H (2014) An unsupervised feature selection framework for social media data. IEEE Trans Knowl Data Eng 26:2914–2927
98.
Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69:026113-1-026113-15
100.
Li JD, Tang JL, Hu X, Liu H (2015) Unsupervised streaming feature selection in social media. In: Proceedings of ACM international conference of information and knowledge management
101.
Wu F, Han YH, Liu X, Shao J, Zhuang YT, Zhang ZF (2012) The heterogeneous feature selection with structural sparsity for multimedia annotation and hashing: a survey. Int J Multimed Inf Retr 1:3–15
102.
Wright J, Yang A, Ganesh A, Sastry S, Ma Y (2009) Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell 31:210–227
103.
Jiang W, Er GH, Dai QH, Gu JW (2006) Similarity-based online feature selection in content-based image retrieval. IEEE Trans Image Process 15:702–712
104.
Friedman J, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting. Ann Stat 28:337–374
105.
Khoshgoftaar TM, Gao KH, Napolitano A, Wald R (2014) A comparative study of iterative and non-iterative feature selection techniques for software defect prediction. Info Syst Frontiers 16:801–822
106.
107.
Zhao L, Hu Q, Wang W (2015) Heterogeneous feature selection with multi-modal deep neural networks and sparse group lasso. IEEE Trans Multimed 17:1936–1948
108.
Moro S, Cortez P, Rita P (2015) Business intelligence in banking: a literature analysis from 2002 to 2013 using text mining and latent Dirichlet allocation. Expert Syst Appl 42:1314–1324
Metadata
Title
Recent advances in feature selection and its applications
Authors
Yun Li
Tao Li
Huan Liu
Publication date
5 May 2017
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 3/2017
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-017-1059-8
