Published in: Pattern Analysis and Applications 3/2017

03.11.2015 | Theoretical Advances

A new feature selection approach based on ensemble methods in semi-supervised classification

Authors: Nesma Settouti, Mohamed Amine Chikh, Vincent Barra



Abstract

In computer-aided medical systems, many practical classification applications face a massive growth in the collection and storage of data; this is especially the case in areas such as the prediction of medical test efficiency, the classification of tumors and the detection of cancers. Data with known class labels (labeled data) can be limited, whereas unlabeled data (with unknown class labels) are more readily available. Semi-supervised learning covers methods that exploit the unlabeled data, in addition to the labeled data, to improve performance on the classification task. In this paper, we consider the problem of using a large amount of unlabeled data to improve the efficiency of feature selection in high-dimensional datasets when only a small set of labeled examples is available. We propose a new semi-supervised feature evaluation method, called Optimized co-Forest for Feature Selection (OFFS), that combines ideas from co-forest with the embedded feature-selection principle of Random Forest, based on permutation of the out-of-bag set. We provide empirical results on several medical and biological benchmark datasets, indicating an overall significant improvement of OFFS over four other semi-supervised feature selection approaches of the filter, wrapper and embedded kinds. Our method proves its ability to select features and measure their importance, improving the performance of the hypothesis learned from a small amount of labeled samples by exploiting unlabeled samples.
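As a rough illustration of the embedded principle the abstract refers to (this is a minimal sketch, not the authors' OFFS implementation; the dataset and hyper-parameters are made up for the example): each tree in a bagged ensemble is trained on a bootstrap sample, and a feature's importance is the drop in that tree's accuracy on its out-of-bag (OOB) observations when the feature's values are permuted.

```python
# Sketch of OOB permutation importance in a bagged tree ensemble.
# Illustrative only: synthetic data, arbitrary hyper-parameters.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=8,
                           n_informative=3, random_state=0)
n, p = X.shape
n_trees = 25
importances = np.zeros(p)

for _ in range(n_trees):
    # Bootstrap sample: n observations drawn with replacement.
    idx = rng.integers(0, n, size=n)
    oob = np.setdiff1d(np.arange(n), idx)  # observations left out of the draw
    tree = DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])
    base_acc = tree.score(X[oob], y[oob])
    for j in range(p):
        # Permute feature j on the OOB set; the accuracy drop is the
        # tree's importance estimate for that feature.
        X_perm = X[oob].copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        importances[j] += base_acc - tree.score(X_perm, y[oob])

importances /= n_trees
ranking = np.argsort(importances)[::-1]
print(ranking)
```

A semi-supervised variant in the spirit of co-forest would additionally let the ensemble label confident unlabeled examples and feed them back into the bootstrap samples; the sketch above covers only the supervised importance measure.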


Footnotes
1
A bootstrap sample L is obtained by randomly drawing n observations with replacement from the training sample \(L_n\); at each draw, every observation has probability 1/n of being selected.
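The draw described in this footnote can be sketched as follows (a toy example with hypothetical data, not taken from the paper):

```python
# Bootstrap draw: n observations sampled with replacement from the
# training sample; each observation has probability 1/n per draw.
import numpy as np

rng = np.random.default_rng(0)
L_n = np.arange(10)                                 # toy training sample, n = 10
L = rng.choice(L_n, size=L_n.size, replace=True)    # bootstrap sample
out_of_bag = np.setdiff1d(L_n, L)                   # observations never drawn
print(L, out_of_bag)
```

For large n, each observation is left out of a given bootstrap sample with probability \((1-1/n)^n \approx e^{-1} \approx 0.368\), which is what makes the out-of-bag set a useful held-out evaluation set.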
 
Metadata
Title
A new feature selection approach based on ensemble methods in semi-supervised classification
Authors
Nesma Settouti
Mohamed Amine Chikh
Vincent Barra
Publication date
03.11.2015
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 3/2017
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-015-0524-9
