Skip to main content

2017 | OriginalPaper | Buchkapitel

Feature Selection in High Dimensional Data by a Filter-Based Genetic Algorithm

verfasst von : Claudio De Stefano, Francesco Fontanella, Alessandra Scotto di Freca

Erschienen in: Applications of Evolutionary Computation

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In classification and clustering problems, feature selection techniques can be used to reduce the dimensionality of the data and increase the performances. However, feature selection is a challenging task, especially when hundred or thousands of features are involved. In this framework, we present a new approach for improving the performance of a filter-based genetic algorithm. The proposed approach consists of two steps: first, the available features are ranked according to a univariate evaluation function; then the search space represented by the first M features in the ranking is searched using a filter-based genetic algorithm for finding feature subsets with a high discriminative power.
Experimental results demonstrated the effectiveness of our approach in dealing with high dimensional data, both in terms of recognition rate and feature number reduction.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
Note that the same holds also for the feature-class correlation.
 
Literatur
2.
Zurück zum Zitat Bermejo, P., Gámez, J.A., Puerta, J.M.: Improving incremental wrapper-based subset selection via replacement and early stopping. IJPRAI 25(5), 605–625 (2011)MathSciNet Bermejo, P., Gámez, J.A., Puerta, J.M.: Improving incremental wrapper-based subset selection via replacement and early stopping. IJPRAI 25(5), 605–625 (2011)MathSciNet
3.
Zurück zum Zitat Cordella, L.P., De Stefano, C., Fontanella, F., Marrocco, C., Scotto di Freca, A.: Combining single class features for improving performance of a two stage classifier. In: 20th International Conference on Pattern Recognition (ICPR 2010), pp. 4352–4355. IEEE Computer Society (2010) Cordella, L.P., De Stefano, C., Fontanella, F., Marrocco, C., Scotto di Freca, A.: Combining single class features for improving performance of a two stage classifier. In: 20th International Conference on Pattern Recognition (ICPR 2010), pp. 4352–4355. IEEE Computer Society (2010)
4.
Zurück zum Zitat Dash, M., Liu, H.: Feature selection for classification. Intell. Data Anal. 1(1–4), 131–156 (1997)CrossRef Dash, M., Liu, H.: Feature selection for classification. Intell. Data Anal. 1(1–4), 131–156 (1997)CrossRef
5.
Zurück zum Zitat De Stefano, C., Fontanella, F., Marrocco, C.: A GA-based feature selection algorithm for remote sensing images. In: Giacobini, M., et al. (eds.) EvoWorkshops 2008. LNCS, vol. 4974, pp. 285–294. Springer, Heidelberg (2008). doi:10.1007/978-3-540-78761-7_29CrossRef De Stefano, C., Fontanella, F., Marrocco, C.: A GA-based feature selection algorithm for remote sensing images. In: Giacobini, M., et al. (eds.) EvoWorkshops 2008. LNCS, vol. 4974, pp. 285–294. Springer, Heidelberg (2008). doi:10.​1007/​978-3-540-78761-7_​29CrossRef
6.
Zurück zum Zitat De Stefano, C., Fontanella, F., Maniaci, M., Scotto di Freca, A.: A method for scribe distinction in medieval manuscripts using page layout features. In: Maino, G., Foresti, G.L. (eds.) ICIAP 2011. LNCS, vol. 6978, pp. 393–402. Springer, Heidelberg (2011). doi:10.1007/978-3-642-24085-0_41CrossRef De Stefano, C., Fontanella, F., Maniaci, M., Scotto di Freca, A.: A method for scribe distinction in medieval manuscripts using page layout features. In: Maino, G., Foresti, G.L. (eds.) ICIAP 2011. LNCS, vol. 6978, pp. 393–402. Springer, Heidelberg (2011). doi:10.​1007/​978-3-642-24085-0_​41CrossRef
7.
Zurück zum Zitat Gütlein, M., Frank, E., Hall, M., Karwath, A.: Large scale attribute selection using wrappers. In: Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2009) (2009) Gütlein, M., Frank, E., Hall, M., Karwath, A.: Large scale attribute selection using wrappers. In: Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2009) (2009)
8.
Zurück zum Zitat Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)MATH Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)MATH
9.
Zurück zum Zitat Hall, M.A.: Correlation-based feature selection for discrete and numeric class machine learning. In: Proceedings of the Seventeenth International Conference on Machine Learning, pp. 359–366. Morgan Kaufmann Publishers Inc., San Francisco (2000) Hall, M.A.: Correlation-based feature selection for discrete and numeric class machine learning. In: Proceedings of the Seventeenth International Conference on Machine Learning, pp. 359–366. Morgan Kaufmann Publishers Inc., San Francisco (2000)
10.
Zurück zum Zitat Huang, J., Cai, Y., Xu, X.: A hybrid genetic algorithm for feature selection wrapper based on mutual information. Pattern Recogn. Lett. 28(13), 1825–1844 (2007)CrossRef Huang, J., Cai, Y., Xu, X.: A hybrid genetic algorithm for feature selection wrapper based on mutual information. Pattern Recogn. Lett. 28(13), 1825–1844 (2007)CrossRef
11.
Zurück zum Zitat Lanzi, P.: Fast feature selection with genetic algorithms: a filter approach. In: IEEE International Conference on Evolutionary Computation, pp. 537–540, April 1997 Lanzi, P.: Fast feature selection with genetic algorithms: a filter approach. In: IEEE International Conference on Evolutionary Computation, pp. 537–540, April 1997
12.
Zurück zum Zitat Lee, J.S., Oh, I.S., Moon, B.R.: Hybrid genetic algorithms for feature selection. IEEE Trans. Pattern Anal. Mach. Intell. 26(11), 1424–1437 (2004)CrossRef Lee, J.S., Oh, I.S., Moon, B.R.: Hybrid genetic algorithms for feature selection. IEEE Trans. Pattern Anal. Mach. Intell. 26(11), 1424–1437 (2004)CrossRef
13.
Zurück zum Zitat Li, R., Lu, J., Zhang, Y., Zhao, T.: Dynamic adaboost learning with feature selection based on parallel genetic algorithm for image annotation. Knowl. Based Syst. 23(3), 195–201 (2010)CrossRef Li, R., Lu, J., Zhang, Y., Zhao, T.: Dynamic adaboost learning with feature selection based on parallel genetic algorithm for image annotation. Knowl. Based Syst. 23(3), 195–201 (2010)CrossRef
15.
Zurück zum Zitat Liu, H., Setiono, R.: Chi2: Feature selection and discretization of numeric attributes. In: ICTAI, pp. 88–91. IEEE Computer Society, Washington, DC (1995) Liu, H., Setiono, R.: Chi2: Feature selection and discretization of numeric attributes. In: ICTAI, pp. 88–91. IEEE Computer Society, Washington, DC (1995)
17.
Zurück zum Zitat Ochoa, G.: Error thresholds in genetic algorithms. Evol. Comput. 14(2), 157–182 (2006)CrossRef Ochoa, G.: Error thresholds in genetic algorithms. Evol. Comput. 14(2), 157–182 (2006)CrossRef
18.
Zurück zum Zitat Oreski, S., Oreski, G.: Genetic algorithm-based heuristic for feature selection in credit risk assessment. Expert Syst. Appl. 41(4, Part 2), 2052–2064 (2014)CrossRef Oreski, S., Oreski, G.: Genetic algorithm-based heuristic for feature selection in credit risk assessment. Expert Syst. Appl. 41(4, Part 2), 2052–2064 (2014)CrossRef
19.
Zurück zum Zitat Spolaôr, N., Lorena, A.C., Lee, H.D.: Multi-objective genetic algorithm evaluation in feature selection. In: Takahashi, R.H.C., Deb, K., Wanner, E.F., Greco, S. (eds.) EMO 2011. LNCS, vol. 6576, pp. 462–476. Springer, Heidelberg (2011). doi:10.1007/978-3-642-19893-9_32CrossRef Spolaôr, N., Lorena, A.C., Lee, H.D.: Multi-objective genetic algorithm evaluation in feature selection. In: Takahashi, R.H.C., Deb, K., Wanner, E.F., Greco, S. (eds.) EMO 2011. LNCS, vol. 6576, pp. 462–476. Springer, Heidelberg (2011). doi:10.​1007/​978-3-642-19893-9_​32CrossRef
20.
Zurück zum Zitat Tan, F., Fu, X., Zhang, Y., Bourgeois, A.G.: A genetic algorithm-based method for feature subset selection. Soft Comput. 12(2), 111–120 (2007)CrossRef Tan, F., Fu, X., Zhang, Y., Bourgeois, A.G.: A genetic algorithm-based method for feature subset selection. Soft Comput. 12(2), 111–120 (2007)CrossRef
21.
Zurück zum Zitat Xue, B., Zhang, M., Browne, W.N., Yao, X.: A survey on evolutionary computation approaches to feature selection. IEEE Trans. Evol. Comput. 20(4), 606–626 (2016)CrossRef Xue, B., Zhang, M., Browne, W.N., Yao, X.: A survey on evolutionary computation approaches to feature selection. IEEE Trans. Evol. Comput. 20(4), 606–626 (2016)CrossRef
22.
Zurück zum Zitat Yusta, S.C.: Different metaheuristic strategies to solve the feature selection problem. Pattern Recogn. Lett. 30(5), 525–534 (2009)CrossRef Yusta, S.C.: Different metaheuristic strategies to solve the feature selection problem. Pattern Recogn. Lett. 30(5), 525–534 (2009)CrossRef
23.
Zurück zum Zitat Zhai, Y., Ong, Y.S., Tsang, I.: The emerging “big dimensionality”. IEEE Comput. Intell. Mag. 9(3), 14–26 (2014)CrossRef Zhai, Y., Ong, Y.S., Tsang, I.: The emerging “big dimensionality”. IEEE Comput. Intell. Mag. 9(3), 14–26 (2014)CrossRef
Metadaten
Titel
Feature Selection in High Dimensional Data by a Filter-Based Genetic Algorithm
verfasst von
Claudio De Stefano
Francesco Fontanella
Alessandra Scotto di Freca
Copyright-Jahr
2017
DOI
https://doi.org/10.1007/978-3-319-55849-3_33