2018 | Original Paper | Book Chapter

Improving Evolutionary Algorithm Performance for Feature Selection in High-Dimensional Data

Authors: N. Cilia, C. De Stefano, F. Fontanella, A. Scotto di Freca

Published in: Applications of Evolutionary Computation

Publisher: Springer International Publishing

Abstract

In classification and clustering problems, selecting a subset of discriminative features is challenging, especially when hundreds or thousands of features are involved. In this context, Evolutionary Computation (EC) techniques have attracted growing scientific interest in recent years, because they can explore large search spaces without requiring any a priori knowledge of, or assumptions about, the domain considered. Following this line of thought, we developed a novel strategy to improve the performance of EC-based algorithms for feature selection. The proposed strategy first ranks the whole set of available features according to a univariate evaluation function; the search space represented by the first M ranked features is then searched with an evolutionary algorithm to find feature subsets with high discriminative power. Comparative results demonstrated the effectiveness of the proposed approach in improving the performance obtainable with three effective and widely used EC-based algorithms for feature selection in high-dimensional data problems, namely Ant Colony Optimization (ACO), Particle Swarm Optimization (PSO) and Artificial Bee Colony (ABC).
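The two-stage strategy described in the abstract (rank all features with a univariate function, then run an evolutionary search over the first M ranked features) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the chapter: the univariate score is the absolute Pearson correlation with the class label, the subset fitness is leave-one-out 1-nearest-neighbour accuracy, and the search is a plain genetic algorithm standing in for ACO, PSO or ABC. The function names `rank_features`, `subset_fitness` and `evolve_subset` are hypothetical.

```python
import random


def pearson_abs(xs, ys):
    """Absolute Pearson correlation between one feature column and the labels."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information
    return abs(cov) / (vx * vy) ** 0.5


def rank_features(X, y):
    """Stage 1: feature indices sorted by decreasing univariate score (assumed measure)."""
    n_features = len(X[0])
    scores = [pearson_abs([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def subset_fitness(X, y, features):
    """Assumed fitness: leave-one-out 1-NN accuracy using only the given features."""
    if not features:
        return 0.0
    correct = 0
    for i, row in enumerate(X):
        best_d, best_label = None, None
        for k, other in enumerate(X):
            if k == i:
                continue
            d = sum((row[j] - other[j]) ** 2 for j in features)
            if best_d is None or d < best_d:
                best_d, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def evolve_subset(X, y, M, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Stage 2: genetic search (a stand-in for ACO/PSO/ABC) over the top-M features."""
    rng = random.Random(seed)
    top = rank_features(X, y)[:M]  # reduced search space: 2**M candidate subsets

    def decode(mask):
        return [top[i] for i, bit in enumerate(mask) if bit]

    def fit(mask):
        return subset_fitness(X, y, decode(mask))

    pop = [[rng.randint(0, 1) for _ in range(M)] for _ in range(pop_size)]
    best = max(pop, key=fit)
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: the best mask always survives
        while len(new_pop) < pop_size:
            a = max(rng.sample(pop, 2), key=fit)  # binary tournament selection
            b = max(rng.sample(pop, 2), key=fit)
            cut = rng.randrange(1, M) if M > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=fit)
    return decode(best), fit(best)
```

Calling `evolve_subset(X, y, M)` on a list-of-rows dataset `X` with numeric labels `y` returns the selected feature indices (always drawn from the first M ranked features) and their fitness. The point of the sketch is the design choice itself: restricting the bit mask to M positions shrinks the search space from 2^N to 2^M subsets before the evolutionary search starts.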

Footnotes
1. Note that the same also holds for the feature-class correlation.
 
Metadata
Title: Improving Evolutionary Algorithm Performance for Feature Selection in High-Dimensional Data
Authors: N. Cilia, C. De Stefano, F. Fontanella, A. Scotto di Freca
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-77538-8_30
