Published in: Annals of Data Science 3/2015

1 September 2015

Feature Selection for Multi-Class Imbalanced Data Sets Based on Genetic Algorithm

Authors: Li-min Du, Yang Xu, Hua Zhu

Abstract

This paper presents an improved feature selection method, based on a genetic algorithm, for multi-class imbalanced data. The method changes the fitness function: it uses the evaluation criterion EG-mean instead of global classification accuracy, so that the selected features favor recognition of the minority classes. The method is evaluated on several benchmark data sets. The experimental results show that, compared with the traditional genetic-algorithm-based feature selection method, the proposed method has certain advantages in the size of the feature subsets and improves the precision on the minority classes of multi-class imbalanced data sets.
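
The idea of swapping the fitness criterion can be illustrated with a small sketch. The abstract does not define EG-mean, so the code below substitutes the plain geometric mean of per-class recalls (G-mean) as a stand-in; that substitution is an assumption, not the authors' formula. The leave-one-out 1-nearest-neighbour classifier, the toy data, the GA settings, and all function names (`g_mean`, `fitness`, `ga_select`, and so on) are likewise hypothetical choices made only to keep the example self-contained.

```python
import math
import random


def accuracy(y_true, y_pred):
    """Global classification accuracy."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed stand-in for EG-mean)."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return math.prod(recalls) ** (1.0 / len(recalls))


def loo_1nn_predict(X, y, mask):
    """Leave-one-out 1-NN predictions using only features where mask is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    preds = []
    for i, xi in enumerate(X):
        best_label, best_dist = None, float("inf")
        for k, xk in enumerate(X):
            if k == i:
                continue
            dist = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if dist < best_dist:
                best_label, best_dist = y[k], dist
        preds.append(best_label)
    return preds


def fitness(X, y, mask):
    """Fitness of a feature subset: G-mean instead of accuracy."""
    if not any(mask):
        return 0.0  # an empty feature subset cannot classify anything
    return g_mean(y, loo_1nn_predict(X, y, mask))


def ga_select(X, y, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Tiny GA: binary tournament, one-point crossover, bit-flip mutation, elitism."""
    rng = random.Random(seed)
    n = len(X[0])
    score = lambda m: fitness(X, y, m)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        nxt = [best[:]]  # keep the best individual unchanged
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 2), key=score)
            p2 = max(rng.sample(pop, 2), key=score)
            cut = rng.randint(1, n - 1)
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=score)
    return best, score(best)


# A majority-only predictor looks good on accuracy but scores zero on G-mean.
y_demo = [0] * 8 + [1] * 2
print(accuracy(y_demo, [0] * 10))  # 0.8
print(g_mean(y_demo, [0] * 10))    # 0.0

# Toy imbalanced 3-class data (8/3/3 samples): feature 0 separates the classes,
# features 1 and 2 are uninformative patterns.
X, y = [], []
for label, (center, count) in enumerate([(0.0, 8), (5.0, 3), (10.0, 3)]):
    for i in range(count):
        X.append([center + 0.1 * i, (7 * i + 3 * label) % 5, (11 * i + label) % 7])
        y.append(label)

mask, score = ga_select(X, y)
print(mask, score)
```

Because the geometric mean collapses to zero as soon as one class is never recognized, a feature subset that ignores a minority class cannot obtain a high fitness in this sketch, whereas it could under plain accuracy.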

Metadata
Title
Feature Selection for Multi-Class Imbalanced Data Sets Based on Genetic Algorithm
Authors
Li-min Du
Yang Xu
Hua Zhu
Publication date
1 September 2015
Publisher
Springer Berlin Heidelberg
Published in
Annals of Data Science, Issue 3/2015
Print ISSN: 2198-5804
Electronic ISSN: 2198-5812
DOI
https://doi.org/10.1007/s40745-015-0060-x
