This work investigates the use of sampling methods in Genetic Programming (GP) to improve classification accuracy on binary classification problems whose datasets exhibit class imbalance. Class imbalance occurs when one class has more data instances than the other. As a consequence, when the overall classification rate is used as the fitness function, as in standard GP approaches, the evolved classifiers are often biased towards the majority class at the expense of poor minority class accuracy. We establish that the variation in training performance introduced by sampling examples from the training set is no worse than the variation between GP runs that is already accepted. Results also show that using sampling methods during training can improve minority class accuracy and the robustness of the evolved classifiers, yielding test-set performance better than that of the classifiers on the training-set Pareto front.
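To make the bias of overall-accuracy fitness concrete, here is a minimal Python sketch. It assumes labels `1` = minority and `0` = majority, and it uses random undersampling into a balanced subset, redrawn each generation, as one illustrative sampling method; the specific sampling schemes studied in this work may differ. The function names (`overall_accuracy`, `class_accuracies`, `balanced_sample`) are hypothetical and not taken from the original.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Assumption: each example is (features, label), label 1 = minority, 0 = majority.
Example = Tuple[Sequence[float], int]
Classifier = Callable[[Sequence[float]], int]


def overall_accuracy(clf: Classifier, data: List[Example]) -> float:
    """Fraction of all examples classified correctly (standard GP fitness)."""
    return sum(clf(x) == y for x, y in data) / len(data)


def class_accuracies(clf: Classifier, data: List[Example]) -> Tuple[float, float]:
    """Return (minority accuracy, majority accuracy) computed separately."""
    minority = [(x, y) for x, y in data if y == 1]
    majority = [(x, y) for x, y in data if y == 0]
    return overall_accuracy(clf, minority), overall_accuracy(clf, majority)


def balanced_sample(data: List[Example], per_class: int,
                    rng: random.Random) -> List[Example]:
    """Random undersampling: draw the same number of examples from each class
    without replacement, capped by the size of the smaller class."""
    minority = [e for e in data if e[1] == 1]
    majority = [e for e in data if e[1] == 0]
    k = min(per_class, len(minority), len(majority))
    return rng.sample(minority, k) + rng.sample(majority, k)


def always_majority(x: Sequence[float]) -> int:
    """A trivial classifier that ignores its input and predicts the majority class."""
    return 0


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical imbalanced training set: 95 majority, 5 minority examples.
    train = ([([rng.random()], 0) for _ in range(95)]
             + [([rng.random()], 1) for _ in range(5)])
    print(overall_accuracy(always_majority, train))   # 0.95
    print(class_accuracies(always_majority, train))   # (0.0, 1.0)
    # A fresh balanced subset per generation (placeholder for a GP loop).
    for generation in range(3):
        subset = balanced_sample(train, per_class=5, rng=rng)
        print(generation, overall_accuracy(always_majority, subset))  # 0.5
```

The trivial classifier scores 0.95 under overall accuracy while classifying no minority instance correctly; on a balanced sample the same classifier scores only 0.5, so fitness evaluated on such samples no longer rewards ignoring the minority class. Redrawing the sample each generation is one way to introduce the sampling-induced variation in training performance that the abstract refers to.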
- Sampling Methods in Genetic Programming for Classification with Unbalanced Data
- Springer Berlin Heidelberg