
2018 | OriginalPaper | Book chapter

Evolutionary Cost-Sensitive Balancing: A Generic Method for Imbalanced Classification Problems

Abstract

Efficient classification under imbalanced class distributions is currently of interest in data mining research, since traditional learning methods often fail to achieve satisfactory results in such domains. The choice of evaluation metric is also essential for assessing recognition performance. This paper presents a new general methodology for improving the performance of classifiers on imbalanced problems. The method, Evolutionary Cost-Sensitive Balancing (ECSB), is a meta-approach that can be employed with any error-reduction classifier. It uses genetic search and cost-sensitive mechanisms to boost the performance of the base classifier. We present evaluations on benchmark data, comparing the results obtained by ECSB with those of two similar recent methods from the literature: SMOTE and EUS. We found that ECSB boosts the performance of traditional classifiers on imbalanced problems, achieving on average a relative improvement of ~45% in true positive rate (\(\text{TP}_{\text{rate}}\)) and ~16% in F-measure (FM). It also performs better than sampling strategies: on average, it achieves ~35% relative improvement in \(\text{TP}_{\text{rate}}\) and ~12% in FM over SMOTE, and, compared with EUS, it yields similar \(\text{TP}_{\text{rate}}\) and geometric mean (GM) values and slightly higher area under the curve (AUC) values (up to ~9% relative improvement).
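The abstract compares methods by TP rate, F-measure (FM), geometric mean (GM) and AUC, and reports *relative* improvements. The following minimal sketch computes the first three from binary labels using their standard definitions, and assumes that "relative improvement" means (new − old) / old. The function names and the toy labels are illustrative only, not taken from the paper; AUC is left out because it needs ranking scores rather than hard labels.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN, treating `positive` as the minority class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def imbalance_metrics(y_true, y_pred, positive=1):
    """TP rate, TN rate, precision, F-measure and geometric mean (GM)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    tp_rate = tp / (tp + fn) if tp + fn else 0.0  # recall on the minority class
    tn_rate = tn / (tn + fp) if tn + fp else 0.0  # recall on the majority class
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * tp_rate / (precision + tp_rate)
                 if precision + tp_rate else 0.0)
    return {
        "tp_rate": tp_rate,
        "tn_rate": tn_rate,
        "precision": precision,
        "f_measure": f_measure,
        "g_mean": math.sqrt(tp_rate * tn_rate),
    }

def relative_improvement(new, old):
    """Relative change (new - old) / old; 0.45 corresponds to a 45% improvement."""
    return (new - old) / old

# Toy example: 4 minority (1) and 6 majority (0) instances.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
m = imbalance_metrics(y_true, y_pred)
print(m)  # tp_rate = 0.5, f_measure ≈ 0.571, g_mean ≈ 0.645
print(relative_improvement(0.58, 0.40))  # ≈ 0.45
```

The last line uses hypothetical values: a TP rate rising from 0.40 to 0.58 is a ~45% relative improvement even though the absolute gain is only 0.18, which is why relative and absolute figures should not be mixed when comparing results.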
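The abstract states only that ECSB combines genetic search with cost-sensitive mechanisms around a base classifier; it does not give the genome, operators or fitness function. The sketch below is therefore a hypothetical illustration of the general idea, not the authors' ECSB: a small real-coded genetic algorithm searches for the false-negative cost of a cost-sensitive 1-D Gaussian naive Bayes classifier, using GM on a validation split as fitness. All function names, the cost range [1, 100], the GA operators and the synthetic data are assumptions.

```python
import math
import random

# HYPOTHETICAL sketch, not the authors' ECSB: genetic search over the
# false-negative cost of a cost-sensitive 1-D Gaussian naive Bayes classifier.

def fit_gaussian_nb(xs, ys):
    """Per-class prior, mean and variance for a 1-D Gaussian naive Bayes model."""
    model = {}
    for c in (0, 1):
        vals = [x for x, y in zip(xs, ys) if y == c]
        mu = sum(vals) / len(vals)
        var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-9
        model[c] = (len(vals) / len(ys), mu, var)
    return model

def posterior_pos(model, x):
    """P(y = 1 | x) under the fitted model."""
    def joint(c):
        prior, mu, var = model[c]
        return prior * math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    p1, p0 = joint(1), joint(0)
    return p1 / (p0 + p1) if p0 + p1 > 0 else 0.5

def predict(model, xs, cost_fn, cost_fp=1.0):
    """Minimum-expected-cost rule: predict 1 when p1 * cost_fn > (1 - p1) * cost_fp."""
    preds = []
    for x in xs:
        p1 = posterior_pos(model, x)
        preds.append(1 if p1 * cost_fn > (1 - p1) * cost_fp else 0)
    return preds

def g_mean(y_true, y_pred):
    """Geometric mean of TP rate and TN rate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return math.sqrt((tp / pos) * (tn / neg))

def evolve_cost(model, xs, ys, pop_size=20, generations=30, seed=0):
    """Real-coded GA over log(cost_fn); fitness is GM on (xs, ys)."""
    rng = random.Random(seed)
    lo, hi = 0.0, math.log(100.0)  # assumed search range: cost_fn in [1, 100]

    def fitness(gene):
        return g_mean(ys, predict(model, xs, math.exp(gene)))

    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((fitness(g), g) for g in pop), reverse=True)
        nxt = [g for _, g in scored[:2]]  # elitism: carry over the two best
        while len(nxt) < pop_size:
            a = max(rng.sample(scored, 3))[1]  # tournament selection (size 3)
            b = max(rng.sample(scored, 3))[1]
            w = rng.random()
            # Blend crossover followed by Gaussian mutation.
            child = w * a + (1 - w) * b + rng.gauss(0.0, 0.3)
            nxt.append(min(hi, max(lo, child)))  # clamp to the search range
        pop = nxt
    best_fit, best_gene = max((fitness(g), g) for g in pop)
    return math.exp(best_gene), best_fit

def make_data(rng, n_neg, n_pos):
    """Synthetic imbalanced data: negatives ~ N(0, 1), positives ~ N(1.5, 1)."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_neg)] + [rng.gauss(1.5, 1.0) for _ in range(n_pos)]
    return xs, [0] * n_neg + [1] * n_pos

data_rng = random.Random(42)
train_x, train_y = make_data(data_rng, 200, 20)
valid_x, valid_y = make_data(data_rng, 200, 20)
model = fit_gaussian_nb(train_x, train_y)
baseline_gm = g_mean(valid_y, predict(model, valid_x, cost_fn=1.0))
best_cost, best_gm = evolve_cost(model, valid_x, valid_y)
print(f"cost_fn=1: GM={baseline_gm:.3f}; evolved cost_fn={best_cost:.1f}: GM={best_gm:.3f}")
```

With a 10:1 imbalance, the uniform-cost rule (cost_fn = 1) labels almost everything as negative, so raising the false-negative cost shifts the decision threshold toward the minority class. For brevity the same validation split drives the search and the reported GM, so that figure is optimistic; an honest estimate would need a separate held-out test split.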

References
1. Aliamiri, A.: Statistical Methods for Unexploded Ordnance Discrimination. PhD Thesis, Department of Electrical and Computer Engineering, Northeastern University, Boston, MA (2006)
2. Barandela, R., Sanchez, J.S., Garcia, V., Rangel, E.: Strategies for learning in class imbalance problems. Pattern Recogn. 36(3), 849–85 (2003)
4. Brodersen, K.H., Ong, C.S., Stephen, K.E., Buhmann, J.M.: The balanced accuracy and its posterior distribution. In: Proceedings of the 20th International Conference on Pattern Recognition, pp. 3121–3124 (2010)
5. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
6. Chawla, N.V., Lazarevic, A., Hall, L.O., Bowyer, K.W.: SMOTEBoost: improving prediction of the minority class in boosting. In: Proceedings of the Seventh European Conference on Principles and Practice of Knowledge Discovery in Databases, pp. 107–119 (2003)
7. Chawla, N.V., Japkowicz, N., Kotcz, A.: Editorial: special issue on learning from imbalanced data sets. SIGKDD Explor. 6(1), 1–6 (2004)
8. Chawla, N.: Data Mining from Imbalanced Data Sets: An Overview. In: Data Mining and Knowledge Discovery Handbook. Springer, US (2006)
10. Domingos, P.: MetaCost: a general method for making classifiers cost-sensitive. In: Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining, pp. 155–164. ACM Press (1999)
11. Garcia, S., Herrera, F.: Evolutionary undersampling for classification with imbalanced datasets: proposals and taxonomy. Evol. Comput. 17(3), 275–306 (2009)
12. Grzymala-Busse, J.W., Stefanowski, J., Wilk, S.: A comparison of two approaches to data mining from imbalanced data. J. Intell. Manuf. 16, 565–573 (2005)
13. Guo, H., Viktor, H.L.: Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach. SIGKDD Explor. 6, 30–39 (2004)
14. Hart, P.E.: The condensed nearest neighbor rule. IEEE Trans. Inf. Theory IT-14, 515–516 (1968)
15. Huang, K., Yang, H., King, I., Lyu, M.R.: Imbalanced learning with a biased minimax probability machine. IEEE Trans. Syst. Man Cybern. B Cybern. 36(4), 913–923 (2006)
16. Japkowicz, N., Stephen, S.: The class imbalance problem: a systematic study. Intell. Data Anal. J. 6(5), 429–449 (2002)
17. Kubat, M., Matwin, S.: Addressing the curse of imbalanced training sets: one-sided selection. In: ICML, pp. 179–186 (1997)
18. Laurikkala, J.: Improving Identification of Difficult Small Classes by Balancing Class Distribution. Technical Report A-2001-2, University of Tampere (2001)
19. Lemnaru, C., Potolea, R.: Imbalanced Classification Problems: Systematic Study, Issues and Best Practices. LNBIP, vol. 102, pp. 35–50 (2012)
20. Lin, Y., Lee, Y., Wahba, G.: Support vector machines for classification in nonstandard situations. Mach. Learn. 46, 191–202 (2002)
21. Liu, B., Ma, Y., Wong, C.K.: Improving an association rule based classifier. In: Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, pp. 504–509 (2000)
22. Liu, W., Chawla, S., Cieslak, D., Chawla, N.: A robust decision tree algorithm for imbalanced data sets. In: Proceedings of the Tenth SIAM International Conference on Data Mining, pp. 766–777 (2010)
23. Liu, W., Chawla, S.: Class Confidence Weighted kNN Algorithms for Imbalanced Data Sets. In: Advances in Knowledge Discovery and Data Mining. LNCS, vol. 6635, pp. 345–356 (2011)
24. Quinlan, J.R.: Improved estimates for the accuracy of small disjuncts. Mach. Learn. 6, 93–98 (1991)
25. Sun, Y., Kamel, M.S., Wong, A.K.C., Wang, Y.: Cost-sensitive boosting for classification of imbalanced data. Pattern Recogn. 40(12), 3358–3378 (2007)
26. Tian, J., Gu, H., Liu, W.: Imbalanced classification using support vector machine ensemble. Neural Comput. Appl. 20(2), 203–209 (2011)
27. Tomek, I.: Two modifications of CNN. IEEE Trans. Syst. Man Cybern. SMC-6, 769–772 (1976)
28. Turney, P.: Types of cost in inductive concept learning. In: Proceedings of the Workshop on Cost-Sensitive Learning at the Seventeenth International Conference on Machine Learning. Stanford University, California (2000)
29. Visa, S., Ralescu, A.: Issues in mining imbalanced data sets: a review paper. In: Proceedings of the Sixteenth Midwest Artificial Intelligence and Cognitive Science Conference, pp. 67–73 (2005)
30. Weiss, G.M., Provost, F.: The Effect of Class Distribution on Classifier Learning: An Empirical Study. Technical Report ML-TR-44, Department of Computer Science, Rutgers University (2001)
31. Weiss, G.M., Provost, F.: Learning when training data are costly: the effect of class distribution on tree induction. J. Artif. Intell. Res. 19, 315–354 (2003)
32. Weiss, G.: Mining with rarity: a unifying framework. SIGKDD Explor. 6(1), 7–19 (2004)
33. Williams, D., Myers, V., Silvious, M.: Mine classification with imbalanced data. IEEE Geosci. Remote Sens. Lett. 6(3), 528–532 (2009)
34. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
35. Wu, G., Chang, E.Y.: Class-boundary alignment for imbalanced dataset learning. In: Proceedings of the ICML 2003 Workshop on Learning from Imbalanced Data Sets (2003)
36. Zadrozny, B., Elkan, C.: Learning and making decisions when costs and probabilities are both unknown. In: Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining, pp. 204–213 (2001)
37. Zhou, Z.H., Liu, X.Y.: Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Trans. Knowl. Data Eng. 18(1), 63–77 (2006)
Metadata
Title
Evolutionary Cost-Sensitive Balancing: A Generic Method for Imbalanced Classification Problems
Authors
Camelia Lemnaru
Rodica Potolea
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-69710-9_14