Published in: Neural Processing Letters 2/2019

10.01.2019

Learning from Imbalanced Data Sets with Weighted Cross-Entropy Function

Authors: Yuri Sousa Aurelio, Gustavo Matheus de Almeida, Cristiano Leite de Castro, Antonio Padua Braga



Abstract

This paper presents a novel approach to the imbalanced data set problem in neural networks: prior probabilities are incorporated into a cost-sensitive cross-entropy error function. Several classical benchmarks were used for performance evaluation under different metrics, namely G-Mean, area under the ROC curve (AUC), adjusted G-Mean, Accuracy, True Positive Rate, True Negative Rate and F1-score. The results were compared with those of well-known algorithms and demonstrate the effectiveness and robustness of the proposed approach, which yields well-balanced classifiers across different imbalance scenarios.
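The core idea described in the abstract, weighting each class's cross-entropy term by a cost derived from its prior probability, can be illustrated with a minimal sketch. This is an assumption-laden reconstruction, not the authors' exact formulation: here the weights are simply the inverse class priors estimated from the training labels.

```python
import numpy as np

def weighted_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cost-sensitive binary cross-entropy sketch.

    Each class's term is weighted by the inverse of its prior probability
    estimated from y_true, so errors on the minority class cost more.
    Hypothetical illustration; the paper's exact weighting may differ.
    """
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from 0/1 to keep the logarithms finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    p_pos = y_true.mean()          # prior of the positive (minority) class
    p_neg = 1.0 - p_pos            # prior of the negative (majority) class
    w_pos, w_neg = 1.0 / p_pos, 1.0 / p_neg
    loss = -(w_pos * y_true * np.log(y_pred)
             + w_neg * (1.0 - y_true) * np.log(1.0 - y_pred))
    return loss.mean()
```

With a 1:3 imbalance, misclassifying the single minority sample is penalized three times as heavily as misclassifying one majority sample, which is the behavior a cost-sensitive error function of this kind is meant to produce.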


Metadata
Title
Learning from Imbalanced Data Sets with Weighted Cross-Entropy Function
Authors
Yuri Sousa Aurelio
Gustavo Matheus de Almeida
Cristiano Leite de Castro
Antonio Padua Braga
Publication date
10.01.2019
Publisher
Springer US
Published in
Neural Processing Letters / Issue 2/2019
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-018-09977-1
