Published in: Neural Processing Letters 3/2015

1 December 2015

An Efficient Over-sampling Approach Based on Mean Square Error Back-propagation for Dealing with the Multi-class Imbalance Problem

Authors: R. Alejo, V. García, J. H. Pacheco-Sánchez


Abstract

This paper proposes a new dynamic over-sampling method: a hybrid that combines a well-known over-sampling technique (SMOTE) with the sequential back-propagation algorithm. The method uses the back-propagation mean square error (MSE) to identify the over-sampling rate automatically, so that only the training samples needed to deal with the class imbalance problem are used and the neural network (NN) training time does not increase excessively. Its main aim is a trade-off between NN classification performance and NN training time in scenarios where the training data set represents a multi-class classification problem, is highly imbalanced, and may require a long NN training time. Experimental results on fifteen multi-class imbalanced data sets show that the proposed method is promising.
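
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the two ingredients it names: SMOTE-style interpolation between same-class neighbours, and a per-class MSE used to decide how many synthetic samples each class receives. The function names (`smote_samples`, `per_class_mse`, `oversampling_budget`) and the proportional allocation rule are assumptions made for illustration, not the authors' published procedure.

```python
import math
import random


def smote_samples(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest same-class neighbours (the SMOTE idea)."""
    rng = rng or random.Random(0)
    if len(minority) < 2 or n_new <= 0:
        return []
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of `base` within the same class, excluding itself.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def per_class_mse(targets, outputs, labels):
    """Squared output error summed over output units, averaged per class."""
    sums, counts = {}, {}
    for target, output, label in zip(targets, outputs, labels):
        err = sum((t - o) ** 2 for t, o in zip(target, output))
        sums[label] = sums.get(label, 0.0) + err
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def oversampling_budget(class_mse, total_new):
    """ASSUMED rule (not from the paper): split `total_new` synthetic samples
    among classes in proportion to each class's share of the total MSE."""
    total = sum(class_mse.values())
    if total == 0:
        return {label: 0 for label in class_mse}
    return {label: int(round(total_new * err / total))
            for label, err in class_mse.items()}
```

In a training loop, one might recompute `per_class_mse` after each epoch and call `oversampling_budget` again, so that classes the network already fits well stop receiving new synthetic points. That is one plausible way to keep the training set, and hence the training time, from growing more than necessary.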


Metadata
Title
An Efficient Over-sampling Approach Based on Mean Square Error Back-propagation for Dealing with the Multi-class Imbalance Problem
Authors
R. Alejo
V. García
J. H. Pacheco-Sánchez
Publication date
1 December 2015
Publisher
Springer US
Published in
Neural Processing Letters / Issue 3/2015
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-014-9376-3
