Published in: Neural Computing and Applications 10/2017

16-03-2017 | New Trends in data pre-processing methods for signal and image classification

An improved dynamic sampling back-propagation algorithm based on mean square error to face the multi-class imbalance problem

Authors: R. Alejo, J. Monroy-de-Jesús, J. C. Ambriz-Polo, J. H. Pacheco-Sánchez


Abstract

In this paper, we present an improved dynamic sampling approach (I-SDSA) for the multi-class imbalance problem. I-SDSA is a modification of the back-propagation algorithm that aims to make better use of the training samples in order to improve the classification performance of the multilayer perceptron (MLP). I-SDSA uses the mean square error and a Gaussian function to identify the best samples for training the neural network. The results reported in this article show that I-SDSA exploits the training dataset more effectively and improves the MLP classification performance. In other words, I-SDSA is a successful technique for dealing with the multi-class imbalance problem. In addition, the results indicate that the proposed method is very competitive in classification performance with classical over-sampling methods (including when these are combined with well-known feature selection methods) and with other dynamic sampling approaches; in terms of training time and size, it even outperforms the over-sampling methods.
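
The abstract does not give the selection rule itself, so the following is only a minimal sketch of how a mean square error and a Gaussian function could be combined to pick training samples. It is not the authors' exact formulation. The function names, the parameters mu and sigma, their default values, and the probabilistic keep rule are all assumptions made for illustration.

```python
import math
import random


def per_sample_mse(outputs, targets):
    """Mean square error of each sample, averaged over the output neurons."""
    return [
        sum((t - z) ** 2 for z, t in zip(out, tgt)) / len(out)
        for out, tgt in zip(outputs, targets)
    ]


def gaussian_score(mse, mu=0.25, sigma=0.1):
    """Score in (0, 1]: equals 1 when mse == mu and decays away from mu.

    mu and sigma are hypothetical tuning parameters, not values from the paper.
    """
    return math.exp(-((mse - mu) ** 2) / (2.0 * sigma ** 2))


def select_samples(outputs, targets, rng, mu=0.25, sigma=0.1):
    """Return indices of the samples kept for the next training epoch.

    Assumed rule: keep sample q with probability equal to its Gaussian score.
    """
    errors = per_sample_mse(outputs, targets)
    return [
        q for q, err in enumerate(errors)
        if rng.random() < gaussian_score(err, mu, sigma)
    ]


if __name__ == "__main__":
    # Two samples of a two-class problem with one-hot targets.
    outputs = [[0.5, 0.5], [1.0, 0.0]]   # MLP outputs
    targets = [[1.0, 0.0], [1.0, 0.0]]   # desired outputs
    print(per_sample_mse(outputs, targets))
    print(select_samples(outputs, targets, random.Random(0)))
```

Under these assumptions, a sample whose error sits near mu is almost always kept, while samples that are already well learned (error near zero) or far from mu are rarely presented to the network again. This shrinks the effective training set instead of enlarging it, which is the opposite of what over-sampling does.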

Footnotes
1
This MLP only has two neural network outputs (\(z_{0}^{q}\) and \(z_{1}^{q}\)), because it has been designed to work with datasets of two classes.
 
Metadata
Title
An improved dynamic sampling back-propagation algorithm based on mean square error to face the multi-class imbalance problem
Authors
R. Alejo
J. Monroy-de-Jesús
J. C. Ambriz-Polo
J. H. Pacheco-Sánchez
Publication date
16-03-2017
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 10/2017
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-017-2938-3
