2018 | OriginalPaper | Chapter

Imbalanced Data Classification Based on MBCDK-means Undersampling and GA-ANN

Abstract

Class imbalance arises in classification tasks where one class contains only a few samples while the other contains a great many. When traditional machine learning classifiers are applied to an imbalanced data set, classification performance is poor and the time cost is high. To address these two problems, this paper proposes a mini batch with cluster distribution K-means (MBCDK-means) undersampling method together with a GA-ANN model. MBCDK-means selects samples according to the cluster distribution and the distance from the majority class clusters to the minority class cluster center. This preserves the original cluster distribution and increases the sampling rate of boundary samples, which helps to improve the final classification performance. Compared with the classic K-means clustering undersampling method, MBCDK-means also has lower time complexity. The artificial neural network (ANN) is widely used in data classification, but it is easily trapped in a local minimum. The genetic algorithm artificial neural network (GA-ANN), which uses a genetic algorithm to optimize the weights and biases of the neural network, is introduced for this reason and achieves better performance than a plain ANN. Experimental results on 8 data sets show the effectiveness of the proposed algorithm.
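
The undersampling idea in the abstract can be illustrated with a small sketch. This is not the authors' MBCDK-means algorithm, whose details are not given on this page; it is one possible reading under stated assumptions. The majority class is clustered with a minimal mini-batch K-means, each cluster receives a quota proportional to its size (to keep the cluster distribution), and within a cluster, samples closer to the minority class center are drawn with higher probability (to favor boundary samples). The function names `mini_batch_kmeans` and `undersample_majority`, the inverse-distance weighting, and all parameter defaults are hypothetical choices made for illustration.

```python
import numpy as np

def _nearest(X, centers):
    # Index of the closest center for every row of X (squared Euclidean).
    return np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)

def mini_batch_kmeans(X, k, batch_size=32, n_iter=100, rng=None):
    """Minimal mini-batch K-means with a per-center learning rate of 1/count."""
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    counts = np.zeros(k)
    for _ in range(n_iter):
        batch = X[rng.choice(len(X), size=min(batch_size, len(X)), replace=False)]
        for x, j in zip(batch, _nearest(batch, centers)):
            counts[j] += 1
            centers[j] += (x - centers[j]) / counts[j]  # move center toward x
    return centers, _nearest(X, centers)

def undersample_majority(X_maj, X_min, k=4, seed=0):
    """Return indices into X_maj, as many as there are minority samples.

    Assumption (not from the paper): quotas are proportional to cluster size,
    and in-cluster sampling weights are inverse distances to the minority mean.
    """
    rng = np.random.default_rng(seed)
    n_target = min(len(X_min), len(X_maj))
    _, labels = mini_batch_kmeans(X_maj, k, rng=rng)
    sizes = np.bincount(labels, minlength=k)

    # Largest-remainder allocation so the quotas sum exactly to n_target.
    raw = n_target * sizes / sizes.sum()
    quotas = np.floor(raw).astype(int)
    leftover = n_target - quotas.sum()
    quotas[np.argsort(-(raw - quotas))[:leftover]] += 1

    min_center = X_min.mean(axis=0)
    chosen = []
    for j in range(k):
        if quotas[j] == 0:
            continue
        idx = np.flatnonzero(labels == j)
        dist = np.linalg.norm(X_maj[idx] - min_center, axis=1)
        weights = 1.0 / (dist + 1e-12)  # closer to the minority class, more likely
        chosen.append(rng.choice(idx, size=quotas[j], replace=False,
                                 p=weights / weights.sum()))
    return np.concatenate(chosen)

# Usage on synthetic data: 200 majority samples reduced to match 20 minority samples.
data_rng = np.random.default_rng(0)
X_maj = data_rng.normal(0.0, 1.0, size=(200, 2))
X_min = data_rng.normal(3.0, 0.5, size=(20, 2))
selected = undersample_majority(X_maj, X_min, k=4, seed=0)
X_balanced = np.vstack([X_maj[selected], X_min])
```

Each mini-batch update touches only `batch_size` samples instead of the whole majority class, which is the usual reason a mini-batch variant costs less per iteration than classic K-means.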

Metadata
Title
Imbalanced Data Classification Based on MBCDK-means Undersampling and GA-ANN
Authors
Anping Song
Quanhua Xu
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-030-01421-6_34
