
2021 | OriginalPaper | Chapter

An Under-Sampling Method of Unbalanced Data Classification Based on Support Vector

Authors : Jinqiang He, Yongli Liao, Dengjie Zhu, Xujuan Fan, Xiancong Zhang

Published in: Human Centered Computing

Publisher: Springer International Publishing


Abstract

To address the unbalanced class distribution of power grid transmission line fault data, in which fault samples are far fewer than normal samples, an under-sampling algorithm based on support vectors is proposed for transmission line fault classification. The method first obtains the support vectors of the original data. It then computes the distance from each majority-class sample to its \(k\) nearest support vectors and uses these distances to compute a class bit statistic that measures local density. Finally, it under-samples the majority class according to the magnitude of each sample's class bit statistic. Performance is evaluated on a bird nest dataset and an insulator dataset. The results show that the method classifies unbalanced data well, provides a theoretical reference for research on unbalanced data classification, and has some practical value for grid transmission line fault classification.
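The pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the support vectors have already been obtained by fitting an SVM on the original data (for example, scikit-learn's `SVC` exposes them through its `support_vectors_` attribute). The abstract does not define the class bit statistic, so the hypothetical `density_scores` function substitutes a simple local-density score: the inverse of the mean distance to the \(k\) nearest support vectors. The selection rule in the hypothetical `undersample_majority` (keep the highest-scoring, i.e. closest-to-boundary, majority samples) is likewise an assumption.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_knn_sv_distance(sample: Point, support_vectors: Sequence[Point], k: int) -> float:
    """Mean distance from one sample to its k nearest support vectors."""
    dists = sorted(euclidean(sample, sv) for sv in support_vectors)
    k = min(k, len(dists))  # guard against k exceeding the number of support vectors
    return sum(dists[:k]) / k


def density_scores(majority: Sequence[Point], support_vectors: Sequence[Point], k: int) -> List[float]:
    """HYPOTHETICAL stand-in for the paper's class bit statistic:
    inverse mean k-NN support-vector distance, so samples lying
    close to the support vectors get higher scores."""
    eps = 1e-12  # avoids division by zero when a sample coincides with a support vector
    return [1.0 / (mean_knn_sv_distance(x, support_vectors, k) + eps) for x in majority]


def undersample_majority(majority: Sequence[Point], support_vectors: Sequence[Point],
                         k: int, n_keep: int) -> List[Point]:
    """Keep the n_keep majority samples with the highest density score.

    ASSUMPTION: this selection rule is illustrative only; the authors'
    exact rule based on the class bit statistic may differ.
    """
    if not support_vectors:
        raise ValueError("need at least one support vector")
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0 < n_keep <= len(majority):
        raise ValueError("n_keep must be in 1..len(majority)")
    scores = density_scores(majority, support_vectors, k)
    ranked = sorted(range(len(majority)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])  # restore the original sample order
    return [majority[i] for i in kept]


if __name__ == "__main__":
    majority = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (6.0, 6.0)]
    svs = [(0.0, 0.5), (0.2, 0.4)]  # toy support vectors, assumed already computed
    print(undersample_majority(majority, svs, k=2, n_keep=2))
    # [(0.0, 0.0), (0.1, 0.0)]
```

In this toy run the two majority samples far from the support vectors are discarded. Taking support vectors as an input keeps the sketch independent of any particular SVM library; in practice the retained majority samples would be combined with all minority (fault) samples before training the final classifier.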

Metadata
Copyright Year: 2021
DOI: https://doi.org/10.1007/978-3-030-70626-5_45
