22-08-2024

Clustering-Based Oversampling Algorithm for Multi-class Imbalance Learning

Authors: Haixia Zhao, Jian Wu

Published in: Journal of Classification

Abstract

Multi-class imbalanced data learning faces many challenges. The complex structural characteristics of such data cause most solution strategies to suffer from severe intra-class imbalance or overgeneralization, which degrades learning. This paper proposes a clustering-based oversampling algorithm (COM) to handle multi-class imbalance learning. To avoid losing important information, COM clusters the minority class based on the structural characteristics of its instances. Rare instances and outliers are accounted for by assigning a sampling weight to each cluster: clusters with high densities are given low weights. Oversampling is then performed within clusters to avoid overgeneralization. Because low-density clusters are more likely than high-density ones to be selected for synthesizing instances, COM effectively avoids intra-class imbalance. Our study used the UCI and KEEL imbalanced datasets to demonstrate the effectiveness and stability of the proposed method.
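
The weighting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' COM implementation: the abstract does not specify the clustering method, the density measure, or how instances are synthesized, so everything below is an assumption made for illustration. Cluster labels for one minority class are taken as given, density is approximated as cluster size divided by mean distance to the centroid plus a hypothetical `smoothing` term, and new instances are produced by SMOTE-style interpolation between two points of the same cluster. The function names `cluster_sampling_weights` and `oversample_within_clusters` are likewise hypothetical.

```python
import numpy as np


def cluster_sampling_weights(X, labels, smoothing=1.0):
    """Return (cluster ids, sampling weights), with weights inversely
    proportional to an assumed density proxy for each cluster."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    density = np.empty(len(clusters))
    for i, c in enumerate(clusters):
        pts = X[labels == c]
        # Assumed density proxy: points per unit of mean spread around the centroid.
        spread = np.linalg.norm(pts - pts.mean(axis=0), axis=1).mean()
        density[i] = len(pts) / (spread + smoothing)
    inverse = 1.0 / density
    # Normalize so the weights form a probability distribution over clusters.
    return clusters, inverse / inverse.sum()


def oversample_within_clusters(X, labels, n_new, rng=None):
    """Draw n_new synthetic points; each one interpolates between two
    points of a single cluster, so it never bridges two clusters."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    rng = np.random.default_rng(rng)
    clusters, weights = cluster_sampling_weights(X, labels)
    # Low-density clusters have larger weights and are picked more often.
    chosen = rng.choice(clusters, size=n_new, p=weights)
    synthetic = np.empty((n_new, X.shape[1]))
    for i, c in enumerate(chosen):
        pts = X[labels == c]
        a, b = pts[rng.integers(len(pts), size=2)]
        synthetic[i] = a + rng.random() * (b - a)
    return synthetic


# Toy minority class: one dense cluster of 50 points, one sparse cluster of 5.
demo_rng = np.random.default_rng(0)
dense = demo_rng.normal(loc=0.0, scale=0.1, size=(50, 2))
sparse = demo_rng.normal(loc=5.0, scale=1.0, size=(5, 2))
X = np.vstack([dense, sparse])
labels = np.array([0] * 50 + [1] * 5)

clusters, weights = cluster_sampling_weights(X, labels)
synthetic = oversample_within_clusters(X, labels, n_new=20, rng=0)
```

In this toy setup the sparse cluster receives the larger weight, so most synthetic points are generated there, and because interpolation stays inside one cluster no point lands in the empty region between the two groups. For a multi-class problem, the same procedure would presumably be repeated for each minority class.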

Metadata

Title: Clustering-Based Oversampling Algorithm for Multi-class Imbalance Learning
Authors: Haixia Zhao, Jian Wu
Publication date: 22-08-2024
Publisher: Springer US
Published in: Journal of Classification
Print ISSN: 0176-4268
Electronic ISSN: 1432-1343
DOI: https://doi.org/10.1007/s00357-024-09491-1
