Published in: The Journal of Supercomputing 12/2020

02-03-2020

Dynamic clustering method for imbalanced learning based on AdaBoost

Authors: Xiaoheng Deng, Yuebin Xu, Lingchi Chen, Weijian Zhong, Alireza Jolfaei, Xi Zheng


Abstract

This paper addresses learning from imbalanced data with ensemble learning. At present, the main solutions combine under-sampling, oversampling, or cost-sensitive learning with ensemble learning. However, these feature-space-based methods fail to reflect changes in the data distribution and usually carry high computational complexity and a risk of overfitting. In this paper, we propose a dynamic clustering algorithm based on the coefficient of variation (or entropy), which learns the local spatial distribution of the data and hierarchically clusters the majority class. The algorithm has low complexity and can dynamically adjust the clusters across AdaBoost iterations, adapting in step with the changes in sample weights. We then design an index to measure the importance of each cluster and, based on this index, propose a dynamic sampling algorithm based on maximum weight. Visualization experiments demonstrate the effectiveness of this sampling algorithm. Finally, we propose a cost-sensitive algorithm based on Bagging and combine it with the dynamic sampling algorithm to obtain a multi-fusion imbalanced ensemble learning algorithm. In experiments on three artificial datasets, 22 KEEL datasets, and two gene expression cancer datasets, our algorithms performed well, matching or exceeding state-of-the-art (SOTA) methods in terms of AUC. This indicates that they are effective for imbalanced learning and show potential for building reliable biological cyber-physical systems.
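The abstract does not spell out the algorithms, so the following is only a rough, hypothetical Python sketch of the ingredients it names, not the authors' method. It shows (a) the coefficient of variation and the entropy of a set of sample weights, the kind of statistics a dynamic clustering criterion could use; (b) the standard discrete AdaBoost reweighting step that drives the weight changes the clusters are meant to track; and (c) a hypothetical max-weight undersampler for the majority class. The cluster "importance index" (a cluster's share of the total weight), the budget-allocation rule, and the names `max_weight_undersample` and `n_keep` are assumptions introduced for illustration.

```python
import math
from collections import defaultdict


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 if undefined)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(var) / mean


def weight_entropy(values):
    """Shannon entropy (in nats) of non-negative weights normalized to sum to 1."""
    total = sum(values)
    if total <= 0:
        return 0.0
    return -sum((v / total) * math.log(v / total) for v in values if v > 0)


def adaboost_update(weights, y_true, y_pred):
    """One standard discrete AdaBoost reweighting step for labels in {-1, +1}.

    Returns (alpha, new_weights), with new_weights normalized to sum to 1.
    """
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / sum(weights)
    err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified samples (t * p == -1) are scaled up, correct ones scaled down.
    new = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new)
    return alpha, [w / total for w in new]


def max_weight_undersample(cluster_ids, weights, n_keep):
    """Hypothetical max-weight undersampler for majority-class samples.

    cluster_ids[i] and weights[i] describe majority sample i. Each cluster's
    share of the total weight serves as a stand-in importance index; the
    budget n_keep is split in proportion to it (at least one sample per
    cluster, capped at the cluster size), and the highest-weight samples of
    each cluster are kept. Returns the kept indices in ascending order.
    """
    clusters = defaultdict(list)
    for idx, c in enumerate(cluster_ids):
        clusters[c].append(idx)
    total = sum(weights)
    kept = []
    for idxs in clusters.values():
        importance = sum(weights[i] for i in idxs) / total
        quota = min(len(idxs), max(1, round(n_keep * importance)))
        by_weight = sorted(idxs, key=lambda i: weights[i], reverse=True)
        kept.extend(by_weight[:quota])
    return sorted(kept)
```

In a boosting loop, one would, each round, refresh the majority-class clusters (for instance by splitting a cluster whose weight CV or entropy crosses a threshold, which is itself an assumption here), call `max_weight_undersample` to choose the majority samples for the next weak learner, and then update all sample weights with `adaboost_update`. Keeping the highest-weight samples favors the hard, frequently misclassified majority examples, which is one plausible reading of "sampling based on maximum weight".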


Metadata
Title
Dynamic clustering method for imbalanced learning based on AdaBoost
Authors
Xiaoheng Deng
Yuebin Xu
Lingchi Chen
Weijian Zhong
Alireza Jolfaei
Xi Zheng
Publication date
02-03-2020
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 12/2020
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03211-3
