Published in: Neural Computing and Applications 17/2021

11.01.2021 | Original Article

A novel ensemble method for classification in imbalanced datasets using split balancing technique based on instance hardness (sBal_IH)

Authors: Halimu Chongomweru, Asem Kasem


Abstract

Classification tasks on datasets with high class imbalance pose a challenge to machine learning algorithms, and such datasets are prevalent in many real-world domains and applications. In machine learning research, ensemble methods for classification on imbalanced datasets have attracted much attention owing to their ability to improve classification performance. However, these methods remain prone to the negative effects of noise in the training sets. Furthermore, many of them alter the original class distribution to create a balance in the data through over-sampling or under-sampling, which can lead to overfitting or to discarding useful data, respectively, and may therefore still hamper performance. In this work, we propose a novel ensemble method for classification that creates an arbitrary number of balanced splits (sBal) of the data, using instance hardness as a weighting mechanism for building balanced bags. Each generated bag contains all the minority instances together with a mixture of majority instances of varying degrees of hardness (easy, normal, and hard); we call this approach the sBal_IH technique. It enables base learners to train on different balanced bags that cover varied characteristics of the training data. We evaluated the proposed method on a total of 100 datasets, comprising 30 synthetic datasets with controlled levels of noise and 29 balanced and 41 imbalanced real-world datasets, and compared its performance with that of both traditional ensemble methods (Bagging, Wagging, Random Forest, and AdaBoost) and methods specialized for class-imbalanced problems (Balanced Bagging, Balanced Random Forest, RUSBoost, and Easy Ensemble). The results reveal that the proposed method brings a substantial improvement in classification performance relative to the compared methods. For statistical significance analysis, we conducted Friedman's nonparametric test with the Bergmann post hoc test. The analysis shows that our method performs significantly better than the compared traditional and specialized ensemble methods for imbalanced problems across many datasets.
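
To make the bag-construction idea more concrete, the following Python sketch illustrates one possible way to build such balanced bags. It is not the authors' implementation: the instance-hardness proxy (out-of-fold k-NN misclassification probability), the easy/normal/hard thresholds (0.33 and 0.66), the per-group sampling sizes, and the decision-tree base learner are all illustrative assumptions.

import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def instance_hardness(X, y, n_neighbors=5, cv=5):
    """Proxy for instance hardness: 1 - P(true class) from out-of-fold k-NN estimates."""
    proba = cross_val_predict(
        KNeighborsClassifier(n_neighbors=n_neighbors), X, y, cv=cv, method="predict_proba"
    )
    classes = np.unique(y)                      # columns of `proba` follow sorted class order
    col = {c: i for i, c in enumerate(classes)}
    p_true = proba[np.arange(len(y)), [col[c] for c in y]]
    return 1.0 - p_true


def make_balanced_bags(X, y, n_bags=10, seed=None):
    """Each bag keeps every minority instance and samples an equal-sized mixture of
    easy, normal, and hard majority instances (hypothetical thresholds 0.33 / 0.66)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority, majority = classes[np.argmin(counts)], classes[np.argmax(counts)]
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y == majority)

    h = instance_hardness(X, y)[maj_idx]
    groups = [maj_idx[h < 0.33], maj_idx[(h >= 0.33) & (h < 0.66)], maj_idx[h >= 0.66]]

    per_group = max(1, len(min_idx) // 3)       # ~|minority| majority instances per bag
    bags = []
    for _ in range(n_bags):
        picked = [rng.choice(g, size=per_group, replace=len(g) < per_group)
                  for g in groups if len(g) > 0]
        bags.append(np.concatenate([min_idx] + picked))
    return bags


def fit_ensemble(X, y, n_bags=10):
    """Train one base learner per balanced bag."""
    return [DecisionTreeClassifier(random_state=i).fit(X[bag], y[bag])
            for i, bag in enumerate(make_balanced_bags(X, y, n_bags=n_bags, seed=0))]


def predict_vote(models, X):
    """Aggregate predictions by majority vote (assumes binary 0/1 labels)."""
    votes = np.stack([m.predict(X) for m in models]).astype(int)
    return (votes.mean(axis=0) >= 0.5).astype(int)

Keeping every minority instance in every bag avoids the information loss of undersampling, while varying which majority instances are drawn from each hardness group gives the base learners diverse yet balanced views of the training data, which is the core intuition behind sBal_IH as described in the abstract.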

Metadata
Title
A novel ensemble method for classification in imbalanced datasets using split balancing technique based on instance hardness (sBal_IH)
Authors
Halimu Chongomweru
Asem Kasem
Publication date
11.01.2021
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 17/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-020-05570-7
