Published in: Arabian Journal for Science and Engineering, Issue 8/2022

29 November 2021 | Research Article-Computer Engineering and Computer Science

Evaluating the Performance of Data Level Methods Using KEEL Tool to Address Class Imbalance Problem

Authors: Kamlesh Upadhyay, Prabhjot Kaur, Deepak Kumar Verma


Abstract

The class imbalance problem (CIP) has become an active research topic in machine learning in recent years because of its growing practical importance. As technology is applied in more areas, the volume and variety of data also grow. Much real-world raw data is inherently imbalanced, for example in credit card fraud, fraudulent telephone calls, shuttle system failures, text classification, nuclear explosions, oil spill detection and brain tumor image detection. Standard classification algorithms cannot classify such imbalanced data accurately: their predictions are biased toward the larger (majority) class. This is known as the class imbalance problem. This paper assesses various data-level methods, which balance the data before classification. It also discusses the data characteristics that affect the CIP and the reasons why traditional classification algorithms fail to handle it. In addition, it discusses other data difficulties that make the CIP more severe, such as dataset size, overlapping classes, noise in the data and the data distribution within each class. Using the KEEL tool, the paper empirically compares 20 data-level methods on 44 real imbalanced UCI datasets with imbalance ratios ranging from 1.82 to 129.44. The performance of the methods is assessed with the AUC, F-measure and G-mean metrics, and the results are analyzed and presented graphically.
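The abstract relies on three quantitative notions: the imbalance ratio, a data-level rebalancing step, and imbalance-aware metrics. The following is a minimal Python sketch of each, written for illustration only; it is not the authors' code and not KEEL's implementation, and all function names are hypothetical. Plain random oversampling stands in for the 20 data-level methods the paper compares, and the AUC line assumes the usual single-point convention (1 + TPR - FPR) / 2 for a classifier that outputs crisp labels.

```python
import math
import random
from collections import Counter


def imbalance_ratio(labels):
    """IR = number of majority-class examples / number of minority-class examples."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(X, y, minority, seed=0):
    """Binary random oversampling (hypothetical helper): duplicate randomly chosen
    minority examples until the minority class is as large as the majority class."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_size = len(y) - len(minority_idx)
    deficit = max(majority_size - len(minority_idx), 0)
    extra = [rng.choice(minority_idx) for _ in range(deficit)]
    return list(X) + [X[i] for i in extra], list(y) + [y[i] for i in extra]


def confusion_counts(y_true, y_pred, positive):
    """Return (TP, FN, FP, TN), treating `positive` (the minority class) as positive."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def imbalance_metrics(y_true, y_pred, positive):
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    tpr = tp / (tp + fn) if tp + fn else 0.0  # sensitivity (minority recall)
    tnr = tn / (tn + fp) if tn + fp else 0.0  # specificity (majority recall)
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * tpr / (precision + tpr)) if precision + tpr else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "f_measure": f_measure,
        "g_mean": math.sqrt(tpr * tnr),
        # Single-point AUC for a crisp classifier: (1 + TPR - FPR) / 2 (assumed convention)
        "auc": (1 + tpr - (1 - tnr)) / 2,
    }


# Toy data: 2 minority ("pos") vs 8 majority ("neg") examples.
X = [[float(i)] for i in range(10)]
y = ["pos"] * 2 + ["neg"] * 8
print(imbalance_ratio(y))  # 4.0

X_bal, y_bal = random_oversample(X, y, minority="pos")
print(Counter(y_bal))  # both classes now have 8 examples

# A classifier that always predicts the majority class.
y_pred = ["neg"] * len(y)
print(imbalance_metrics(y, y_pred, positive="pos"))
# {'accuracy': 0.8, 'f_measure': 0.0, 'g_mean': 0.0, 'auc': 0.5}
```

With the minority class as the positive class, a classifier that always predicts the majority class reaches 0.8 accuracy on this toy data yet scores 0 on both G-mean and F-measure and only 0.5 AUC, which illustrates why imbalance studies report these metrics rather than accuracy.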

Metadata
Title: Evaluating the Performance of Data Level Methods Using KEEL Tool to Address Class Imbalance Problem
Authors: Kamlesh Upadhyay, Prabhjot Kaur, Deepak Kumar Verma
Publication date: 29 November 2021
Publisher: Springer Berlin Heidelberg
Published in: Arabian Journal for Science and Engineering, Issue 8/2022
Print ISSN: 2193-567X
Electronic ISSN: 2191-4281
DOI: https://doi.org/10.1007/s13369-021-06377-x
