Published in: Knowledge and Information Systems 3/2024

23-09-2023 | Regular paper

Evidence-based adaptive oversampling algorithm for imbalanced classification

Authors: Chen-ju Lin, Florence Leony

Abstract

Classification tasks are complicated by several factors, including skewed class proportions and unclear decision regions due to noise, class overlap, and small disjuncts caused by large within-class variation. These issues make data classification difficult, reduce overall performance, and make it challenging to draw meaningful insights. In this research, the evidence-based adaptive oversampling algorithm (EVA-oversampling), based on the Dempster–Shafer theory of evidence, is developed for imbalanced classification. The technique assigns each instance a probability of class membership to represent the uncertainty that the data point may hold. Synthetic data points are then generated in regions of high confidence to compensate for the under-representation of minority instances, thereby strengthening the minority class region. The experiments show that the proposed method works effectively even in situations where imbalanced class counts and data complexity would normally pose significant obstacles. For highly imbalanced data, the approach outperforms the SMOTE, Borderline-SMOTE, ADASYN, MWMOTE, KMeansSMOTE, LoRAS, and SyMProD algorithms in terms of \(F_1\)-measure and G-mean while maintaining overall performance.
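To make the Dempster–Shafer ingredient concrete, the following is a minimal sketch of Dempster's rule of combination on a two-class frame. It is not the authors' EVA-oversampling procedure, which the abstract does not specify in detail. The names `MIN`, `MAJ`, `THETA`, and `combine`, as well as the example masses, are assumptions chosen for illustration only.

```python
from itertools import product

# Hypothetical two-class frame of discernment (labels are illustrative).
MIN, MAJ = "minority", "majority"
THETA = frozenset({MIN, MAJ})  # total ignorance: "could be either class"


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a frozenset (focal element)
    to a mass; the masses in each dict are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence supports the intersection of the two sets.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass that would fall on the empty set counts as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalise so the remaining masses sum to 1.
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


# Two pieces of evidence about one instance (made-up numbers):
# the first leans towards the minority class, the second weakly
# towards the majority class; the rest of each mass stays on THETA.
evidence_1 = {frozenset({MIN}): 0.6, THETA: 0.4}
evidence_2 = {frozenset({MAJ}): 0.3, THETA: 0.7}

fused = combine(evidence_1, evidence_2)
for focal, mass in fused.items():
    print(sorted(focal), round(mass, 4))
```

In a sketch like this, the mass left on `THETA` after combination expresses how uncertain the class membership of an instance remains. An oversampler in the spirit of the abstract could, as an assumption, restrict synthetic generation to instances whose minority mass is high and whose residual uncertainty is low.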


Metadata
Title
Evidence-based adaptive oversampling algorithm for imbalanced classification
Authors
Chen-ju Lin
Florence Leony
Publication date
23-09-2023
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 3/2024
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-023-01985-5
