
2020 | OriginalPaper | Book chapter

A Preprocessing Approach for Class-Imbalanced Data Using SMOTE and Belief Function Theory

Authors: Fares Grina, Zied Elouedi, Eric Lefevre

Published in: Intelligent Data Engineering and Automated Learning – IDEAL 2020

Publisher: Springer International Publishing


Abstract

Handling imbalanced datasets at the preprocessing level is an efficient strategy, used by many methods, to rebalance the data and improve classification performance. In particular, SMOTE is a popular oversampling technique that modifies the training data by adding artificial minority samples. However, SMOTE may create instances in noisy and overlapping areas, far from safe regions. To tackle this issue, we propose SMOTE-BFT, which uses belief function theory to remove generated minority instances that do not lie in safe regions. After SMOTE is applied, each generated minority instance is represented by an evidential membership structure that provides detailed information about its class memberships. Rules based on belief function theory are then applied to detect and remove generated instances that lie in noisy and overlapping regions. Experiments on noisy artificial datasets show that our proposal significantly outperforms other popular oversampling methods.
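The abstract describes a two-stage pipeline. First, SMOTE generates synthetic minority points by interpolation. Then evidence from belief function theory decides which of those points to keep. The following is a minimal Python/NumPy sketch of that idea under stated assumptions, not the authors' implementation. The SMOTE step is the classic interpolation scheme. The evidential membership is approximated with Denoeux's evidential k-NN mass function: each neighbour contributes a simple support function, and these are pooled with Dempster's rule. This is an assumption; it may differ from the membership structure actually used in SMOTE-BFT. The two filtering rules are illustrative stand-ins, because the abstract does not spell out the paper's rules. A point is dropped if the mass on {majority} exceeds the mass on {minority} (noise), or if the conflict exceeds a threshold (overlap). All function names and parameters (`smote`, `eknn_masses`, `filter_synthetic`, `k`, `alpha`, `gamma`, `max_conflict`) are hypothetical, and their default values are arbitrary.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Classic SMOTE: place each synthetic point at a random position on the
    segment between a minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)              # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)               # assumes at least 2 minority samples
    nn = np.argsort(d, axis=1)[:, :k]        # k nearest minority neighbours per sample
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))             # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])


def eknn_masses(x, X, y, k=7, alpha=0.95, gamma=1.0):
    """Evidential k-NN (Denoeux-style) for two classes, 1 = minority, 0 = majority.
    Returns normalized masses on {min}, {maj}, Omega and the conflict mass."""
    d2 = np.sum((X - x) ** 2, axis=1)
    idx = np.argsort(d2)[:k]
    s = alpha * np.exp(-gamma * d2[idx])     # support each neighbour gives its own class
    b_min = np.prod(1 - s[y[idx] == 1])      # mass left on Omega after pooling minority neighbours
    b_maj = np.prod(1 - s[y[idx] == 0])      # same for majority neighbours
    a_min, a_maj = 1 - b_min, 1 - b_maj
    conflict = a_min * a_maj                 # mass on the empty set before Dempster normalization
    norm = 1 - conflict
    return a_min * b_maj / norm, a_maj * b_min / norm, b_min * b_maj / norm, conflict


def filter_synthetic(X_syn, X, y, max_conflict=0.5, **eknn_kw):
    """Hypothetical rules: drop a synthetic point if the evidence favours the
    majority class (noise) or if its neighbourhood evidence is highly conflicting (overlap)."""
    keep = []
    for x in X_syn:
        m_min, m_maj, _, conflict = eknn_masses(x, X, y, **eknn_kw)
        keep.append(bool(m_min > m_maj and conflict <= max_conflict))
    return X_syn[np.array(keep, dtype=bool)]


# Toy usage: 200 majority vs 20 minority points in 2-D.
rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(200, 2))
X_min = rng.normal(2.5, 0.7, size=(20, 2))
X = np.vstack([X_maj, X_min])
y = np.r_[np.zeros(200, dtype=int), np.ones(20, dtype=int)]

X_syn = smote(X_min, n_new=180, k=5, rng=1)
X_kept = filter_synthetic(X_syn, X, y)
print(f"generated {len(X_syn)}, kept {len(X_kept)}")
```

In this sketch the filter only removes synthetic samples and never touches the original minority or majority instances. As a result, the filtered training set can be somewhat less balanced than the raw SMOTE output.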


Metadata
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-62365-4_1
