
2018 | OriginalPaper | Chapter

An Algorithm for Selective Preprocessing of Multi-class Imbalanced Data

Authors: Szymon Wojciechowski, Szymon Wilk, Jerzy Stefanowski

Published in: Proceedings of the 10th International Conference on Computer Recognition Systems CORES 2017

Publisher: Springer International Publishing


Abstract

In this paper we propose SPIDER3, a new algorithm for selective preprocessing of multi-class imbalanced data sets. While it borrows selected ideas from its predecessor, SPIDER2 (namely the combination of relabeling and local resampling), it introduces several important extensions. Unlike SPIDER2, it handles multi-class problems directly. Moreover, it considers the relevance of specific decision classes to control the order in which they are processed. Finally, it uses information about relations between specific classes (modeled with misclassification costs) to better control the extent of changes introduced locally to the preprocessed data. We performed a computational experiment on artificial 3-class data sets to compare SPIDER3 with SPIDER2 applied to temporarily aggregated classes, and the results confirmed the advantages of the new algorithm.
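
To make the ideas named in the abstract more concrete, the following is a minimal sketch of cost-weighted local resampling with an explicit class processing order. It is not the published SPIDER3 procedure: the function names (`nearest_labels`, `amplify_unsafe`), the "more than half of the neighbours belong to other classes" rule, the way the number of copies is derived from costs, and the omission of relabeling are all simplifying assumptions made only for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_labels(X, y, i, k):
    """Labels of the k nearest neighbours of example i (Euclidean distance)."""
    others = [j for j in range(len(X)) if j != i]
    others.sort(key=lambda j: dist(X[i], X[j]))
    return [y[j] for j in others[:k]]


def amplify_unsafe(X, y, order, cost, k=3):
    """Duplicate examples whose neighbourhood is dominated by other classes.

    Hypothetical simplification, not the algorithm from the paper.
    order: class labels to process, most relevant first (assumed input).
    cost:  dict mapping (true_class, neighbour_class) -> misclassification cost.
    """
    new_X, new_y = list(X), list(y)
    for cls in order:
        for i in range(len(X)):  # neighbourhoods use the original data only
            if y[i] != cls:
                continue
            foreign = [c for c in nearest_labels(X, y, i, k) if c != cls]
            if 2 * len(foreign) <= k:
                continue  # at least half of the neighbours agree: treated as safe
            # Assumption: costlier confusions justify more local copies.
            copies = round(sum(cost[(cls, c)] for c in foreign))
            new_X.extend([X[i]] * copies)
            new_y.extend([cls] * copies)
    return new_X, new_y


# Toy data: one minority example surrounded by four majority examples.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
y = ["maj", "maj", "maj", "maj", "min"]
new_X, new_y = amplify_unsafe(X, y, order=["min"], cost={("min", "maj"): 1.0}, k=3)
```

In this toy setting the single minority example has only majority neighbours, so it is copied once per unit of accumulated cost; raising the cost for the `("min", "maj")` pair would increase the amount of local amplification, which is the kind of control the abstract attributes to misclassification costs.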


Metadata
Title
An Algorithm for Selective Preprocessing of Multi-class Imbalanced Data
Authors
Szymon Wojciechowski
Szymon Wilk
Jerzy Stefanowski
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-59162-9_25
