
2016 | OriginalPaper | Book chapter

On Properties of Undersampling Bagging and Its Extensions for Imbalanced Data


Abstract

Undersampling bagging ensembles specialized for class-imbalanced data are considered. Particular attention is paid to Roughly Balanced Bagging, as it leads to better classification performance than other extensions of bagging. We experimentally analyze its properties with respect to bootstrap construction, the choice of the number of component classifiers, their diversity, and the ability to deal with the most difficult types of minority examples. We also discuss further extensions of undersampling bagging, in which data difficulty factors influence the sampling of examples into bootstraps.
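
The abstract does not spell out how a Roughly Balanced Bagging bootstrap is built. The following is a minimal sketch under the commonly described formulation, in which every bootstrap keeps as many minority draws as there are minority examples and the number of majority draws is taken from a negative binomial distribution with success probability 0.5. The function name `roughly_balanced_bootstrap`, its parameters, and the choice to draw the minority class with replacement are illustrative assumptions, not details taken from this chapter.

```python
import numpy as np


def roughly_balanced_bootstrap(y, minority_label, rng, replace=True):
    """Return row indices for one roughly balanced bootstrap sample.

    Hypothetical helper: the name and signature are illustrative only.
    """
    y = np.asarray(y)
    min_idx = np.flatnonzero(y == minority_label)
    maj_idx = np.flatnonzero(y != minority_label)
    n_min = len(min_idx)

    # Assumption: the majority sample size is the number of "failures" seen
    # before n_min "successes" with p = 0.5, so it equals n_min on average.
    n_maj_draw = int(rng.negative_binomial(n_min, 0.5))
    if not replace:
        # Sampling without replacement cannot exceed the majority class size.
        n_maj_draw = min(n_maj_draw, len(maj_idx))

    # Assumption: minority examples are drawn with replacement.
    boot_min = rng.choice(min_idx, size=n_min, replace=True)
    boot_maj = rng.choice(maj_idx, size=n_maj_draw, replace=replace)
    return np.concatenate([boot_min, boot_maj])


# Usage: 90 majority (label 0) and 10 minority (label 1) examples.
labels = np.array([0] * 90 + [1] * 10)
rng = np.random.default_rng(0)
sample = roughly_balanced_bootstrap(labels, minority_label=1, rng=rng)
```

Each component classifier of the ensemble would be trained on one such index set, so the class ratio varies slightly from bootstrap to bootstrap while staying balanced on average.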


Footnotes
1
We are grateful to our Master students Lukasz Idkowiak and Mateusz Lango for their help in implementing and testing these algorithms.
 
Metadata
Title
On Properties of Undersampling Bagging and Its Extensions for Imbalanced Data
Author
Jerzy Stefanowski
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-26227-7_38
