
2016 | OriginalPaper | Chapter

On Properties of Undersampling Bagging and Its Extensions for Imbalanced Data


Abstract

Undersampling bagging ensembles specialized for class-imbalanced data are considered. Particular attention is paid to Roughly Balanced Bagging, as it leads to better classification performance than other extensions of bagging. We experimentally analyze its properties with respect to bootstrap construction, the choice of the number of component classifiers, their diversity, and its ability to handle the most difficult types of minority examples. We also discuss further extensions of undersampling bagging, in which data difficulty factors influence how examples are sampled into bootstraps.
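
The bootstrap construction mentioned in the abstract can be illustrated with a minimal sketch of Roughly Balanced Bagging as it is commonly described: the minority part of each bootstrap has the size of the minority class, while the size of the majority part is drawn from a negative binomial distribution with success probability 0.5, so the two classes are balanced only on average. The function name `roughly_balanced_bootstrap`, the toy data, and the choice to sample both classes with replacement are assumptions made for illustration; this is not the code used in the chapter.

```python
import random

def roughly_balanced_bootstrap(minority, majority, rng):
    """Draw one roughly balanced bootstrap (illustrative sketch only)."""
    n_min = len(minority)
    # Negative binomial draw with p = 0.5: count the "failures"
    # observed before n_min "successes" occur in fair coin flips.
    n_maj = 0
    successes = 0
    while successes < n_min:
        if rng.random() < 0.5:
            successes += 1
        else:
            n_maj += 1
    # Assumption: both classes are sampled with replacement.
    sample_min = [rng.choice(minority) for _ in range(n_min)]
    sample_maj = [rng.choice(majority) for _ in range(n_maj)]
    return sample_min, sample_maj

rng = random.Random(0)
minority = list(range(20))        # hypothetical minority example ids
majority = list(range(100, 300))  # hypothetical majority example ids

# The majority part varies in size from bootstrap to bootstrap.
sizes = [len(roughly_balanced_bootstrap(minority, majority, rng)[1])
         for _ in range(200)]
mean_majority_size = sum(sizes) / len(sizes)
```

Over many bootstraps the average majority sample size should be close to the minority class size, while individual bootstraps differ; each bootstrap would then train one component classifier of the ensemble.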


Footnotes
1. We are grateful to our Master's students Lukasz Idkowiak and Mateusz Lango for their help in implementing and testing these algorithms.
Literature
1. Blaszczynski, J., Stefanowski, J., Idkowiak, L.: Extending bagging for imbalanced data. In: Proceedings of the 8th CORES 2013. Springer Series on Advances in Intelligent Systems and Computing, vol. 226, pp. 269–278 (2013)
2. Blaszczynski, J., Stefanowski, J.: Neighbourhood sampling in bagging for imbalanced data. Neurocomputing 150-Part B, 529–542 (2015)
3. Galar, M., Fernandez, A., Barrenechea, E., Bustince, H., Herrera, F.: A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Trans. Syst. Man Cybern. Part C: Appl. Rev. 99, 1–22 (2011)
4. He, H., Garcia, E.: Learning from imbalanced data. IEEE Trans. Data Knowl. Eng. 21(9), 1263–1284 (2009)
5. He, H., Ma, Y. (eds.): IEEE Imbalanced Learning. Foundations, Algorithms and Applications. Wiley, New York (2013)
6. Hido, S., Kashima, H.: Roughly balanced bagging for imbalance data. Stat. Anal. Data Min. 2(5–6), 412–426 (2009)
7. Japkowicz, N., Mohak, S.: Evaluating Learning Algorithms: A Classification Perspective. Cambridge University Press, Cambridge (2011)
8. Khoshgoftaar, T., Van Hulse, J., Napolitano, A.: Comparing boosting and bagging techniques with noisy and imbalanced data. IEEE Trans. Syst. Man Cybern.-Part A 41(3), 552–568 (2011)
9. Krawczyk, N., Woźniak, M.: Analysis of diversity assurance methods for combined classifiers. In: Choraś, R.S. (ed.) Image Processing and Communications Challenges 4. Advances in Intelligent Systems and Computing, vol. 184, pp. 177–184. Springer, Heidelberg (2013)
10. Kuncheva, L.: Combining Pattern Classifiers: Methods and Algorithms, 2nd edn. Wiley, New York (2014)
11. Lopez, V., Fernandez, A., Garcia, S., Palade, V., Herrera, F.: An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inf. Sci. 257, 113–141 (2014)
12. Napierała, K., Stefanowski, J., Wilk, S.: Learning from imbalanced data in presence of noisy and borderline examples. In: Szczuka, M., Kryszkiewicz, M., Ramanna, S., Jensen, R., Hu, Q. (eds.) Rough Sets and Current Trends in Computing. Lecture Notes in Computer Science, vol. 6086, pp. 158–167. Springer, Heidelberg (2010)
13. Napierala, K., Stefanowski, J.: Identification of different types of minority class examples in imbalanced data. In: Corchado, E., Snášel, V., Abraham, A., Woźniak, M., Graña, M., Cho, S.-B. (eds.) Hybrid Artificial Intelligent Systems. Lecture Notes in Computer Science, vol. 7209, pp. 139–150. Springer, Heidelberg (2012)
14. Napierala, K., Stefanowski, J.: Types of minority class examples and their influence on learning classifiers from imbalanced data. J. Intell. Inf. Syst. (accepted) (2015). doi:10.1007/s10844-015-0368-1
15. Wang, S., Yao, T.: Diversity analysis on imbalanced data sets by using ensemble models. In: Proc. IEEE Symp. Comput. Intell. Data Min., pp. 324–331 (2009)
16. Weiss, G.M.: Mining with rarity: a unifying framework. ACM SIGKDD Explor. Newsl. 6(1), 7–19 (2004)
Metadata
Title
On Properties of Undersampling Bagging and Its Extensions for Imbalanced Data
Author
Jerzy Stefanowski
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-26227-7_38
