
2019 | OriginalPaper | Book chapter

A Review on Ensembles-Based Approach to Overcome Class Imbalance Problem

Authors: Sujit Kumar, J. N. Madhuri, Mausumi Goswami

Published in: Emerging Research in Computing, Information, Communication and Applications

Publisher: Springer Singapore


Abstract

Predictive analytics incorporates statistical techniques from predictive modelling, machine learning and data mining to analyse large databases and make predictions about the future. Data mining is a powerful technology that helps organizations concentrate on their most important data by extracting useful information from large databases. As technology improves, ever larger amounts of data are collected in raw form, and as a result the need for data mining techniques is growing across many domains. Class imbalance is an open challenge in data mining and machine learning. It arises from imbalanced data sets: a data set is considered imbalanced when the number of instances in one class vastly outnumbers the number of instances in the other class. When traditional data mining algorithms are trained on imbalanced data sets, they produce suboptimal classification models that are typically biased toward the majority class. The class imbalance problem has recently gained significant attention from the data mining and machine learning research community because it appears in many real-world problems, such as remote sensing, pollution detection, risk management, fraud detection and medical diagnosis. Several methods have been proposed to overcome it. In this paper, our goal is to review the various methods proposed to counter the effect of imbalanced data on classification learning algorithms.
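To make "vastly outnumber" and "suboptimal classification model" concrete, the following minimal sketch computes an imbalance ratio (majority count divided by minority count) and scores a trivial classifier that always predicts the majority class. The fraud-detection labels are a hypothetical toy data set, and the function names `imbalance_ratio` and `majority_baseline_metrics` are illustrative only, not taken from the chapter.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return the ratio of the majority-class count to the minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def majority_baseline_metrics(labels, minority):
    """Score a classifier that always predicts the majority class.

    Returns (accuracy, recall on the minority class).
    """
    counts = Counter(labels)
    majority = max(counts, key=counts.get)
    predictions = [majority] * len(labels)

    # Overall accuracy: fraction of all instances predicted correctly.
    correct = sum(p == y for p, y in zip(predictions, labels))
    accuracy = correct / len(labels)

    # Minority recall: fraction of true minority instances predicted as minority.
    minority_hits = sum(p == y == minority for p, y in zip(predictions, labels))
    recall = minority_hits / counts[minority]
    return accuracy, recall


# Hypothetical toy data: 990 legitimate transactions and 10 fraudulent ones.
labels = ["legit"] * 990 + ["fraud"] * 10
print(imbalance_ratio(labels))
print(majority_baseline_metrics(labels, "fraud"))
```

On this toy data the imbalance ratio is 99, and the majority-only baseline reaches 99% accuracy while detecting none of the fraud cases. This is why plain accuracy is a misleading objective on imbalanced data and why specialised methods are needed.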


DOI: https://doi.org/10.1007/978-981-13-6001-5_12
