2018 | Original Paper | Book Chapter

Benchmarking Swarm Rebalancing Algorithm for Relieving Imbalanced Machine Learning Problems

Authors: Jinyan Li, Simon Fong

Published in: Behavior Engineering and Applications

Publisher: Springer International Publishing


Abstract

Imbalanced classification is a well-known NP-hard problem in data mining. Since an imbalanced dataset contains far more samples from the majority classes than from the minority classes, the resulting classifier tends to over-fit the former and under-fit the latter. Previous solutions focus on increasing the learning sensitivity to the minority classes and/or rebalancing sample sizes before learning. Using swarm intelligence algorithms, we propose a series of unified pre-processing approaches to address the imbalanced classification problem. These methods use stochastic swarm heuristics to cooperatively optimize and fuse the distribution of an imbalanced training dataset. As shown in our published work, this series of algorithms indeed has an edge in relieving the imbalance problem. In this book chapter we present an in-depth and thorough evaluation of the performance of contemporary swarm rebalancing algorithms. The experimental results show that the proposed algorithms outperform the 17 comparative algorithms. Though some are better than others, in general these algorithms exhibit superior computational speed, high accuracy and acceptable reliability of the resulting classification models.
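
The chapter's swarm rebalancing algorithms themselves are not reproduced here. As a rough, hedged illustration of the general recipe they share, namely a stochastic swarm heuristic searching for a rebalanced training distribution that maximizes a classifier's agreement score, the sketch below runs a small particle swarm optimization (PSO) over SMOTE's resampling ratio and neighbourhood size, scoring each candidate with Cohen's kappa on a held-out validation split. The libraries, parameter ranges, wrapped classifier and fitness choice are illustrative assumptions, not the authors' exact setup.

```python
# Illustrative sketch only: a PSO wrapper tuning SMOTE's parameters on an imbalanced dataset.
# Parameter ranges, the decision-tree learner and the kappa fitness are assumptions.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic imbalanced data: roughly 90% majority (class 0) vs. 10% minority (class 1).
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9, 0.1], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, test_size=0.3, random_state=0)
base_ratio = np.bincount(y_tr)[1] / np.bincount(y_tr)[0]  # current minority/majority ratio

def fitness(ratio, k):
    """Rebalance with SMOTE(ratio, k), train a tree, return Cohen's kappa on the validation split."""
    sm = SMOTE(sampling_strategy=float(ratio), k_neighbors=int(k), random_state=0)
    X_bal, y_bal = sm.fit_resample(X_tr, y_tr)
    clf = DecisionTreeClassifier(random_state=0).fit(X_bal, y_bal)
    return cohen_kappa_score(y_val, clf.predict(X_val))

# Search space: target minority/majority ratio in (base_ratio, 1.0], SMOTE neighbours in [1, 10].
low = np.array([base_ratio + 0.01, 1.0])
high = np.array([1.0, 10.0])
n_particles, n_iter = 8, 15

pos = rng.uniform(low, high, size=(n_particles, 2))   # particle positions
vel = np.zeros_like(pos)                              # particle velocities
pbest = pos.copy()                                    # personal bests
pbest_fit = np.array([fitness(*p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()              # global best

for _ in range(n_iter):
    r1, r2 = rng.random((2, n_particles, 1))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, low, high)
    fit = np.array([fitness(*p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()].copy()

print(f"best SMOTE ratio={gbest[0]:.2f}, k_neighbors={int(gbest[1])}, kappa={pbest_fit.max():.3f}")
```

A decision tree and a single kappa objective are used purely for brevity; any classifier, resampler or metric could be swapped into the same wrapper structure.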

Metadata
Title
Benchmarking Swarm Rebalancing Algorithm for Relieving Imbalanced Machine Learning Problems
Authors
Jinyan Li
Simon Fong
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-76430-6_1