2020 | OriginalPaper | Chapter

Clustering and Weighted Scoring in Geometric Space Support Vector Machine Ensemble for Highly Imbalanced Data Classification

Authors: Paweł Ksieniewicz, Robert Burduk

Published in: Computational Science – ICCS 2020

Publisher: Springer International Publishing

Abstract

Learning from imbalanced datasets is a challenging task for standard classification algorithms. In general, there are two main approaches to the problem of imbalanced data: algorithm-level and data-level solutions. This paper deals with the second approach. In particular, it proposes a new method for calculating the weighted scoring function used in the integration (fusion) phase of a multiple classifier system. The experimental evaluation covers multiple open-source, highly imbalanced datasets and compares the proposed algorithm with three other approaches using six performance measures. The results show that, on highly imbalanced datasets, the proposed algorithm outperforms the other ensemble methods on these measures.
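The abstract does not state the weighting formula, so the following is only a generic sketch of weighted score fusion in an ensemble, not the authors' method. It assumes each member returns a signed score per sample (for an SVM, the signed distance to its separating hyperplane, i.e. a score in geometric space), with positive values favouring the minority class. The weighting rule (balanced accuracy on validation data above the 0.5 chance level, clipped at zero) and all function names are hypothetical, chosen for illustration.

```python
from typing import List, Sequence, Tuple


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recall for binary labels {0, 1}."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        if idx:
            hits = sum(1 for i in idx if y_pred[i] == cls)
            recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def member_weights(val_scores: Sequence[Sequence[float]],
                   y_val: Sequence[int]) -> List[float]:
    """Hypothetical weighting: balanced accuracy above chance (0.5), clipped at 0."""
    weights = []
    for scores in val_scores:
        preds = [1 if s > 0 else 0 for s in scores]
        weights.append(max(balanced_accuracy(y_val, preds) - 0.5, 0.0))
    return weights


def weighted_score_fusion(scores: Sequence[Sequence[float]],
                          weights: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Fuse signed per-member scores by a weighted average.

    scores[k][i] is the score of member k for sample i; a positive
    fused score is assigned to the minority (positive) class 1.
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("all member weights are zero")
    n = len(scores[0])
    fused = [
        sum(w * member[i] for w, member in zip(weights, scores)) / total
        for i in range(n)
    ]
    labels = [1 if f > 0 else 0 for f in fused]
    return fused, labels


# Usage: three members, a tiny validation set with one minority sample.
y_val = [0, 0, 0, 1]
val_scores = [
    [-1.0, -0.5, -0.2, 0.8],   # separates the classes perfectly
    [-0.3, 0.4, -0.1, -0.2],   # worse than chance -> weight 0
    [0.2, -0.6, -0.4, 0.5],    # decent member
]
test_scores = [
    [0.6, -0.9],
    [-2.0, 3.0],
    [-0.3, 0.1],
]
weights = member_weights(val_scores, y_val)
fused, labels = weighted_score_fusion(test_scores, weights)
print(labels)  # [1, 0]
```

Because the second member is no better than chance on the validation data, its strongly opposing scores on the test samples are ignored. Balanced accuracy is used here only because it is insensitive to the class ratio; any imbalance-aware measure could be substituted.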

Metadata
Title
Clustering and Weighted Scoring in Geometric Space Support Vector Machine Ensemble for Highly Imbalanced Data Classification
Authors
Paweł Ksieniewicz
Robert Burduk
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-50423-6_10