2018 | OriginalPaper | Book chapter

Oversample Based Large Scale Support Vector Machine for Online Class Imbalance Problem

Written by: D. Himaja, T. Maruthi Padmaja, P. Radha Krishna

Published in: Big Data Analytics

Publisher: Springer International Publishing

Abstract

Dealing with online class imbalance in evolving data streams is a more critical issue than the conventional class imbalance problem. The class imbalance problem usually occurs when one class of data severely outnumbers the other classes, which leads to skewed class boundaries. In the online class imbalance problem, the degree of class imbalance changes over time, and the present state of imbalance is not known a priori to the learner. To address this problem, we present an Oversampling based Online Large Scale Support Vector Machine (OOLASVM) algorithm, a hybrid of active sample selection and oversampling of support vectors, so that oversampling and undersampling coexist while the new boundary is learned. Further, OOLASVM maintains a balanced boundary throughout the learning process. Results on simulated and real-world datasets demonstrate that the proposed OOLASVM yields better performance than existing approaches such as Generalized Oversampling based Online Imbalanced Learners and Over Online Bagging.
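
To make the phrase "the degree of class imbalance changes over time" concrete, the following is a minimal sketch of one common way to track the current imbalance in a label stream, using a time-decayed estimate of each class's share. It is not the OOLASVM algorithm, whose details the abstract does not give. The class name `DecayedClassRatio`, the `decay` parameter, and the `oversampling_rate` helper are hypothetical names introduced only for illustration.

```python
from collections import defaultdict


class DecayedClassRatio:
    """Time-decayed estimate of each class's share of a label stream.

    Illustrative only: names and the decay rule are assumptions,
    not taken from the OOLASVM paper.
    """

    def __init__(self, decay=0.9):
        if not 0.0 < decay < 1.0:
            raise ValueError("decay must lie strictly between 0 and 1")
        self.decay = decay
        self.share = defaultdict(float)  # class label -> decayed share

    def update(self, label):
        # Register the label first so the loop below never adds keys.
        self.share[label] += 0.0
        for k in self.share:
            hit = 1.0 if k == label else 0.0
            self.share[k] = self.decay * self.share[k] + (1.0 - self.decay) * hit

    def minority(self):
        # Label with the smallest current share, or None before any update.
        if not self.share:
            return None
        return min(self.share, key=self.share.get)

    def oversampling_rate(self, label):
        # Ratio of the largest class share to this label's share.
        own = self.share.get(label, 0.0)
        if own <= 0.0:
            return 1.0
        return max(self.share.values()) / own


if __name__ == "__main__":
    tracker = DecayedClassRatio(decay=0.9)
    # The stream starts dominated by class 0, then switches to class 1.
    for y in [0] * 50 + [1]:
        tracker.update(y)
    print("minority after the first 1:", tracker.minority())
    for y in [1] * 49:
        tracker.update(y)
    print("minority after the switch:", tracker.minority())
```

Because old labels are discounted geometrically, the estimated minority class can flip as the stream evolves, which is the situation an online learner has to cope with when the imbalance state is not known in advance.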

Metadata
Title
Oversample Based Large Scale Support Vector Machine for Online Class Imbalance Problem
Written by
D. Himaja
T. Maruthi Padmaja
P. Radha Krishna
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-04780-1_24