Skip to main content
Erschienen in: Neural Computing and Applications 2/2011

01.03.2011 | Original Article

Imbalanced classification using support vector machine ensemble

verfasst von: Jiang Tian, Hong Gu, Wenqi Liu

Erschienen in: Neural Computing and Applications | Ausgabe 2/2011

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Imbalanced data sets often have detrimental effects on the performance of a conventional support vector machine (SVM). To solve this problem, we adopt both strategies of modifying the data distribution and adjusting the classifier. Both minority and majority classes are resampled to increase the generalization ability. For minority class, an one-class support vector machine model combined with synthetic minority oversampling technique is used to oversample the support vector instances. For majority class, we propose a new method to decompose the majority class into clusters and remove two clusters using a distance measure to lessen the effect of outliers. The remaining clusters are used to build an SVM ensemble with the oversampled minority patterns, the SVM ensemble can achieve better performance by considering potentially suboptimal solutions. Experimental results on benchmark data sets are provided to illustrate the effectiveness of the proposed method.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Chawla NV, Japkowicz N, Kotcz A (2004) Editorial: special issue on learning from imbalanced data sets. SIGKDD Explor 6(1):1–6CrossRef Chawla NV, Japkowicz N, Kotcz A (2004) Editorial: special issue on learning from imbalanced data sets. SIGKDD Explor 6(1):1–6CrossRef
2.
Zurück zum Zitat Weiss GM (2004) Mining with rarity: a unifying framework. SIGKDD Explor 6(1):7–19CrossRef Weiss GM (2004) Mining with rarity: a unifying framework. SIGKDD Explor 6(1):7–19CrossRef
3.
Zurück zum Zitat Vapnik VN (2000) The nature of statistical learning theory. Springer, New YorkMATH Vapnik VN (2000) The nature of statistical learning theory. Springer, New YorkMATH
4.
Zurück zum Zitat Zhao XM, Li X, Chen L, Aihara K (2008) Protein classification with imbalanced data. Proteins 70(4):1125–1132CrossRef Zhao XM, Li X, Chen L, Aihara K (2008) Protein classification with imbalanced data. Proteins 70(4):1125–1132CrossRef
5.
Zurück zum Zitat Wu MR, Ye JP (2009) A small sphere and large margin approach for novelty detection using training data with outliers. IEEE Trans Pattern Anal Mach Intell 31(11):2088–2092CrossRefMathSciNet Wu MR, Ye JP (2009) A small sphere and large margin approach for novelty detection using training data with outliers. IEEE Trans Pattern Anal Mach Intell 31(11):2088–2092CrossRefMathSciNet
6.
Zurück zum Zitat Li X, Wang L, Sung E (2008) Adaboost with svm-based component classifiers. Eng Appl Artif Intell 21(5):785–795CrossRef Li X, Wang L, Sung E (2008) Adaboost with svm-based component classifiers. Eng Appl Artif Intell 21(5):785–795CrossRef
7.
Zurück zum Zitat Zhou ZH, Liu XY (2006) Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Trans Knowl Data Eng 18(1):63–77CrossRef Zhou ZH, Liu XY (2006) Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Trans Knowl Data Eng 18(1):63–77CrossRef
8.
Zurück zum Zitat Sun Y, Kamel MS, Wong AKC, Wang Y (2007) Cost-sensitive boosting for classification of imbalanced data. Pattern Recognit 40(12):3358–3378MATHCrossRef Sun Y, Kamel MS, Wong AKC, Wang Y (2007) Cost-sensitive boosting for classification of imbalanced data. Pattern Recognit 40(12):3358–3378MATHCrossRef
9.
Zurück zum Zitat Gardner AB, Krieger AM, Vachtsevanos G, Litt B (2006) One-class novelty detection for seizure analysis from intracranial eeg. J Mach Learn Res 7(7):1025–1044MathSciNet Gardner AB, Krieger AM, Vachtsevanos G, Litt B (2006) One-class novelty detection for seizure analysis from intracranial eeg. J Mach Learn Res 7(7):1025–1044MathSciNet
10.
Zurück zum Zitat Giacinto G, Perdisci R, Del Rio M, Roli F (2008) Intrusion detection in computer networks by a modular ensemble of one-class classifiers. Inf Fusion 9(1):69–82CrossRef Giacinto G, Perdisci R, Del Rio M, Roli F (2008) Intrusion detection in computer networks by a modular ensemble of one-class classifiers. Inf Fusion 9(1):69–82CrossRef
11.
Zurück zum Zitat Liao TW (2008) Classification of weld flaws with imbalanced class data. Expert Syst Appl 35(3):1041–1052CrossRef Liao TW (2008) Classification of weld flaws with imbalanced class data. Expert Syst Appl 35(3):1041–1052CrossRef
12.
Zurück zum Zitat Guo H, Viktor H (2004) Learning from imbalanced data sets with boosting and data generation: the databoost-im approach. SIGKDD Explor 6(1):30–39CrossRef Guo H, Viktor H (2004) Learning from imbalanced data sets with boosting and data generation: the databoost-im approach. SIGKDD Explor 6(1):30–39CrossRef
13.
Zurück zum Zitat Yen SJ, Lee YS (2009) Cluster-based under-sampling approaches for imbalanced data distributions. Expert Syst Appl 36(3):5718–5727CrossRefMathSciNet Yen SJ, Lee YS (2009) Cluster-based under-sampling approaches for imbalanced data distributions. Expert Syst Appl 36(3):5718–5727CrossRefMathSciNet
14.
Zurück zum Zitat Kubat M, Matwin S (1997) Addressing the curse of imbalanced training sets: one-sided selection. In: Proceedings of the 14th international conference on machine learning, Nashville, pp 179–186 Kubat M, Matwin S (1997) Addressing the curse of imbalanced training sets: one-sided selection. In: Proceedings of the 14th international conference on machine learning, Nashville, pp 179–186
15.
Zurück zum Zitat Wilson DR, Martinez TR (2000) Reduction techniques for instance-based learning algorithms. Mach Learn 38(3):257–286MATHCrossRef Wilson DR, Martinez TR (2000) Reduction techniques for instance-based learning algorithms. Mach Learn 38(3):257–286MATHCrossRef
16.
Zurück zum Zitat Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) Smote: Synthetic minority over-sampling technique. J Artif Intell Res 16(6):321–357MATH Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) Smote: Synthetic minority over-sampling technique. J Artif Intell Res 16(6):321–357MATH
17.
Zurück zum Zitat Liu Y, An A, Huang X (2006) Boosting prediction accuracy on imbalanced datasets with svm ensembles. In: 10th Pacific-Asia conference on advances in knowledge discovery and data mining. Springer, Singapore, pp 107–118 Liu Y, An A, Huang X (2006) Boosting prediction accuracy on imbalanced datasets with svm ensembles. In: 10th Pacific-Asia conference on advances in knowledge discovery and data mining. Springer, Singapore, pp 107–118
18.
Zurück zum Zitat Wang HY (2008) Combination approach of smote and biased-svm for imbalanced datasets. In: Proceedings of the international joint conference on neural networks. Institute of Electrical and Electronics Engineers Inc., Hong Kong, pp 228–231 Wang HY (2008) Combination approach of smote and biased-svm for imbalanced datasets. In: Proceedings of the international joint conference on neural networks. Institute of Electrical and Electronics Engineers Inc., Hong Kong, pp 228–231
19.
Zurück zum Zitat Yoon K, Kwek S (2007) A data reduction approach for resolving the imbalanced data issue in functional genomics. Neural Comput Appl 16(3):295–306CrossRef Yoon K, Kwek S (2007) A data reduction approach for resolving the imbalanced data issue in functional genomics. Neural Comput Appl 16(3):295–306CrossRef
20.
Zurück zum Zitat Akbani R, Kwek S, Japkowicz N (2004) Applying support vector machines to imbalanced datasets. In: Proceedings of the 15th European conference on machine learning. Springer, Pisa, pp 39–50 Akbani R, Kwek S, Japkowicz N (2004) Applying support vector machines to imbalanced datasets. In: Proceedings of the 15th European conference on machine learning. Springer, Pisa, pp 39–50
21.
Zurück zum Zitat Wu TF, Lin CJ, Weng RC (2004) Probability estimates for multi-class classification by pairwise coupling. J Mach Learn Res 5(5):975–1005MathSciNet Wu TF, Lin CJ, Weng RC (2004) Probability estimates for multi-class classification by pairwise coupling. J Mach Learn Res 5(5):975–1005MathSciNet
22.
Zurück zum Zitat Scholkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC (2001) Estimating the support of a high-dimensional distribution. Neural Comput 13(7):1443–1471CrossRef Scholkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC (2001) Estimating the support of a high-dimensional distribution. Neural Comput 13(7):1443–1471CrossRef
25.
Zurück zum Zitat Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufmann Publisers, San FransiscoMATH Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufmann Publisers, San FransiscoMATH
26.
Zurück zum Zitat Kim HC, Pang S, Je HM, Kim D, Bang SY (2003) Constructing support vector machine ensemble. Pattern Recognit 36(12):2757–2767MATHCrossRef Kim HC, Pang S, Je HM, Kim D, Bang SY (2003) Constructing support vector machine ensemble. Pattern Recognit 36(12):2757–2767MATHCrossRef
28.
Zurück zum Zitat Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In: Proceedings of the thirteenth international conference on machine learning. Morgan Kauffman, San Francisco, pp 148–156 Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In: Proceedings of the thirteenth international conference on machine learning. Morgan Kauffman, San Francisco, pp 148–156
29.
Zurück zum Zitat Webb GI (2000) Multiboosting: a technique for combining boosting and wagging. Mach Learn 40(2):159–196CrossRef Webb GI (2000) Multiboosting: a technique for combining boosting and wagging. Mach Learn 40(2):159–196CrossRef
Metadaten
Titel
Imbalanced classification using support vector machine ensemble
verfasst von
Jiang Tian
Hong Gu
Wenqi Liu
Publikationsdatum
01.03.2011
Verlag
Springer-Verlag
Erschienen in
Neural Computing and Applications / Ausgabe 2/2011
Print ISSN: 0941-0643
Elektronische ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-010-0349-9

Weitere Artikel der Ausgabe 2/2011

Neural Computing and Applications 2/2011 Zur Ausgabe

Premium Partner