Published in: Neural Computing and Applications 10/2019

15-03-2018 | Original Article

Equation of SVM-rebalancing: the point-normal form of a plane for class imbalance problem

Authors: Che-Chang Hsu, Kuo-Shong Wang, Hung-Yuan Chung, Shih-Hsing Chang

Abstract

The task of detecting a rare but important class has been studied extensively in the machine learning community. It is commonly agreed that traditional classifiers are limited on imbalanced datasets and do not perform well on them. A number of solutions to the problem have been proposed at both the data level and the algorithmic level. We propose SVM-rebalancing, expressed in the point-normal form of a plane, as a method of the second (algorithmic) type. In its learning process, an assumption of pseudo-prior probabilities, inspired by Bayesian decision theory, provides a rebalancing recipe for countering the imbalance. We therefore formulate a rebalancing programming problem that incorporates a rebalancing heuristic into model fitting to raise class separability. Various measures are used to characterize classifier performance. Compared with several popular decision-tree splitting criteria and with cost-sensitive learning, the proposed method gives comparable separability for the minority class and avoids heavy bias toward the majority class.
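
The abstract names the point-normal form of a plane, but this page does not reproduce the paper's equations. As a minimal sketch of the textbook geometry only (not the authors' SVM-rebalancing formulation), the Python below writes a linear decision boundary as n · (x − p) = 0, converts it to the familiar w · x + b = 0 form, and shows that moving the anchor point p along the normal shifts the boundary without rotating it. All function names and the sample numbers are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def point_normal_decision(x: Sequence[float],
                          normal: Sequence[float],
                          point: Sequence[float]) -> float:
    """Signed value n . (x - p); it is zero exactly on the plane."""
    return dot(normal, [xi - pi for xi, pi in zip(x, point)])


def to_weight_bias(normal: Sequence[float],
                   point: Sequence[float]) -> Tuple[List[float], float]:
    """Rewrite n . (x - p) = 0 as w . x + b = 0 with w = n, b = -n . p."""
    return list(normal), -dot(normal, point)


def shift_point(point: Sequence[float],
                normal: Sequence[float],
                t: float) -> List[float]:
    """Move the anchor point t units along the unit normal direction."""
    norm = dot(normal, normal) ** 0.5
    return [p + t * n / norm for p, n in zip(point, normal)]


# Illustrative values only (assumptions, not taken from the paper).
normal = [3.0, 4.0]
point = [1.0, 2.0]
x = [2.0, 3.0]

w, b = to_weight_bias(normal, point)
# Both forms give the same decision value for x.
same_value = point_normal_decision(x, normal, point) == dot(w, x) + b

# Shifting the anchor point by one unit along the normal lowers the
# decision value of every x by t * ||n||, i.e. by 5 here.
moved = shift_point(point, normal, 1.0)
shifted_value = point_normal_decision(x, normal, moved)
```

In this picture, a rebalancing scheme could be read as choosing where the anchor point sits between the two classes; how the paper actually derives that choice from pseudo-prior probabilities is not stated on this page.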

Metadata
Title
Equation of SVM-rebalancing: the point-normal form of a plane for class imbalance problem
Authors
Che-Chang Hsu
Kuo-Shong Wang
Hung-Yuan Chung
Shih-Hsing Chang
Publication date
15-03-2018
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 10/2019
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-018-3419-z
