Published in: International Journal of Machine Learning and Cybernetics 2/2019

29 August 2017 | Original Article

KNN-based maximum margin and minimum volume hyper-sphere machine for imbalanced data classification

Authors: Yitian Xu, Yuqun Zhang, Jiang Zhao, Zhiji Yang, Xianli Pan


Abstract

Imbalanced data classification arises frequently in real-world applications. In this paper, a novel k-nearest neighbor (KNN)-based maximum margin and minimum volume hyper-sphere machine (KNN-M3VHM) is presented for imbalanced data classification. The basic idea is to construct two hyper-spheres with different centres and radii: the first contains the majority examples and the second covers the minority examples. When constructing the first hyper-sphere, some redundant majority samples are removed using a KNN-based strategy to balance the two classes. Meanwhile, the margin between the two hyper-spheres is maximized and their volumes are minimized, which yields a tight boundary around each class. Similar to the twin hyper-sphere support vector machine (THSVM), KNN-M3VHM solves two related SVM-type problems and avoids matrix inversion when solving the convex optimization problems. Because KNN-M3VHM considers both the within-class information and the between-class margin, it achieves better performance than other state-of-the-art algorithms. Experimental results on twenty-five datasets validate the significant advantages of the proposed algorithm.
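
The two ideas in the abstract (KNN-based pruning of the majority class, then one hyper-sphere per class) can be illustrated with a rough sketch. This is not the authors' formulation: the abstract does not give the pruning rule or the optimization problems, so everything below is an assumption made for illustration. The function names `knn_prune_majority`, `fit_sphere`, and `predict` are hypothetical. The pruning rule (keep a majority sample only if a minority sample is among its k nearest neighbours) is one plausible reading of "redundant". The sphere is a plain centroid with maximum distance as radius, standing in for the margin- and volume-optimized spheres. The decision rule compares distances scaled by each radius.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_prune_majority(majority, minority, k=3):
    """Assumed pruning rule: keep a majority sample only if at least one of
    its k nearest neighbours (over both classes, excluding itself) belongs
    to the minority class. Samples deep inside the majority class are dropped."""
    pool = [(p, 0) for p in majority] + [(p, 1) for p in minority]
    kept = []
    for i, p in enumerate(majority):
        # majority[i] sits at index i of the pool, so j != i skips the sample itself
        neighbours = [(dist(p, q), label)
                      for j, (q, label) in enumerate(pool) if j != i]
        neighbours.sort(key=lambda t: t[0])
        if any(label == 1 for _, label in neighbours[:k]):
            kept.append(p)
    return kept


def fit_sphere(points):
    """Crude stand-in for an optimized hyper-sphere: centroid as centre,
    largest distance to the centroid as radius."""
    dim = len(points[0])
    centre = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    radius = max(dist(p, centre) for p in points)
    return centre, radius


def predict(x, sphere_major, sphere_minor):
    """Assign x to the class whose sphere is closer in radius-scaled distance."""
    (c1, r1), (c2, r2) = sphere_major, sphere_minor
    s1 = dist(x, c1) / max(r1, 1e-12)  # guard against a zero radius
    s2 = dist(x, c2) / max(r2, 1e-12)
    return "majority" if s1 <= s2 else "minority"


# Toy data: a dense majority cluster, one majority point near the minority class.
majority = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (3, 3)]
minority = [(4, 4), (4, 5)]
kept = knn_prune_majority(majority, minority, k=2)

sphere_a = fit_sphere([(0, 0), (2, 0)])
sphere_b = fit_sphere([(10, 0), (12, 0)])
label_near_a = predict((1.5, 0), sphere_a, sphere_b)
label_near_b = predict((10.5, 0), sphere_a, sphere_b)
```

In the toy data only the majority point next to the minority cluster survives pruning, which shows how aggressive this assumed rule can be. A practical variant would tune k or keep a fixed fraction of the majority class.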

Metadata
Title
KNN-based maximum margin and minimum volume hyper-sphere machine for imbalanced data classification
Authors
Yitian Xu
Yuqun Zhang
Jiang Zhao
Zhiji Yang
Xianli Pan
Publication date
29 August 2017
Publisher
Springer Berlin Heidelberg
Published in
International Journal of Machine Learning and Cybernetics / Issue 2/2019
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-017-0720-6
