International Journal of Machine Learning and Cybernetics

29 August 2017 | Original Article

KNN-based maximum margin and minimum volume hyper-sphere machine for imbalanced data classification

Authors: Yitian Xu, Yuqun Zhang, Jiang Zhao, Zhiji Yang, Xianli Pan

Published in: International Journal of Machine Learning and Cybernetics | Issue 2/2019

Abstract

Imbalanced data classification arises frequently in real-world applications. In this paper, a novel k-nearest neighbor (KNN)-based maximum margin and minimum volume hyper-sphere machine (KNN-M³VHM) is presented for imbalanced data classification. The basic idea is to construct two hyper-spheres with different centres and radii: the first contains the majority examples and the second covers the minority examples. When constructing the first hyper-sphere, we remove some redundant majority samples using a KNN-based strategy to balance the two classes. At the same time, we maximize the margin between the two hyper-spheres and minimize their volumes, which yields a tight boundary around each class. Like the twin hyper-sphere support vector machine (THSVM), KNN-M³VHM solves two related SVM-type problems and avoids matrix inversion when solving the convex optimization problems. Because KNN-M³VHM considers both the within-class information and the between-class margin, it achieves better performance than other state-of-the-art algorithms. Experimental results on twenty-five datasets validate the significant advantages of the proposed algorithm.
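The two ideas in the abstract, KNN-based pruning of redundant majority samples and classification with two hyper-spheres, can be illustrated with a minimal sketch. The abstract does not state the exact pruning criterion or decision rule, so both are assumptions here: a majority sample is treated as redundant when none of its k nearest neighbours belongs to the minority class, and a point is assigned to the sphere with the smaller squared distance relative to the squared radius. The function names `knn_prune_majority` and `classify` are hypothetical, and the centres and radii are passed in directly rather than obtained from the paper's optimization problems.

```python
import math


def knn_prune_majority(majority, minority, k=3):
    """Keep majority points that have a minority point among their k nearest
    neighbours. This redundancy criterion is an assumption, not the paper's."""
    # Pool all points, flagging which ones belong to the minority class.
    pool = [(p, False) for p in majority] + [(p, True) for p in minority]
    kept = []
    for i, x in enumerate(majority):
        # Distances from x to every other pooled point (x itself is skipped).
        neighbours = sorted(
            (math.dist(x, p), is_minority)
            for j, (p, is_minority) in enumerate(pool)
            if j != i
        )
        if any(is_minority for _, is_minority in neighbours[:k]):
            kept.append(x)
    return kept


def classify(x, centre_maj, radius_maj, centre_min, radius_min):
    """Assign x to the sphere with the smaller relative squared distance
    (an assumed decision rule; centres and radii are given, not learned)."""
    score_maj = math.dist(x, centre_maj) ** 2 / radius_maj ** 2
    score_min = math.dist(x, centre_min) ** 2 / radius_min ** 2
    return "minority" if score_min < score_maj else "majority"


if __name__ == "__main__":
    # Toy data: five majority points on a line, two minority points at one end.
    majority = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
    minority = [(4.5, 0.5), (5.0, 0.0)]
    print(knn_prune_majority(majority, minority, k=3))
    print(classify((4.8, 0.1), (2.0, 0.0), 2.5, (4.75, 0.25), 0.5))
```

On this toy data the pruning step keeps only the majority points nearest the minority class, so the two classes end up the same size before any hyper-sphere would be fitted.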

Title: KNN-based maximum margin and minimum volume hyper-sphere machine for imbalanced data classification
Authors: Yitian Xu, Yuqun Zhang, Jiang Zhao, Zhiji Yang, Xianli Pan
Publication date: 29 August 2017
Publisher: Springer Berlin Heidelberg
Published in: International Journal of Machine Learning and Cybernetics / Issue 2/2019
Print ISSN: 1868-8071
Electronic ISSN: 1868-808X
DOI: https://doi.org/10.1007/s13042-017-0720-6
