Published in: Pattern Analysis and Applications 4/2018

18.04.2017 | Theoretical Advances

A fast classification strategy for SVM on the large-scale high-dimensional datasets

Authors: I-Jing Li, Jiunn-Lin Wu, Chih-Hung Yeh

Abstract

Classification on large-scale, high-dimensional datasets faces three challenges: (1) both the training phase and the classification phase impose a heavy computational burden; (2) storing the many training samples requires large memory; and (3) decision rules are difficult to determine in high-dimensional data. The nonlinear support vector machine (SVM) is a popular classifier that performs well on high-dimensional datasets, but it is prone to overfitting, especially when the data are unevenly distributed. Recently, the profile support vector machine (PSVM) was proposed to address this problem: because local learning is superior to global learning, multiple linear SVM models are trained to achieve performance similar to that of a single nonlinear SVM. However, PSVM is inefficient in the training phase. In this paper, we propose a fast classification strategy for PSVM that speeds up both training and classification. We first select border samples near the decision boundary from the training set. The reduced training set is then partitioned into several local subsets with the MagKmeans algorithm, for which we propose a fast search method to find the optimal solution. Each cluster is used to train a linear SVM model. Both artificial and real datasets are used to evaluate the proposed method. Experimental results show that it avoids both overfitting and underfitting, and that the proposed strategy is effective and efficient.
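The pipeline the abstract describes (select border samples near the decision boundary, cluster the reduced set, train one linear model per cluster, and route queries to the nearest cluster) can be sketched roughly as follows. This is an illustrative approximation, not the paper's implementation: plain k-means stands in for MagKmeans, a perceptron stands in for the linear SVM solver, and the `border_samples` helper with its `keep_ratio` parameter is a hypothetical name introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class problem: two Gaussian blobs in the plane.
X = np.vstack([rng.normal(-2.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

def border_samples(X, y, keep_ratio=0.5):
    """Keep points whose nearest opposite-class neighbour is close."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    opposite = y[None, :] != y[:, None]
    nearest_opp = np.where(opposite, d, np.inf).min(axis=1)
    return nearest_opp <= np.quantile(nearest_opp, keep_ratio)

mask = border_samples(X, y)
Xb, yb = X[mask], y[mask]

def kmeans(X, k, iters=50):
    """Plain k-means (MagKmeans additionally balances class membership)."""
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

centers, cluster = kmeans(Xb, 2)

def train_linear(X, y, epochs=200):
    """Perceptron as a stand-in for a linear SVM solver."""
    Xa = np.hstack([X, np.ones((len(X), 1))])  # append bias column
    t = 2 * y - 1                              # labels in {-1, +1}
    w = np.zeros(Xa.shape[1])
    for _ in range(epochs):
        for xi, ti in zip(Xa, t):
            if ti * (w @ xi) <= 0:
                w += ti * xi
    return w

# One local linear model per cluster; skip clusters holding a single class.
models = {j: train_linear(Xb[cluster == j], yb[cluster == j])
          for j in range(2) if len(np.unique(yb[cluster == j])) == 2}

def predict(x):
    """Route a query to its nearest cluster and apply that local model."""
    j = int(np.argmin(np.linalg.norm(centers - x, axis=1)))
    if j in models:
        return int(models[j] @ np.append(x, 1.0) > 0)
    return int(yb[cluster == j].mean() > 0.5)  # single-class cluster: majority label

acc = np.mean([predict(x) == t for x, t in zip(X, y)])
```

Routing each query to a single cluster is what keeps classification fast: only one small linear model is evaluated per query, and each model was trained on only a fraction of the (already reduced) training set.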


Metadata
Title
A fast classification strategy for SVM on the large-scale high-dimensional datasets
Authors
I-Jing Li
Jiunn-Lin Wu
Chih-Hung Yeh
Publication date
18.04.2017
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 4/2018
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-017-0620-0
