Published in: The Journal of Supercomputing 10/2021

24-03-2021

Speeding up the testing and training time for the support vector machines with minimal effect on the performance

Author: Hamid Reza Ghaffari

Abstract

The support vector machine (SVM) suffers from long training times on large data sets because of its high memory requirements and computational cost. The main problem arises during the training phase, which is computationally expensive and grows with the size of the input data set. High boundary complexity among the classes is another major problem in most data sets and reduces generalizability. This study therefore presents a method for reducing both the number of training samples and the boundary complexity, in which the training set is divided into boundary, non-boundary, and harmful patterns. The method has four phases. In the first phase, neighborhood information is determined for each sample with respect to the other samples. In the second phase, harmful patterns are removed to reduce the data complexity. In the third phase, the remaining training samples are divided into boundary and non-boundary patterns. In the fourth phase, representatives of the non-boundary data are determined, and a reduced set is formed by combining these representatives with the boundary patterns. The proposed method is tested on 33 data sets and evaluated against five of the most successful instance-based condensation algorithms. Experiments show that the proposed method outperforms the other methods reported in the research literature.
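The four phases above can be sketched in code. The following is a minimal, hypothetical illustration only, not the paper's exact algorithm: a simple k-nearest-neighbour rule stands in for the neighborhood analysis, a sample whose neighbours all belong to other classes is treated as "harmful", and per-class centroids stand in for the non-boundary representatives. All function names and the choice of k are assumptions.

```python
import math

def _neighbors(X, i, k):
    """Indices of the k nearest samples to X[i] (Euclidean distance)."""
    order = sorted(range(len(X)), key=lambda j: math.dist(X[i], X[j]))
    return [j for j in order if j != i][:k]

def reduce_training_set(X, y, k=3):
    """Return a reduced (X, y) built from boundary patterns plus
    per-class representatives of the non-boundary patterns."""
    # Phase 1: neighborhood information for every sample.
    nbrs = {i: _neighbors(X, i, k) for i in range(len(X))}
    # Phase 2: drop "harmful" patterns - samples none of whose
    # neighbours share their class label.
    kept = [i for i in nbrs if any(y[j] == y[i] for j in nbrs[i])]
    # Phase 3: split the rest into boundary / non-boundary patterns.
    boundary = [i for i in kept if any(y[j] != y[i] for j in nbrs[i])]
    interior = [i for i in kept if i not in boundary]
    # Phase 4: one centroid per class represents the non-boundary data;
    # combine the centroids with the boundary patterns.
    reps = []
    for c in set(y[i] for i in interior):
        members = [X[i] for i in interior if y[i] == c]
        centroid = tuple(sum(col) / len(members) for col in zip(*members))
        reps.append((centroid, c))
    reduced_X = [X[i] for i in boundary] + [p for p, _ in reps]
    reduced_y = [y[i] for i in boundary] + [c for _, c in reps]
    return reduced_X, reduced_y
```

On two well-separated clusters, this sketch collapses each cluster's interior to a single centroid while keeping the points near the class boundary, which is the general shape of the reduction the abstract describes.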


Metadata
Title
Speeding up the testing and training time for the support vector machines with minimal effect on the performance
Author
Hamid Reza Ghaffari
Publication date
24-03-2021
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 10/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-021-03729-0
