DOI: 10.1145/2576768.2598370

A memetic algorithm to select training data for support vector machines

Published: 12 July 2014

ABSTRACT

In this paper we propose a new memetic algorithm (MASVM) for fast and efficient selection of a valuable training set for support vector machines (SVMs). This is a crucial step, especially in the case of large and noisy data sets, since SVM training has high time and memory complexity. The majority of state-of-the-art methods exploit analysis of the data geometry, in both the input and the kernel space. Although evolutionary algorithms have proven very efficient for this purpose, they have not been extensively studied so far. Here, we propose a new method employing an adaptive genetic algorithm enhanced with refinement techniques. The refinements utilize a pool of the support vectors identified at various steps of the algorithm. An extensive experimental study performed on well-known benchmark, real-world, and artificial data sets clearly confirms the efficacy, robustness, and convergence capabilities of the proposed approach, and shows that it is competitive with other state-of-the-art techniques.
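To make the idea behind the abstract concrete, the following is a minimal, illustrative sketch of evolutionary training-set selection with a support-vector-pool refinement, written in Python with scikit-learn. The synthetic data, subset size, population size, genetic operators, and the refine() helper are assumptions chosen for brevity; they are not the operators or parameter settings of the MASVM described in the paper.

```python
# Sketch only: a genetic algorithm evolves fixed-size subsets of the training
# set; a memetic refinement re-injects support vectors collected from earlier
# fitness evaluations. All names and parameters below are illustrative.
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

SUBSET, POP, GENS = 100, 20, 30        # subset size, population size, generations
sv_pool = set()                        # support vectors identified so far (indices into X_tr)

def fitness(chrom):
    """Train an SVM on the selected subset and score it on a validation set."""
    clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr[chrom], y_tr[chrom])
    sv_pool.update(chrom[clf.support_].tolist())   # remember the support vectors found
    return clf.score(X_val, y_val)

def random_chrom():
    return rng.choice(len(X_tr), size=SUBSET, replace=False)

def crossover(a, b):
    """Child = union of the parents' genes, trimmed back to SUBSET samples."""
    return rng.choice(np.union1d(a, b), size=SUBSET, replace=False)

def refine(chrom):
    """Memetic step: swap a few genes for vectors drawn from the support-vector pool."""
    if not sv_pool:
        return chrom
    swap = rng.choice(list(sv_pool), size=min(5, len(sv_pool)), replace=False)
    return rng.choice(np.union1d(chrom, swap), size=SUBSET, replace=False)

pop = [random_chrom() for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)            # keep the fitter half as parents
    parents = pop[: POP // 2]
    children = []
    while len(children) < POP - len(parents):
        i, j = rng.choice(len(parents), size=2, replace=False)
        children.append(refine(crossover(parents[i], parents[j])))
    pop = parents + children

best = max(pop, key=fitness)
print("best validation accuracy: %.3f" % fitness(best))
```

The design choice this sketch tries to mirror is that support vectors found while evaluating candidate subsets are not discarded but fed back into later individuals, which is the role the abstract ascribes to the pool of support vectors identified so far.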


Published in

        GECCO '14: Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation
        July 2014
        1478 pages
ISBN: 9781450326629
DOI: 10.1145/2576768

        Copyright © 2014 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 12 July 2014


        Qualifiers

        • research-article

        Acceptance Rates

GECCO '14 Paper Acceptance Rate: 180 of 544 submissions, 33%
Overall Acceptance Rate: 1,669 of 4,410 submissions, 38%
