Abstract
In Support Vector Machines (SVMs), the solution of the classification problem is characterized by a (convex) quadratic programming (QP) problem. In a modified version of SVMs, called Least Squares SVM classifiers (LS-SVMs), a least squares cost function is used instead, so that the dual problem reduces to a linear set of equations. While the SVM classifier has a large margin interpretation, in this paper the LS-SVM formulation is related to a ridge regression approach for classification with binary targets and to Fisher's linear discriminant analysis in the feature space. Multiclass categorization problems are represented by a set of binary classifiers using different output coding schemes. While regularization is used to control the effective number of parameters of the LS-SVM classifier, the sparseness property of SVMs is lost due to the choice of the 2-norm. Sparseness can be imposed in a second stage by gradually pruning the support value spectrum and optimizing the hyperparameters during the sparse approximation procedure. In this paper, twenty public domain benchmark datasets are used to evaluate the test set performance of LS-SVM classifiers with linear, polynomial and radial basis function (RBF) kernels. Both the SVM and the LS-SVM classifier with RBF kernel, combined with standard cross-validation procedures for hyperparameter selection, achieve comparable test set performances. These SVM and LS-SVM performances are consistently very good when compared to a variety of methods described in the literature, including decision tree based algorithms, statistical algorithms and instance based learning methods. We show on ten UCI datasets that the LS-SVM sparse approximation procedure can be applied successfully.
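The central computational point in the abstract, that replacing the SVM's inequality constraints with equality constraints and a least squares cost turns the dual problem into a linear system rather than a QP, can be made concrete in a few lines of code. The sketch below is a minimal NumPy rendering of the standard LS-SVM classifier with an RBF kernel: training solves an (N+1)-dimensional linear system in the bias b and the support values alpha, and classification takes the sign of the resulting kernel expansion. The function names and the bandwidth parameterization sigma are illustrative choices, not code from the paper.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma):
    """RBF kernel K(x, z) = exp(-||x - z||^2 / sigma^2)."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / sigma ** 2)

def lssvm_train(X, y, gamma, sigma):
    """Solve the LS-SVM dual: a linear system instead of a QP.

    [ 0    y^T             ] [ b     ]   [ 0 ]
    [ y    Omega + I/gamma ] [ alpha ] = [ 1 ]

    with Omega_ij = y_i * y_j * K(x_i, x_j) and labels y_i in {-1, +1}.
    """
    N = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(N) / gamma
    rhs = np.concatenate(([0.0], np.ones(N)))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]  # bias b, support values alpha

def lssvm_predict(X_train, y_train, alpha, b, X_test, sigma):
    """Classify with y_hat(x) = sign(sum_i alpha_i y_i K(x, x_i) + b)."""
    K = rbf_kernel(X_test, X_train, sigma)
    return np.sign(K @ (alpha * y_train) + b)
```

Because essentially every training point receives a nonzero support value alpha_i, the solution is dense; the sparse approximation procedure the abstract describes would sit on top of such a routine, repeatedly dropping the points with the smallest |alpha_i| from the training set and re-solving the smaller system while re-tuning the hyperparameters gamma and sigma.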
Cite this article
van Gestel, T., Suykens, J.A., Baesens, B. et al. Benchmarking Least Squares Support Vector Machine Classifiers. Machine Learning 54, 5–32 (2004). https://doi.org/10.1023/B:MACH.0000008082.80494.e0