ABSTRACT
Although support vector machines (SVMs) for binary classification yield a decision rule that relies only on a subset of the training data points (the support vectors), that rule is in general based on all available features in the input space. We propose two direct, novel convex relaxations of a non-convex sparse SVM formulation that explicitly constrains the cardinality of the vector of feature weights. One relaxation results in a quadratically-constrained quadratic program (QCQP); the second is based on a semidefinite programming (SDP) relaxation. The QCQP formulation can be interpreted as applying an adaptive soft-threshold to the SVM hyperplane, while the SDP formulation learns a weighted inner product (i.e., a kernel) that yields a sparse hyperplane. Experimental results show increased sparsity while preserving generalization performance relative to both a standard SVM and a linear programming SVM.
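For concreteness, the non-convex formulation referred to above can be written as a standard soft-margin SVM augmented with an explicit cardinality constraint on the weight vector. The sketch below is reconstructed from the abstract's description rather than the paper's exact notation; the regularization weight $C$ and the feature budget $r$ are assumed symbols.

```latex
% Sketch (assumed notation): soft-margin SVM with an explicit
% cardinality (l0) constraint on the feature weight vector w.
\begin{align}
\min_{w,\,b,\,\xi} \quad & \tfrac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.} \quad & y_i \left( w^\top x_i + b \right) \ge 1 - \xi_i,
                    \quad \xi_i \ge 0, \quad i = 1, \dots, n, \\
                  & \|w\|_0 \le r
\end{align}
% \|w\|_0 counts the nonzero entries of w, so r bounds the number of
% features the hyperplane may use. This last constraint is non-convex;
% the QCQP and SDP formulations described above are convex relaxations
% of this problem.
```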