
Direct convex relaxations of sparse SVM

Published: 20 June 2007

ABSTRACT

Although support vector machines (SVMs) for binary classification give rise to a decision rule that relies on only a subset of the training data points (the support vectors), that rule will in general be based on all available features in the input space. We propose two direct, novel convex relaxations of a non-convex sparse SVM formulation that explicitly constrains the cardinality of the vector of feature weights. One relaxation results in a quadratically-constrained quadratic program (QCQP), while the second is based on a semidefinite programming (SDP) relaxation. The QCQP formulation can be interpreted as applying an adaptive soft-threshold to the SVM hyperplane, while the SDP formulation learns a weighted inner product (i.e. a kernel) that results in a sparse hyperplane. Experimental results show an increase in sparsity while preserving generalization performance, compared with both a standard SVM and a linear programming SVM.
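The soft-threshold interpretation of the QCQP relaxation can be illustrated with a minimal sketch. This is not the paper's exact formulation (the paper's threshold is learned adaptively as part of the optimization); it only shows the basic fixed-threshold shrinkage operator that zeroes out small hyperplane weights, which is the mechanism that produces a sparse decision rule.

```python
# Illustrative sketch only: element-wise soft-thresholding of an SVM
# weight vector w. Weights with magnitude below the threshold t are set
# to zero; the rest shrink toward zero by t. The threshold here is a
# fixed scalar for illustration, not the adaptive per-feature threshold
# learned by the QCQP formulation in the paper.
import numpy as np

def soft_threshold(w, t):
    """Shrink each weight toward zero by t; weights with |w_i| <= t become 0."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

w = np.array([0.9, -0.05, 0.3, -0.6, 0.02])   # dense hyperplane weights
w_sparse = soft_threshold(w, 0.1)             # small weights are zeroed
```

Features whose weights fall below the threshold are removed from the decision rule entirely, so the resulting hyperplane depends on fewer input features.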


Published in

ICML '07: Proceedings of the 24th International Conference on Machine Learning
June 2007
1233 pages
ISBN: 9781595937933
DOI: 10.1145/1273496

        Copyright © 2007 ACM


Publisher

Association for Computing Machinery, New York, NY, United States



Acceptance Rate

Overall acceptance rate: 140 of 548 submissions, 26%
