Article
DOI: 10.1145/1150402.1150429

Training linear SVMs in linear time

Published: 20 August 2006

ABSTRACT

Linear Support Vector Machines (SVMs) have become one of the most prominent machine learning techniques for the high-dimensional sparse data commonly encountered in applications like text classification, word-sense disambiguation, and drug design. These applications involve a large number of examples n as well as a large number of features N, while each example has only s << N non-zero features. This paper presents a Cutting-Plane Algorithm for training linear SVMs that provably has training time O(sn) for classification problems and O(sn log(n)) for ordinal regression problems. The algorithm is based on an alternative but equivalent formulation of the SVM optimization problem. Empirically, the Cutting-Plane Algorithm is several orders of magnitude faster than decomposition methods like SVM-Light for large datasets.
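The 1-slack cutting-plane scheme the abstract refers to can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the authors' released solver: each working-set entry is a 0/1 vector selecting a subset of training examples, the most violated constraint selects every example with margin below 1, and the small restricted dual QP is solved here by projected gradient ascent purely for self-containment (the function and helper names are our own).

```python
import numpy as np

def _project(alpha, C):
    # Euclidean projection onto {a : a >= 0, sum(a) <= C}.
    a = np.maximum(alpha, 0.0)
    if a.sum() <= C:
        return a
    # Otherwise project onto the simplex of radius C (Duchi-style).
    u = np.sort(a)[::-1]
    css = np.cumsum(u) - C
    idx = np.arange(1, len(u) + 1)
    rho = idx[u - css / idx > 0][-1]
    theta = css[rho - 1] / rho
    return np.maximum(a - theta, 0.0)

def train_cutting_plane(X, y, C=1.0, eps=1e-3, max_iter=200):
    """Sketch of 1-slack cutting-plane training for a linear SVM:
    min_w 0.5*||w||^2 + C*xi subject to, for every c in {0,1}^n,
    (1/n) * w . sum_i c_i y_i x_i >= (1/n) * sum_i c_i - xi."""
    n, d = X.shape
    G = np.zeros((0, d))      # working-set constraint gradients g_k
    delta = np.zeros(0)       # working-set constraint offsets
    w = np.zeros(d)
    for _ in range(max_iter):
        # Most violated constraint: include every margin violator.
        c = (y * (X @ w) < 1).astype(float)
        g = (X * (c * y)[:, None]).sum(axis=0) / n
        d_new = c.sum() / n
        xi = max(0.0, (delta - G @ w).max()) if len(delta) else 0.0
        if d_new - w @ g <= xi + eps:    # no eps-violated constraint left
            break
        G = np.vstack([G, g])
        delta = np.append(delta, d_new)
        # Restricted dual: max_{a>=0, sum(a)<=C} delta'a - 0.5 a'Q a.
        Q = G @ G.T
        a = np.zeros(len(delta))
        lr = 1.0 / (np.trace(Q) + 1e-12)
        for _ in range(1000):
            a = _project(a + lr * (delta - Q @ a), C)
        w = G.T @ a              # primal solution from dual variables
    return w
```

Because each iteration adds a single constraint over all n examples (rather than one per example), the working set stays small and each constraint is computed in one O(sn) pass over the sparse data, which is the source of the linear-time behavior the paper proves.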


Published in
KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2006, 986 pages
ISBN: 1595933395
DOI: 10.1145/1150402
Copyright © 2006 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States

