ABSTRACT
Discriminative training for structured outputs has found increasing application in areas such as natural language processing, bioinformatics, information retrieval, and computer vision. Focusing on large-margin methods, the most general training algorithms known to date (in terms of loss function and model structure) are based on cutting-plane approaches. While these algorithms are very efficient for linear models, their training complexity becomes quadratic in the number of examples when kernels are used. To overcome this bottleneck, we propose new training algorithms that use approximate cutting planes and random sampling to enable efficient training with kernels. We prove that these algorithms have improved time complexity while providing approximation guarantees. In empirical evaluations, our algorithms produced solutions with training and test error rates close to those of exact solvers. Even on binary classification problems, where highly optimized conventional solvers exist (e.g., SVM-light), our methods are about an order of magnitude faster on large datasets, while remaining competitive in speed on datasets of medium size.