Abstract
Discriminative training approaches like structural SVMs have shown much promise for building highly complex and accurate models in areas like natural language processing, protein structure prediction, and information retrieval. However, current training algorithms are computationally expensive or intractable on large datasets. To overcome this bottleneck, this paper explores how cutting-plane methods can provide fast training not only for classification SVMs, but also for structural SVMs. We show that for an equivalent “1-slack” reformulation of the linear SVM training problem, our cutting-plane method has time complexity linear in the number of training examples. In particular, the number of iterations does not depend on the number of training examples; it scales linearly with the regularization parameter C and with the inverse of the desired precision ε. Furthermore, we present an extensive empirical evaluation of the method applied to binary classification, multi-class classification, HMM sequence tagging, and CFG parsing. The experiments show that the cutting-plane algorithm is broadly applicable and fast in practice. On large datasets, it is typically several orders of magnitude faster than conventional training methods derived from decomposition methods like SVM-light, or conventional cutting-plane methods. Implementations of our methods are available at www.joachims.org.
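The 1-slack idea described above can be sketched for the simplest case, a binary linear SVM. The following is an illustrative simplification, not the authors' implementation: each cutting plane aggregates the margin violations of all n examples into a single constraint w·a ≥ b − ξ, and the working-set QP is solved here by naive dual coordinate ascent (the function name and solver choice are this sketch's own):

```python
import numpy as np

def one_slack_svm(X, y, C=1.0, eps=1e-3, max_iters=100):
    """Cutting-plane training of a linear binary SVM, 1-slack style.

    Constraints have the form  w . a_k >= b_k - xi,  where for an
    indicator vector c in {0,1}^n over the examples:
        a_k = (1/n) * sum_i c_i * y_i * x_i,   b_k = (1/n) * sum_i c_i.
    """
    n, d = X.shape
    w = np.zeros(d)
    A, b = [], []  # working set of cutting planes
    for _ in range(max_iters):
        # Most violated constraint: include example i iff its margin < 1.
        c = (y * (X @ w) < 1).astype(float)
        a_new = (c[:, None] * y[:, None] * X).mean(axis=0)
        b_new = c.mean()
        # Current slack implied by the working set.
        xi = max(0.0, max((bk - ak @ w for ak, bk in zip(A, b)), default=0.0))
        if b_new - a_new @ w <= xi + eps:
            break  # no constraint is violated by more than eps
        A.append(a_new)
        b.append(b_new)
        # Working-set QP in the dual, solved by coordinate ascent:
        #   max_alpha  sum_k alpha_k b_k - 0.5 ||sum_k alpha_k a_k||^2
        #   s.t.       alpha_k >= 0,  sum_k alpha_k <= C.
        K = np.array(A)
        alpha = np.zeros(len(A))
        for _ in range(200):
            for k in range(len(A)):
                g = b[k] - K[k] @ (K.T @ alpha)   # partial derivative
                denom = K[k] @ K[k]
                if denom > 0:
                    new = alpha[k] + g / denom
                    upper = C - (alpha.sum() - alpha[k])
                    alpha[k] = min(max(new, 0.0), upper)
        w = K.T @ alpha
    return w

# Toy usage on a linearly separable problem.
X = np.array([[2., 1.], [1., 2.], [-1., -2.], [-2., -1.]])
y = np.array([1., 1., -1., -1.])
w = one_slack_svm(X, y, C=10.0)
```

The point of the sketch is the structural property the paper analyzes: the working set grows by one aggregated constraint per iteration regardless of n, so the number of iterations (and hence the number of QPs solved) is independent of the dataset size.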
Editor: Tony Jebara.
Cite this article
Joachims, T., Finley, T. & Yu, CN.J. Cutting-plane training of structural SVMs. Mach Learn 77, 27–59 (2009). https://doi.org/10.1007/s10994-009-5108-8