skip to main content
10.1145/1273496.1273508acmotherconferencesArticle/Chapter ViewAbstractPublication PagesicmlConference Proceedingsconference-collections
Article

Solving multiclass support vector machines with LaRank

Authors Info & Claims
Published:20 June 2007Publication History

ABSTRACT

Optimization algorithms for large margin multiclass recognizers are often too costly to handle ambitious problems with structured outputs and exponential numbers of classes. Optimization algorithms that rely on the full gradient are not effective because, unlike the solution, the gradient is not sparse and is very large. The LaRank algorithm sidesteps this difficulty by relying on a randomized exploration inspired by the perceptron algorithm. We show that this approach is competitive with gradient based optimizers on simple multiclass problems. Furthermore, a single LaRank pass over the training examples delivers test error rates that are nearly as good as those of the final solution.

References

  1. Bakir, G., Hofmann, T., Schölkopf, B., Smola, A. J., Taskar, B., & Vishwanathan, S. V. N. (Eds.). (2007). Predicting structured outputs. MIT Press. in press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Bordes, A., & Bottou, L. (2005). The Huller: a simple and efficient online SVM. Machine Learning: ECML 2005 (pp. 505--512). Springer Verlag. LNAI 3720. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Bordes, A., Ertekin, S., Weston, J., & Bottou, L. (2005). Fast kernel classifiers with online and active learning. Journal of Machine Learning Research, 6, 1579--1619. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Collins, M. (2002). Discriminative training methods for hidden markov models: theory and experiments with perceptron algorithms. EMNLP '02: Proceedings of the ACL-02 conference on Empirical methods in natural language processing (pp. 1--8). Morristown, NJ: Association for Computational Linguistics. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Crammer, K., & Singer, Y. (2001). On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2, 265--292. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Crammer, K., & Singer, Y. (2003). Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research, 3, 951--991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Denoyer, L., & Gallinari, P. (2006). The XML document mining challenge. Advances in XML Information Retrieval and Evaluation, 5th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2006. Schloß Dagsthul, Germany.Google ScholarGoogle Scholar
  8. Freund, Y., & Schapire, R. E. (1998). Large margin classification using the perceptron algorithm. Machine Learning: Proceedings of the Fifteenth International Conference. San Francisco, CA: Morgan Kaufmann.Google ScholarGoogle Scholar
  9. Graepel, T., Herbrich, R., & Williamson, R. C. (2000). From margin to sparsity. In Advances in neural information processing systems, vol. 13, 210--216. MIT Press.Google ScholarGoogle Scholar
  10. Hildreth, C. (1957). A quadratic programming procedure. Naval Research Logistics Quarterly, 4, 79--85. Erratum, ibid. p361.Google ScholarGoogle ScholarCross RefCross Ref
  11. Hsu, C.-W., & Lin, C.-J. (2002). A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks, 13, 415--425. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. LeCun, Y., Chopra, S., Hadsell, R., HuangFu, J., & Ranzato, M. (2007). A tutorial on energy-based learning. In (Bakir et al., 2007), 192--241. in press.Google ScholarGoogle Scholar
  13. Platt, J. (1999). Fast training of support vector machines using sequential minimal optimization. Advances in Kernel Methods - Support Vector Learning (pp. 185--208). MIT Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Rifkin, R. M., & Klautau, A. (2004). In defense of one-vs-all classification. Journal of Machine Learning Research, 5, 101--141. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Schölkopf, B., & Smola, A. J. (2002). Learning with kernels. MIT Press.Google ScholarGoogle Scholar
  16. Taskar, B. (2004). Learning structured prediction models: A large margin approach. Doctoral dissertation, Stanford University. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Taskar, B., Chatalbashev, V., Koller, D., & Guestrin, C. (2005). Learning structured prediction models: a large margin approach. International Conference on Machine Learning (ICML) (pp. 896--903). Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6, 1453--1484. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Weston, J., & Watkins, C. (1998). Multi-class support vector machines (Technical Report CSD-TR-98-04). Department of Computer Science, Royal Holloway, University of London, Egham, UK.Google ScholarGoogle Scholar
  1. Solving multiclass support vector machines with LaRank

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Other conferences
      ICML '07: Proceedings of the 24th international conference on Machine learning
      June 2007
      1233 pages
      ISBN:9781595937933
      DOI:10.1145/1273496

      Copyright © 2007 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 20 June 2007

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate140of548submissions,26%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader