ABSTRACT
Optimization algorithms for large-margin multiclass recognizers are often too costly to handle ambitious problems with structured outputs and exponentially many classes. Algorithms that rely on the full gradient are not effective because, unlike the solution, the gradient is very large and not sparse. The LaRank algorithm sidesteps this difficulty by relying on a randomized exploration inspired by the perceptron algorithm. We show that this approach is competitive with gradient-based optimizers on simple multiclass problems. Furthermore, a single LaRank pass over the training examples delivers test error rates that are nearly as good as those of the final solution.
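To make the perceptron inspiration concrete, here is a minimal sketch of a plain multiclass perceptron making one randomized pass over the training examples. It is not the LaRank algorithm: it uses linear scores with no kernel, margin, or dual variables. The function names, the `seed` parameter, and the toy data are all assumptions made for illustration. The sketch shows only why perceptron-style updates are cheap: each mistake changes the weights of just two classes, whereas a full gradient would involve every class.

```python
import random


def train_multiclass_perceptron(examples, num_classes, num_features, seed=0):
    """One randomized pass of a plain multiclass perceptron (illustrative, not LaRank)."""
    rng = random.Random(seed)
    weights = [[0.0] * num_features for _ in range(num_classes)]

    # Visit the training examples once, in random order.
    order = list(range(len(examples)))
    rng.shuffle(order)

    for i in order:
        x, y = examples[i]
        scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]
        predicted = max(range(num_classes), key=lambda c: scores[c])
        if predicted != y:
            # Sparse update: only the true class and the wrongly
            # predicted class are modified.
            for j, x_j in enumerate(x):
                weights[y][j] += x_j
                weights[predicted][j] -= x_j
    return weights


def predict(weights, x):
    """Return the class whose weight vector scores x highest."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]
    return max(range(len(weights)), key=lambda c: scores[c])


# Hypothetical toy data: three one-hot inputs, one per class.
examples = [([1.0, 0.0, 0.0], 0), ([0.0, 1.0, 0.0], 1), ([0.0, 0.0, 1.0], 2)]
weights = train_multiclass_perceptron(examples, num_classes=3, num_features=3)
predictions = [predict(weights, x) for x, _ in examples]
```

On this toy data a single pass is enough to separate the classes. That loosely echoes the single-pass claim in the abstract, but it says nothing about LaRank's actual accuracy.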
- Solving multiclass support vector machines with LaRank