ABSTRACT
We describe and analyze a simple and effective iterative algorithm for solving the optimization problem cast by Support Vector Machines (SVM). Our method alternates between stochastic gradient descent steps and projection steps. We prove that the number of iterations required to obtain a solution of accuracy ε is Õ(1/ε). In contrast, previous analyses of stochastic gradient descent methods require Ω(1/ε²) iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/λ, where λ is the regularization parameter of SVM. For a linear kernel, the total run-time of our method is Õ(d/(λε)), where d is a bound on the number of non-zero features in each example. Since the run-time does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach can seamlessly be adapted to employ non-linear kernels while working solely on the primal objective function. We demonstrate the efficiency and applicability of our approach by conducting experiments on large text classification problems, comparing our solver to existing state-of-the-art SVM solvers. For example, it takes less than 5 seconds for our solver to converge when solving a text classification problem from Reuters Corpus Volume 1 (RCV1) with 800,000 training examples.
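The optimization problem referred to above is the SVM primal, min_w (λ/2)‖w‖² + (1/m) Σᵢ max{0, 1 − yᵢ⟨w, xᵢ⟩}. As a concrete illustration of the alternation between a stochastic gradient step and a projection step, here is a minimal sketch for the linear-kernel, single-example case; the function name, signature, and NumPy realization are illustrative assumptions, not the authors' reference implementation.

```python
# A minimal sketch of the update described in the abstract, assuming a linear
# kernel: a stochastic sub-gradient step on the hinge loss with step size
# 1/(lam * t), followed by a projection onto the ball of radius 1/sqrt(lam).
# The function name and signature are illustrative, not the paper's code.
import numpy as np

def pegasos_sketch(X, y, lam, T, seed=0):
    """X: (m, d) array of examples; y: labels in {-1, +1};
    lam: regularization parameter lambda; T: number of iterations."""
    rng = np.random.default_rng(seed)
    m, d = X.shape
    w = np.zeros(d)
    for t in range(1, T + 1):
        i = rng.integers(m)            # pick one training example at random
        eta = 1.0 / (lam * t)          # decreasing step size 1/(lambda t)
        if y[i] * (X[i] @ w) < 1.0:    # hinge loss active: take its sub-gradient
            w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
        else:                          # only the regularizer contributes
            w = (1.0 - eta * lam) * w
        norm = np.linalg.norm(w)       # project onto {w : ||w|| <= 1/sqrt(lam)}
        if norm > 1.0 / np.sqrt(lam):
            w /= np.sqrt(lam) * norm
    return w
```

The projection step reflects the fact that the optimum of the regularized objective lies in a ball of radius 1/√λ, which is what allows the Õ(1/ε) iteration bound stated above rather than the Ω(1/ε²) bound of earlier stochastic gradient analyses.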
- Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge University Press. Google ScholarDigital Library
- Censor, Y., & Zenios, S. (1997). Parallel optimization: Theory, algorithms, and applications. Oxford University Press, NY. Google ScholarDigital Library
- Cesa-Bianchi, N., Conconi, A., & Gentile, C. (2004). On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory, 50, 2050--2057. Google ScholarDigital Library
- Cesa-Bianchi, N., & Gentile, C. (2006). Improved risk tail bounds for on-line algorithms. NIPS.Google Scholar
- Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., & Singer, Y. (2006). Online passive aggressive algorithms. JMLR, 7. Google ScholarDigital Library
- Cristianini, N., & Shawe-Taylor, J. (2000). An introduction to support vector machines. Cambridge University Press. Google ScholarDigital Library
- Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. Wiley.Google Scholar
- Fine, S., & Scheinberg, K. (2001). Efficient svm training using low-rank kernel representations. JMLR, 2, 242--264. Google ScholarDigital Library
- Freund, Y., & Schapire, R. E. (1999). Large margin classification using the perceptron algorithm. Mach. Learning, 37, 277--296. Google ScholarDigital Library
- Hazan, E., Kalai, A., Kale, S., & Agarwal, A. (2006). Logarithmic regret algorithms for online convex optimization. COLT. Google ScholarDigital Library
- Hush, D., Kelly, P., Scovel, C., & Steinwart, I. (2006). Qp algorithms with guaranteed accuracy and run time for support vector machines. JMLR. Google ScholarDigital Library
- Joachims, T. (1998). Making large-scale support vector machine learning practical. In B. Schölkopf, C. Burges and A. Smola (Eds.), Advances in kernel methods - support vector learning. MIT Press. Google ScholarDigital Library
- Joachims, T. (2006). Training linear svms in linear time. KDD. Google ScholarDigital Library
- Kimeldorf, G., & Wahba, G. (1971). Some results on tchebycheffian spline functions. J. Math. Anal. Applic., 33, 82--95.Google ScholarCross Ref
- Kivinen, J., Smola, A. J., & Williamson, R. C. (2002). Online learning with kernels. IEEE' TSP, 52, 2165--2176. Google ScholarDigital Library
- Platt, J. C. (1998). Fast training of Support Vector Machines using sequential minimal optimization. In B. Schölkopf, C. Burges and A. Smola (Eds.), Advances in kernel methods - support vector learning. MIT Press. Google ScholarDigital Library
- Shalev-Shwartz, S., & Singer, Y. (2007). Logarithmic regret algorithms for strongly convex repeated games (Technical Report). The Hebrew University.Google Scholar
- Vapnik, V. N. (1998). Statistical learning theory. Wiley.Google Scholar
- Zhang, T. (2004). Solving large scale linear prediction problems using stochastic gradient descent algorithms. ICML. Google ScholarDigital Library