skip to main content
10.1145/1273496.1273598acmotherconferencesArticle/Chapter ViewAbstractPublication PagesicmlConference Proceedingsconference-collections
Article

Pegasos: Primal Estimated sub-GrAdient SOlver for SVM

Published:20 June 2007Publication History

ABSTRACT

We describe and analyze a simple and effective iterative algorithm for solving the optimization problem cast by Support Vector Machines (SVM). Our method alternates between stochastic gradient descent steps and projection steps. We prove that the number of iterations required to obtain a solution of accuracy ε is Õ(1/ε). In contrast, previous analyses of stochastic gradient descent methods require Ω (1/ε2) iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/λ, where λ is the regularization parameter of SVM. For a linear kernel, the total run-time of our method is Õ (d/(λε)), where d is a bound on the number of non-zero features in each example. Since the run-time does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach can seamlessly be adapted to employ non-linear kernels while working solely on the primal objective function. We demonstrate the efficiency and applicability of our approach by conducting experiments on large text classification problems, comparing our solver to existing state-of-the-art SVM solvers. For example, it takes less than 5 seconds for our solver to converge when solving a text classification problem from Reuters Corpus Volume 1 (RCV1) with 800,000 training examples.

References

  1. Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Censor, Y., & Zenios, S. (1997). Parallel optimization: Theory, algorithms, and applications. Oxford University Press, NY. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Cesa-Bianchi, N., Conconi, A., & Gentile, C. (2004). On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory, 50, 2050--2057. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Cesa-Bianchi, N., & Gentile, C. (2006). Improved risk tail bounds for on-line algorithms. NIPS.Google ScholarGoogle Scholar
  5. Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., & Singer, Y. (2006). Online passive aggressive algorithms. JMLR, 7. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Cristianini, N., & Shawe-Taylor, J. (2000). An introduction to support vector machines. Cambridge University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. Wiley.Google ScholarGoogle Scholar
  8. Fine, S., & Scheinberg, K. (2001). Efficient svm training using low-rank kernel representations. JMLR, 2, 242--264. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Freund, Y., & Schapire, R. E. (1999). Large margin classification using the perceptron algorithm. Mach. Learning, 37, 277--296. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Hazan, E., Kalai, A., Kale, S., & Agarwal, A. (2006). Logarithmic regret algorithms for online convex optimization. COLT. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Hush, D., Kelly, P., Scovel, C., & Steinwart, I. (2006). Qp algorithms with guaranteed accuracy and run time for support vector machines. JMLR. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Joachims, T. (1998). Making large-scale support vector machine learning practical. In B. Schölkopf, C. Burges and A. Smola (Eds.), Advances in kernel methods - support vector learning. MIT Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Joachims, T. (2006). Training linear svms in linear time. KDD. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Kimeldorf, G., & Wahba, G. (1971). Some results on tchebycheffian spline functions. J. Math. Anal. Applic., 33, 82--95.Google ScholarGoogle ScholarCross RefCross Ref
  15. Kivinen, J., Smola, A. J., & Williamson, R. C. (2002). Online learning with kernels. IEEE' TSP, 52, 2165--2176. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Platt, J. C. (1998). Fast training of Support Vector Machines using sequential minimal optimization. In B. Schölkopf, C. Burges and A. Smola (Eds.), Advances in kernel methods - support vector learning. MIT Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Shalev-Shwartz, S., & Singer, Y. (2007). Logarithmic regret algorithms for strongly convex repeated games (Technical Report). The Hebrew University.Google ScholarGoogle Scholar
  18. Vapnik, V. N. (1998). Statistical learning theory. Wiley.Google ScholarGoogle Scholar
  19. Zhang, T. (2004). Solving large scale linear prediction problems using stochastic gradient descent algorithms. ICML. Google ScholarGoogle ScholarDigital LibraryDigital Library
  1. Pegasos: Primal Estimated sub-GrAdient SOlver for SVM

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Other conferences
      ICML '07: Proceedings of the 24th international conference on Machine learning
      June 2007
      1233 pages
      ISBN:9781595937933
      DOI:10.1145/1273496

      Copyright © 2007 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 20 June 2007

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate140of548submissions,26%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader