Abstract
A fast Newton method that suppresses input space features is proposed for a linear programming formulation of support vector machine classifiers. The proposed stand-alone method can handle classification problems in very high-dimensional spaces, such as 28,032 dimensions, and generates a classifier that depends on very few input features, such as 7 out of the original 28,032. The method can also handle problems with a large number of data points, and it requires no specialized linear programming packages, merely a linear equation solver. For nonlinear kernel classifiers, the method utilizes a minimal number of kernel functions in the classifier it generates.
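The key computational idea stated in the abstract — a Newton method for SVM classification that needs only a linear equation solver, not an LP package — can be illustrated with a minimal sketch. The sketch below applies a generalized Newton iteration to a smooth piecewise-quadratic SVM objective; this is a simplified stand-in for the paper's actual 1-norm (feature-selecting) LP formulation, and the function name, objective, and parameters are assumptions for illustration only. Because the gradient of this objective is piecewise linear, each Newton step reduces to solving one linear system.

```python
import numpy as np

def finite_newton_svm(A, d, nu=1.0, tol=1e-6, max_iter=50):
    """Illustrative generalized Newton iteration for SVM classification
    (a sketch, NOT the paper's exact 1-norm LP algorithm).

    Minimizes  f(w, g) = (nu/2)*||max(0, e - D(A w - e g))||^2
                         + (1/2)*(||w||^2 + g^2),
    where D = diag(d) holds the +/-1 labels.  The gradient is piecewise
    linear, so each Newton step is a single linear-equation solve."""
    m, n = A.shape
    # Augmented data matrix E = [D A, -d] acting on z = (w, g)
    E = np.hstack([d[:, None] * A, -d[:, None]])
    z = np.zeros(n + 1)
    for _ in range(max_iter):
        r = 1.0 - E @ z                      # hinge residuals
        active = r > 0                       # margin-violating points
        grad = z - nu * E[active].T @ r[active]
        if np.linalg.norm(grad) < tol:
            break
        # Generalized Hessian: identity plus active-set Gram matrix
        H = np.eye(n + 1) + nu * E[active].T @ E[active]
        z = z - np.linalg.solve(H, grad)     # one linear solve per step
    return z[:n], z[n]                       # classify x by sign(w.x - g)
```

A usage sketch on four linearly separable points: `w, g = finite_newton_svm(A, d)` followed by `np.sign(A @ w - g)` recovers the labels. The feature-suppressing behavior highlighted in the abstract comes from the 1-norm term of the actual LP formulation, which this quadratic sketch does not reproduce.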
Fung, G.M., Mangasarian, O. A Feature Selection Newton Method for Support Vector Machine Classification. Computational Optimization and Applications 28, 185–202 (2004). https://doi.org/10.1023/B:COAP.0000026884.66338.df