Article

Feature selection in a kernel space

Published: 20 June 2007
DOI: 10.1145/1273496.1273512

ABSTRACT

We address the problem of feature selection in a kernel space, that is, selecting the most discriminative and informative features for classification and data analysis. This is a difficult problem because the dimension of a kernel space may be infinite, and little prior work has addressed feature selection in this setting. As a first step, we derive a basis set in the kernel space. Using this basis set, we then extend margin-based feature selection algorithms, which have proven effective even when many features are dependent. The selected features span a subspace of the kernel space, in which various state-of-the-art classification algorithms can be applied. We conduct extensive experiments on real and simulated data to compare the proposed method with four baseline algorithms. Both theoretical analysis and experimental results validate the effectiveness of the proposed method.


Published in

ICML '07: Proceedings of the 24th International Conference on Machine Learning
June 2007, 1233 pages
ISBN: 9781595937933
DOI: 10.1145/1273496
Copyright © 2007 ACM


Publisher: Association for Computing Machinery, New York, NY, United States



Acceptance Rates

Overall acceptance rate: 140 of 548 submissions, 26%
