DOI: 10.1145/1273496.1273516

Minimum reference set based feature selection for small sample classifications

Published: 20 June 2007

ABSTRACT

We address feature selection for classification problems with small sample sizes and high dimensionality. A practical example is microarray-based cancer classification, where the sample size is typically less than 100 and the number of features is several thousand or more. One commonly used method for this problem is recursive feature elimination (RFE), which exploits the generalization capability embedded in support vector machines and is thus suitable for small-sample problems. We propose a novel method based on the minimum reference set (MRS) generated by the nearest neighbor rule. The MRS is the smallest set of samples that correctly classifies all training samples. It is related to the structural risk minimization principle and thus leads to good generalization. The proposed MRS-based method is compared to RFE on several real datasets, and experimental results show that the MRS method produces better classification performance.
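As a rough illustration of reference sets built with the nearest neighbor rule, the sketch below uses Hart-style greedy condensation: it grows a set of reference samples until every training sample is classified correctly by 1-NN against that set. This is an assumption-laden simplification for intuition only, not the paper's MRS algorithm, which seeks the *minimum* such set.

```python
def nearest_label(refs, x):
    """Return the label of the reference point nearest to x (1-NN rule)."""
    best = min(refs, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], x)))
    return best[1]

def condensed_reference_set(X, y):
    """Greedily build a reference set that classifies all training samples
    correctly under 1-NN.  This is Hart's condensation heuristic; the MRS
    of the paper is the smallest set with this property."""
    refs = [(X[0], y[0])]          # seed with the first training sample
    changed = True
    while changed:                  # repeat until a full pass adds nothing
        changed = False
        for xi, yi in zip(X, y):
            if nearest_label(refs, xi) != yi:
                refs.append((xi, yi))   # misclassified: add as a reference
                changed = True
    return refs

# Toy example with two well-separated classes.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = [0, 0, 0, 1, 1, 1]
refs = condensed_reference_set(X, y)
```

On this toy data the condensed set keeps only one sample per cluster, yet still classifies every training sample correctly, which is the property the MRS criterion scores feature subsets by.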


Published in

ICML '07: Proceedings of the 24th International Conference on Machine Learning
June 2007, 1233 pages
ISBN: 9781595937933
DOI: 10.1145/1273496
Copyright © 2007 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 140 of 548 submissions, 26%
