Minimum reference set based feature selection for small sample classifications

ABSTRACT
We address feature selection for classification problems with small sample sizes and high dimensionality. A practical example is microarray-based cancer classification, where the sample size is typically below 100 and the number of features is several thousand or more. A commonly used approach to this problem is recursive feature elimination (RFE), which exploits the generalization capability of support vector machines and is therefore well suited to small-sample problems. We propose a novel method based on the minimum reference set (MRS) generated by the nearest-neighbor rule: the MRS is the smallest subset of training samples whose nearest-neighbor rule correctly classifies all training samples. Its size is related to the structural risk minimization principle and therefore leads to good generalization. We compare the proposed MRS-based method with RFE on several real datasets, and the experimental results show that the MRS method achieves better classification performance.
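To make the MRS idea concrete, below is a minimal greedy sketch of constructing a reference set under the 1-NN rule: find a small subset of training samples whose nearest-neighbor classification is correct on every training sample. The greedy covering strategy, the function name `minimum_reference_set`, and the NumPy details are illustrative assumptions, not the paper's exact algorithm; an exact minimum would require a combinatorial search.

```python
import numpy as np

def minimum_reference_set(X, y):
    """Greedy sketch of a minimum reference set (MRS) under the 1-NN rule.

    Returns indices of a small subset R of the training samples such that
    every training sample is correctly classified by its nearest neighbor
    in R. Heuristic illustration only; the paper's exact construction and
    tie-breaking may differ.
    """
    n = len(y)
    # Pairwise squared Euclidean distances between all training samples.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    ref = []                            # growing reference set (indices)
    covered = np.zeros(n, dtype=bool)   # samples currently classified correctly
    while not covered.all():
        # Greedily pick the sample whose addition correctly covers the most
        # currently uncovered samples.
        best_gain, best_idx = -1, -1
        for i in range(n):
            if i in ref:
                continue
            cand = ref + [i]
            nearest = np.array(cand)[d[:, cand].argmin(axis=1)]
            gain = int(((y[nearest] == y) & ~covered).sum())
            if gain > best_gain:
                best_gain, best_idx = gain, i
        if best_gain <= 0:
            break  # no candidate improves coverage; avoid looping forever
        ref.append(best_idx)
        # Recompute coverage: a new reference sample can also "steal"
        # previously covered samples as their nearest (wrong-class) neighbor.
        nearest = np.array(ref)[d[:, ref].argmin(axis=1)]
        covered = y[nearest] == y
    return ref

# Example usage with random data (illustrative only):
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2000))          # 60 samples, 2000 features
y = rng.integers(0, 2, size=60)          # binary class labels
print(len(minimum_reference_set(X, y)))  # MRS cardinality
```

In a feature-selection loop, one could presumably score a candidate feature subset by the cardinality of its MRS (a smaller reference set suggesting lower structural risk) and iteratively eliminate the feature whose removal yields the smallest MRS, much as RFE uses SVM weight magnitudes as its ranking criterion; this is an extrapolation from the abstract, not a statement of the paper's procedure.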