A novel feature selection method based on normalized mutual information

Published in: Applied Intelligence

Abstract

In this paper, we present a novel feature selection method based on a normalization of the well-known mutual information measure. Our method is derived from an existing approach, max-relevance and min-redundancy (mRMR). We propose, however, to normalize the mutual information used in the method so that neither the relevance term nor the redundancy term can dominate. We employ several commonly used recognition models, namely Support Vector Machine (SVM), k-Nearest-Neighbor (kNN), and Linear Discriminant Analysis (LDA), to compare our algorithm with the original mRMR and with a recently improved version of mRMR, the Normalized Mutual Information Feature Selection (NMIFS) algorithm. To avoid dataset-specific conclusions, we conduct classification experiments on a variety of datasets from the UCI machine learning repository. The results confirm that our feature selection method is more robust than the others with respect to classification accuracy.
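The idea sketched in the abstract, balancing a feature's relevance to the class against its redundancy with already-selected features, using normalized mutual information so neither term dominates, can be illustrated as follows. This is a minimal sketch, not the paper's exact algorithm: it assumes discrete-valued features and uses one common normalization, I(X;Y)/min(H(X),H(Y)) (the form used in NMIFS), inside a greedy mRMR-style search.

```python
import numpy as np
from collections import Counter

def entropy(x):
    """Shannon entropy (in nats) of a discrete variable."""
    counts = np.array(list(Counter(x).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), plug-in estimate for discrete data."""
    joint = list(zip(x, y))
    return entropy(x) + entropy(y) - entropy(joint)

def normalized_mi(x, y):
    """NMI(X;Y) = I(X;Y) / min(H(X), H(Y)); 0 if either variable is constant."""
    denom = min(entropy(x), entropy(y))
    return mutual_info(x, y) / denom if denom > 0 else 0.0

def select_features(X, y, k):
    """Greedy mRMR-style selection: at each step pick the unselected feature
    maximizing (relevance to y) - (mean redundancy with selected features)."""
    n_features = X.shape[1]
    selected = []
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            relevance = normalized_mi(X[:, j], y)
            redundancy = (np.mean([normalized_mi(X[:, j], X[:, s])
                                   for s in selected])
                          if selected else 0.0)
            score = relevance - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

Because both the relevance and the redundancy terms are divided by an entropy, each lies in [0, 1], which is what keeps one term from overwhelming the other; the paper's specific normalization may differ from the min-entropy form assumed here.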


References

  1. Asuncion A, Newman DJ (2007) UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. http://www.ics.uci.edu/~mlearn/MLRepository.html

  2. Battiti R (1994) Using mutual information for selecting features in supervised neural net learning. IEEE Trans Neural Netw 5(4):537–550

  3. Bhanu B, Lin Y (2003) Genetic algorithm based feature selection for target detection in SAR images. Image Vis Comput 21(7):591–608

  4. Cawley GC, Talbot NLC, Girolami M (2007) Sparse multinomial logistic regression via Bayesian l1 regularisation. Adv Neural Inf Process Syst 19:209–216

  5. Chang T-W, Huang Y-P, Sandnes FE (2009) Efficient entropy-based feature selection for image retrieval. In: Proceedings of the 2009 IEEE international conference on systems, man and cybernetics, pp 2941–2946

  6. Dasgupta A, Drineas P, Harb B, Josifovski V, Mahoney MW (2007) Feature selection methods for text classification. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, pp 230–239

  7. Dash M, Liu H (1997) Feature selection for classification. Intell Data Anal 1:131–156

  8. Dimililer N, Varoglu E, Altinçay H (2009) Classifier subset selection for biomedical named entity recognition. Appl Intell 31:267–282

  9. Dy JG, Brodley CE, Kak A, Broderick LS, Aisen AM (2003) Unsupervised feature selection applied to content-based retrieval of lung images. IEEE Trans Pattern Anal Mach Intell 25(3):373–378

  10. Estévez PA, Tesmer M, Perez CA, Zurada JM (2009) Normalized mutual information feature selection. IEEE Trans Neural Netw 20(2):189–201

  11. Fodor IK (2002) A survey of dimension reduction techniques. Technical report, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory

  12. Forman G, Alto P (2003) An extensive empirical study of feature selection metrics for text classification. J Mach Learn Res 3:1289–1305

  13. Goulden CH (1956) Methods of statistical analysis, 2nd edn. Wiley, New York

  14. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182

  15. Hall MA (1999) Correlation-based feature selection for machine learning. PhD thesis, The University of Waikato

  16. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor 11(1):10–18

  17. Kamimura R (2011) Structural enhanced information and its application to improved visualization of self-organizing maps. Appl Intell 34:102–115

  18. Khor K-C, Ting C-Y, Amnuaisuk S-P (2009) A feature selection approach for network intrusion detection. In: Proceedings of the 2009 international conference on information management and engineering, pp 133–137

  19. Kwak N, Choi C-H (2002) Input feature selection for classification problems. IEEE Trans Neural Netw 13(1):143–159

  20. Li Y, Zeng X (2010) Sequential multi-criteria feature selection algorithm based on agent genetic algorithm. Appl Intell 33:117–131

  21. Narendra PM, Fukunaga K (1977) A branch and bound algorithm for feature subset selection. IEEE Trans Comput 26(9):917–922

  22. Oh I-S, Lee J-S, Moon B-R (2004) Hybrid genetic algorithms for feature selection. IEEE Trans Pattern Anal Mach Intell 26(11):1424–1437

  23. Peng H, Long F, Ding C (2005) Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226–1238

  24. Saeys Y, Inza I, Larrañaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23(19):2507–2517

  25. Shen K-Q, Ong C-J, Li X-P (2008) Novel multi-class feature selection methods using sensitivity analysis of posterior probabilities. In: Proceedings of the IEEE international conference on systems, man and cybernetics, pp 1116–1121

  26. Shie J-D, Chen S-M (2008) Feature subset selection based on fuzzy entropy measures for handling classification problems. Appl Intell 28:69–82

  27. Tsang C-H, Kwong S, Wang H (2007) Genetic-fuzzy rule mining approach and evaluation of feature selection techniques for anomaly intrusion detection. Pattern Recognit 40(9):2373–2391

  28. Vinh LT, Thang ND, Lee Y-K (2010) An improved maximum relevance and minimum redundancy feature selection algorithm based on normalized mutual information. In: Proceedings of the 10th IEEE/IPSJ international symposium on applications and the Internet, pp 395–398

  29. Xia H, Hu BQ (2006) Feature selection using fuzzy support vector machines. Fuzzy Optim Decis Mak 5(2):187–192

  30. Yan R (2006) MatlabArsenal toolbox for classification algorithms. Informedia School of Computer Science, Carnegie Mellon University

  31. Yang HH, Moody J (1999) Data visualization and feature selection: New algorithms for nongaussian data. In: Advances in neural information processing systems. MIT Press, Cambridge, pp 687–693

  32. Yu L, Liu H (2004) Redundancy based feature selection for microarray data. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining, pp 737–742

  33. Yuan G-X, Chang K-W, Hsieh C-J, Lin C-J (2010) A comparison of optimization methods and software for large-scale l1-regularized linear classification. J Mach Learn Res 11:3183–3234

  34. Zhao Z, Morstatter F, Sharma S, Alelyani S, Anand A, Liu H (2010) Advancing feature selection research: ASU feature selection repository. Technical report, School of Computing, Informatics, and Decision Systems Engineering, Arizona State University

Author information

Correspondence to Sungyoung Lee.

Cite this article

Vinh, L.T., Lee, S., Park, YT. et al. A novel feature selection method based on normalized mutual information. Appl Intell 37, 100–120 (2012). https://doi.org/10.1007/s10489-011-0315-y
