
An information-theoretic graph-based approach for feature selection

Published in Sādhanā.

Abstract

Feature selection is a critical research problem in data science. The need for it has become more pressing with the advent of high-dimensional data sets, especially in text, image and micro-array applications. In this paper, a graph-theoretic approach with step-by-step visualization is proposed for supervised feature selection. The mutual information criterion is used to evaluate the relevance of each feature with respect to the class. A graph-based representation of the input data set, named the feature information map (FIM), is created, with the vertices representing less informative features highlighted. Among the more informative features, inter-feature similarity is measured, and edges are drawn between features with high similarity. Finally, a minimal vertex cover is computed on the connected vertices to identify a subset of features with low mutual similarity. Experiments on standard data sets show that the proposed method outperforms competing algorithms on most of them. The proposed algorithm also makes the novel contribution of rendering a visualization of features in terms of relevance and redundancy.
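The pipeline the abstract describes (relevance filtering by mutual information, similarity edges between informative features, and vertex-cover-based pruning) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the names (`fim_select`, `mi_threshold`, `sim_threshold`), the choice of mutual information normalized by the smaller entropy as the inter-feature similarity measure, the threshold values, and the greedy max-degree vertex-cover heuristic are all assumptions. The complement of the cover is taken as the selected, mutually dissimilar subset, which is one plausible reading of the abstract.

```python
from collections import Counter
from itertools import combinations
from math import log2

def mutual_information(x, y):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def fim_select(features, labels, mi_threshold=0.05, sim_threshold=0.8):
    """features: dict mapping feature name -> list of discrete values.
    1) keep features whose MI with the class exceeds mi_threshold;
    2) connect pairs of informative features whose similarity (here,
       MI normalized by the smaller entropy -- an assumed measure)
       exceeds sim_threshold;
    3) greedily approximate a minimum vertex cover (max-degree
       heuristic) and return its complement: a subset whose members
       share no high-similarity edge."""
    informative = [f for f in features
                   if mutual_information(features[f], labels) > mi_threshold]
    edges = set()
    for f, g in combinations(informative, 2):
        h_min = min(mutual_information(features[f], features[f]),  # H(f)
                    mutual_information(features[g], features[g]))  # H(g)
        sim = mutual_information(features[f], features[g]) / max(h_min, 1e-12)
        if sim > sim_threshold:
            edges.add((f, g))
    cover, uncovered = set(), set(edges)
    while uncovered:
        # pick the vertex covering the most remaining edges
        v = max(informative, key=lambda f: sum(f in e for e in uncovered))
        cover.add(v)
        uncovered = {e for e in uncovered if v not in e}
    return [f for f in informative if f not in cover]
```

The greedy max-degree heuristic stands in for whichever minimum-vertex-cover approximation the paper actually employs; exact minimum vertex cover is NP-hard, so some approximation is needed at this step.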

Figure 1
Figure 2




Corresponding author

Correspondence to Basabi Chakraborty.


About this article


Cite this article

Das, A.K., Kumar, S., Jain, S. et al. An information-theoretic graph-based approach for feature selection. Sādhanā 45, 11 (2020). https://doi.org/10.1007/s12046-019-1238-2

