
2020 | Original Paper | Book Chapter

Information Theory-Based Feature Selection: Minimum Distribution Similarity with Removed Redundancy

Authors: Yu Zhang, Zhuoyi Lin, Chee Keong Kwoh

Published in: Computational Science – ICCS 2020

Publisher: Springer International Publishing


Abstract

Feature selection is an important preprocessing step in pattern recognition. In this paper, we present a new information theory-based feature selection approach for two-class classification problems, named minimum Distribution Similarity with Removed Redundancy (mDSRR). Unlike previous methods, which use mutual information and greedy iteration with a loss function to rank features, we rank features by the similarity of their distributions across the two classes, measured by relative entropy, and then remove highly redundant features from the sorted feature subset. Experimental results on datasets from a variety of fields, evaluated with different classifiers, highlight the value of mDSRR for selecting feature subsets, especially small ones. mDSRR also outperforms other state-of-the-art methods in most cases. In addition, we observe that mutual information may not be a good criterion for selecting the initial feature in methods that rely on subsequent iterations.
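The two stages the abstract describes — scoring each feature by how dissimilar its class-conditional distributions are (via relative entropy), then pruning redundant features from the ranked list — can be sketched as follows. This is a minimal illustration only, not the paper's exact formulation: the histogram-based density estimate, the symmetrised relative entropy, and the correlation-based redundancy filter are all assumptions made for the sake of a runnable example.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-10):
    """Relative entropy D(p || q) between two discrete distributions
    given as non-negative count/probability vectors."""
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

def rank_by_distribution_dissimilarity(X, y, n_bins=10):
    """Score each feature by the symmetrised relative entropy between
    its per-class histograms; features whose class distributions are
    least similar (highest score) come first."""
    scores = []
    for j in range(X.shape[1]):
        col = X[:, j]
        edges = np.histogram_bin_edges(col, bins=n_bins)
        h0, _ = np.histogram(col[y == 0], bins=edges)
        h1, _ = np.histogram(col[y == 1], bins=edges)
        h0, h1 = h0.astype(float), h1.astype(float)
        scores.append(kl_divergence(h0, h1) + kl_divergence(h1, h0))
    return np.argsort(scores)[::-1]  # most dissimilar first

def drop_redundant(X, ranked, corr_threshold=0.9):
    """Walk the ranked list, keeping a feature only if its absolute
    correlation with every already kept feature stays below the
    threshold (a simple stand-in for the paper's redundancy removal)."""
    kept = []
    for j in ranked:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < corr_threshold
               for k in kept):
            kept.append(j)
    return kept
```

On a synthetic dataset with one discriminative feature, one noise feature, and a near-duplicate of the discriminative one, the ranking places the noise feature last and the redundancy filter drops one of the duplicated pair.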

Metadata
Title
Information Theory-Based Feature Selection: Minimum Distribution Similarity with Removed Redundancy
Authors
Yu Zhang
Zhuoyi Lin
Chee Keong Kwoh
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-50426-7_1