
2016 | Original Paper | Book Chapter

Mutual Information Estimation for Filter Based Feature Selection Using Particle Swarm Optimization

Authors: Hoai Bach Nguyen, Bing Xue, Peter Andreae

Published in: Applications of Evolutionary Computation

Publisher: Springer International Publishing

Abstract

Feature selection is a pre-processing step in classification that selects a small set of important features to improve classification performance and efficiency. Mutual information is popular in feature selection because it can detect non-linear relationships between features. However, existing mutual information approaches only consider two-way interactions between features. In addition, in most methods, mutual information is computed by a counting approach, which may lead to inaccurate results. This paper proposes a filter feature selection algorithm based on particle swarm optimization (PSO), named PSOMIE, which employs a novel fitness function using nearest neighbor mutual information estimation (NNE) to measure the quality of a feature set. PSOMIE is compared with using all features and with two traditional feature selection approaches. The experimental results show that the mutual information estimate successfully guides PSO to search for a small number of features while maintaining or improving classification performance over using all features and the traditional feature selection methods. In addition, PSOMIE provides strong consistency between training and test results, which may be used to avoid the overfitting problem.
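As a concrete illustration, the sketch below estimates the mutual information between a selected feature subset and the class label using a nearest-neighbor estimator in the Kraskov/Ross style, the general family to which NNE belongs. The chapter does not publish reference code, so the function name nn_mutual_information, the parameter k, and the way a PSO fitness function would call it are illustrative assumptions, not the authors' exact NNE or fitness formulation.

import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def nn_mutual_information(X_subset, y, k=3):
    # Nearest-neighbor MI estimate of I(X_subset; y) for continuous features and a
    # discrete class label (Kraskov/Ross-style estimator). Assumes every class has
    # more than k samples; this is a sketch, not the authors' exact NNE.
    X = np.asarray(X_subset, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    y = np.asarray(y)
    n = len(y)
    radius = np.empty(n)    # distance to the k-th neighbour within the same class
    n_label = np.empty(n)   # number of samples sharing each sample's class
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        d, _ = cKDTree(X[idx]).query(X[idx], k=k + 1)  # k+1: each point is its own 0-th neighbour
        radius[idx] = d[:, -1]
        n_label[idx] = len(idx)
    tree = cKDTree(X)
    # m_i: neighbours of any class inside the radius, excluding the point itself
    m = np.array([len(tree.query_ball_point(X[i], radius[i])) - 1 for i in range(n)])
    return max(0.0, digamma(n) + digamma(k)
               - np.mean(digamma(n_label)) - np.mean(digamma(np.maximum(m, 1))))

# Inside a PSO wrapper, each particle position would be decoded into a binary feature
# mask and scored with this estimate, e.g. fitness = nn_mutual_information(X[:, mask], y),
# possibly combined with a penalty on the number of selected features.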


Metadata
Title
Mutual Information Estimation for Filter Based Feature Selection Using Particle Swarm Optimization
Authors
Hoai Bach Nguyen
Bing Xue
Peter Andreae
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-31204-0_46
