Published in: Memetic Computing 3/2018

07-03-2018 | Regular Research Paper

PSO with surrogate models for feature selection: static and dynamic clustering-based methods

Authors: Hoai Bach Nguyen, Bing Xue, Peter Andreae

Abstract

Feature selection is an important but often expensive process, especially with a large number of instances. This problem can be addressed by using a small training set, i.e. a surrogate set. In this work, we propose a hierarchical clustering method to build surrogate sets of varying quality and size, which allows us to analyze their effect on the selected feature subsets. Furthermore, a dynamic surrogate model is proposed that automatically adjusts the surrogate set for each dataset. Based on this idea, a feature selection system is developed using particle swarm optimization as the search mechanism. The experiments show that the hierarchical clustering method builds better surrogate sets that reduce computational time, improve feature selection performance, and alleviate overfitting. The dynamic method automatically chooses suitable surrogate sets to further improve classification accuracy.
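A clustering-based surrogate set of the kind the abstract describes can be sketched as follows: each class is clustered separately with Ward's agglomerative method, and the instance nearest each cluster centroid is kept as a representative. This is an illustrative sketch, not the authors' implementation; the function name, the `ratio` parameter, and the nearest-to-centroid selection rule are our own assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def build_surrogate_set(X, y, ratio=0.2):
    """Return indices of a small surrogate training set.

    Each class is clustered with Ward's hierarchical method into
    roughly `ratio` * class-size clusters; the instance closest to
    each cluster centroid is kept, preserving class proportions.
    """
    keep = []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        if idx.size < 2:  # too few instances to cluster; keep them all
            keep.extend(idx.tolist())
            continue
        k = max(1, int(round(ratio * idx.size)))
        Z = linkage(X[idx], method="ward")           # agglomerative merge tree
        labels = fcluster(Z, t=k, criterion="maxclust")  # cut into <= k clusters
        for cl in np.unique(labels):
            members = idx[labels == cl]
            centroid = X[members].mean(axis=0)
            dists = np.linalg.norm(X[members] - centroid, axis=1)
            keep.append(members[np.argmin(dists)])   # nearest instance to centroid
    return np.array(sorted(keep))
```

In a wrapper system such as the one described, each PSO particle's candidate feature subset would then be evaluated on this small surrogate set rather than the full training data, which is where the computational saving comes from.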


Metadata
Title: PSO with surrogate models for feature selection: static and dynamic clustering-based methods
Authors: Hoai Bach Nguyen, Bing Xue, Peter Andreae
Publication date: 07-03-2018
Publisher: Springer Berlin Heidelberg
Published in: Memetic Computing, Issue 3/2018
Print ISSN: 1865-9284
Electronic ISSN: 1865-9292
DOI: https://doi.org/10.1007/s12293-018-0254-9
