Skip to main content

2019 | OriginalPaper | Buchkapitel

10. Filter-Based Feature Selection Methods Using Hill Climbing Approach

verfasst von : Saptarsi Goswami, Sanjay Chakraborty, Priyanka Guha, Arunabha Tarafdar, Aman Kedia

Erschienen in: Natural Computing for Unsupervised Learning

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Feature selection remains one of the most important steps for usability of a model for both supervised and unsupervised classification. For a dataset, with n features, the number of possible feature subsets is 2n. Even for a moderate size of n, there is a combinatorial explosion in the search space. Feature selection is a NP-hard problem; hence finding the optimal solution is not feasible. Typically various kinds of intelligent and metaheuristic search techniques can be employed for this purpose. Hill climbing is arguably the simplest of such techniques. It has many variants based on (a) trade-off between greediness and randomness, (b) direction of the search, and (c) size of the neighborhood. Consequently it might not be trivial for the practitioner to choose a suitable method for the task in hand. In this paper, we have attempted to address this issue in the context of feature selection. The descriptions of the methods are followed by an extensive empirical study over 20 publicly available datasets. Finally a comparison has been done with genetic algorithm, which shows the effectiveness of hill climbing methods in the context of feature selection.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
2.
Zurück zum Zitat Liu H, Yu L (2005 Apr) Toward integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng 17(4):491–502CrossRef Liu H, Yu L (2005 Apr) Toward integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng 17(4):491–502CrossRef
3.
Zurück zum Zitat Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B Methodol 58:267–288MathSciNetMATH Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B Methodol 58:267–288MathSciNetMATH
4.
Zurück zum Zitat Das AK, Goswami S, Chakrabarti A, Chakraborty B (2017) A new hybrid feature selection approach using feature association map for supervised and unsupervised classification. Expert Syst Appl 88:81–94CrossRef Das AK, Goswami S, Chakrabarti A, Chakraborty B (2017) A new hybrid feature selection approach using feature association map for supervised and unsupervised classification. Expert Syst Appl 88:81–94CrossRef
5.
Zurück zum Zitat Goswami S, Das AK, Guha P, Tarafdar A, Chakraborty S, Chakrabarti A, Chakraborty B (2017) An approach of feature selection using graph-theoretic heuristic and hill climbing. Pattern Anal Applic:1–17 Goswami S, Das AK, Guha P, Tarafdar A, Chakraborty S, Chakrabarti A, Chakraborty B (2017) An approach of feature selection using graph-theoretic heuristic and hill climbing. Pattern Anal Applic:1–17
6.
Zurück zum Zitat Goswami S, Chakrabarti A, Chakraborty B (2016) A proposal for recommendation of feature selection algorithm based on data set characteristics. J UCS 22(6):760–781MathSciNet Goswami S, Chakrabarti A, Chakraborty B (2016) A proposal for recommendation of feature selection algorithm based on data set characteristics. J UCS 22(6):760–781MathSciNet
7.
Zurück zum Zitat Goswami S, Saha S, Chakravorty S, Chakrabarti A, Chakraborty B (2015) A new evaluation measure for feature subset selection with genetic algorithm. Int J Intell Syst Appl MECS 7(10):28CrossRef Goswami S, Saha S, Chakravorty S, Chakrabarti A, Chakraborty B (2015) A new evaluation measure for feature subset selection with genetic algorithm. Int J Intell Syst Appl MECS 7(10):28CrossRef
8.
Zurück zum Zitat Gheyas IA, Smith LS (2010) Feature subset selection in large dimensionality domains. Pattern Recogn 43(1):5–13CrossRef Gheyas IA, Smith LS (2010) Feature subset selection in large dimensionality domains. Pattern Recogn 43(1):5–13CrossRef
9.
Zurück zum Zitat De La Iglesia B (2013) Evolutionary computation for feature selection in classification problems. Wiley Interdiscip Rev Data Min Knowl Disc 3(6):381–407CrossRef De La Iglesia B (2013) Evolutionary computation for feature selection in classification problems. Wiley Interdiscip Rev Data Min Knowl Disc 3(6):381–407CrossRef
10.
Zurück zum Zitat Goswami S, Das AK, Chakrabarti A, Chakraborty B (2017) A feature cluster taxonomy based feature selection technique. Expert Syst Appl 79:76–89CrossRef Goswami S, Das AK, Chakrabarti A, Chakraborty B (2017) A feature cluster taxonomy based feature selection technique. Expert Syst Appl 79:76–89CrossRef
11.
Zurück zum Zitat Goswami S, Chakraborty S, Saha HN (2017) An univariate feature elimination strategy for clustering based on metafeatures. Int J Intell Syst Appl 9(10):20 Goswami S, Chakraborty S, Saha HN (2017) An univariate feature elimination strategy for clustering based on metafeatures. Int J Intell Syst Appl 9(10):20
12.
Zurück zum Zitat Goswami S, Chakrabarti A, Chakraborty B (2017) An efficient feature selection technique for clustering based on a new measure of feature importance. J Intell Fuzzy Syst 32(6):3847–3858CrossRef Goswami S, Chakrabarti A, Chakraborty B (2017) An efficient feature selection technique for clustering based on a new measure of feature importance. J Intell Fuzzy Syst 32(6):3847–3858CrossRef
13.
Zurück zum Zitat Gent IP, Walsh T (1993) Towards an understanding of hill-climbing procedures for SAT. In: AAAI, vol 93, pp 28–33 Gent IP, Walsh T (1993) Towards an understanding of hill-climbing procedures for SAT. In: AAAI, vol 93, pp 28–33
14.
Zurück zum Zitat Wang R, Youssef AM, Elhakeem AK (2006) On some feature selection strategies for spam filter design. In: Electrical and computer engineering, 2006. CCECE'06, Canadian Conference on 2006 May. IEEE, pp 2186–2189 Wang R, Youssef AM, Elhakeem AK (2006) On some feature selection strategies for spam filter design. In: Electrical and computer engineering, 2006. CCECE'06, Canadian Conference on 2006 May. IEEE, pp 2186–2189
15.
Zurück zum Zitat Burke EK, Bykov Y (2008) A late acceptance strategy in hill-climbing for exam timetabling problems. PATAT 2008 Conference, Montreal Burke EK, Bykov Y (2008) A late acceptance strategy in hill-climbing for exam timetabling problems. PATAT 2008 Conference, Montreal
16.
Zurück zum Zitat Lang KJ (2016) Hill climbing beats genetic search on a boolean circuit synthesis problem of koza's. In: Proceedings of the twelfth international conference on machine learning 2016 Jan 22, pp 340–343 Lang KJ (2016) Hill climbing beats genetic search on a boolean circuit synthesis problem of koza's. In: Proceedings of the twelfth international conference on machine learning 2016 Jan 22, pp 340–343
17.
Zurück zum Zitat Bykov Y, Petrovic S (2016) A step counting hill climbing algorithm applied to university examination timetabling. J Schedul:1–4 Bykov Y, Petrovic S (2016) A step counting hill climbing algorithm applied to university examination timetabling. J Schedul:1–4
18.
Zurück zum Zitat Seyedmahmoudian M, Horan B, Rahmani R, Maung Than Oo A, Stojcevski A (2016) Efficient photovoltaic system maximum power point tracking using a new technique. Energies 9(3):147CrossRef Seyedmahmoudian M, Horan B, Rahmani R, Maung Than Oo A, Stojcevski A (2016) Efficient photovoltaic system maximum power point tracking using a new technique. Energies 9(3):147CrossRef
19.
Zurück zum Zitat Saichandana B, Srinivas K, Kumar RK (2014) Clustering algorithm combined with hill climbing for classification of remote sensing image. Int J Electr Comput Eng 4(6):923–930 Saichandana B, Srinivas K, Kumar RK (2014) Clustering algorithm combined with hill climbing for classification of remote sensing image. Int J Electr Comput Eng 4(6):923–930
20.
Zurück zum Zitat Ou TC, Su WF, Liu XZ, Huang SJ, Tai TY (2016) A modified bird-mating optimization with hill-climbing for connection decisions of transformers. Energies 9(9):671CrossRef Ou TC, Su WF, Liu XZ, Huang SJ, Tai TY (2016) A modified bird-mating optimization with hill-climbing for connection decisions of transformers. Energies 9(9):671CrossRef
21.
Zurück zum Zitat Nunes CM, Britto AS, Kaestner CA, Sabourin R (2004) An optimized hill climbing algorithm for feature subset selection: Evaluation on handwritten character recognition. In: Frontiers in handwriting recognition, 2004. IWFHR-9 2004. Ninth international workshop on 2004 Oct 26. IEEE, pp 365–370 Nunes CM, Britto AS, Kaestner CA, Sabourin R (2004) An optimized hill climbing algorithm for feature subset selection: Evaluation on handwritten character recognition. In: Frontiers in handwriting recognition, 2004. IWFHR-9 2004. Ninth international workshop on 2004 Oct 26. IEEE, pp 365–370
22.
Zurück zum Zitat Gelbart D, Morgan N, Tsymbal A (2009) Hill-climbing feature selection for multi-stream ASR. In: INTERSPEECH 2009, pp 2967–2970 Gelbart D, Morgan N, Tsymbal A (2009) Hill-climbing feature selection for multi-stream ASR. In: INTERSPEECH 2009, pp 2967–2970
23.
Zurück zum Zitat Hall MA, Smith LA (1997) Feature subset selection: a correlation based filter approach. In: International conference on neural information processing and intelligent information systems, pp 855–858 Hall MA, Smith LA (1997) Feature subset selection: a correlation based filter approach. In: International conference on neural information processing and intelligent information systems, pp 855–858
24.
Zurück zum Zitat Liu Y, Schumann M (2005) Data mining feature selection for credit scoring models. J Oper Res Soc 56(9):1099–1108CrossRef Liu Y, Schumann M (2005) Data mining feature selection for credit scoring models. J Oper Res Soc 56(9):1099–1108CrossRef
25.
Zurück zum Zitat Begg RK, Palaniswami M, Owen B (2005) Support vector machines for automated gait classification. IEEE Trans Biomed Eng 52(5):828–838CrossRef Begg RK, Palaniswami M, Owen B (2005) Support vector machines for automated gait classification. IEEE Trans Biomed Eng 52(5):828–838CrossRef
26.
Zurück zum Zitat Farmer ME, Bapna S, Jain AK (2004) Large scale feature selection using modified random mutation hill climbing. In: Pattern recognition, 2004. ICPR 2004. Proceedings of the 17th international conference on 2004 Aug 23, vol 2. IEEE, pp 287–290 Farmer ME, Bapna S, Jain AK (2004) Large scale feature selection using modified random mutation hill climbing. In: Pattern recognition, 2004. ICPR 2004. Proceedings of the 17th international conference on 2004 Aug 23, vol 2. IEEE, pp 287–290
27.
Zurück zum Zitat Malakasiotis P (2009) Paraphrase recognition using machine learning to combine similarity measures. In: Proceedings of the ACL-IJCNLP 2009 student research workshop 2009 Aug 4. Association for Computational Linguistics, pp 27–35 Malakasiotis P (2009) Paraphrase recognition using machine learning to combine similarity measures. In: Proceedings of the ACL-IJCNLP 2009 student research workshop 2009 Aug 4. Association for Computational Linguistics, pp 27–35
28.
Zurück zum Zitat Caruana R, Freitag D (1994) Greedy Attribute Selection. In: ICML, pp 28–36 Caruana R, Freitag D (1994) Greedy Attribute Selection. In: ICML, pp 28–36
29.
Zurück zum Zitat Lewis R (2009) A general-purpose hill-climbing method for order independent minimum grouping problems: A case study in graph colouring and bin packing. Comput Oper Res 36(7):2295–2310MathSciNetCrossRef Lewis R (2009) A general-purpose hill-climbing method for order independent minimum grouping problems: A case study in graph colouring and bin packing. Comput Oper Res 36(7):2295–2310MathSciNetCrossRef
30.
Zurück zum Zitat Mitchell M, Holland JH, Forrest S (2014) Relative building-block fitness and the building block hypothesis. D. Whitley. Found Genet Algorithms 2:109–126 Mitchell M, Holland JH, Forrest S (2014) Relative building-block fitness and the building block hypothesis. D. Whitley. Found Genet Algorithms 2:109–126
31.
Zurück zum Zitat Lourenço HR, Martin OC, Stützle T (2003) Iterated local search. In: Handbook of metaheuristics. Springer, Boston, pp 320–353 Lourenço HR, Martin OC, Stützle T (2003) Iterated local search. In: Handbook of metaheuristics. Springer, Boston, pp 320–353
32.
Zurück zum Zitat Mitchell M, Holland JH When will a genetic algorithm outperform hill-climbing? Mitchell M, Holland JH When will a genetic algorithm outperform hill-climbing?
33.
Zurück zum Zitat Hall MA Correlation-based feature selection for machine learning. Doctoral dissertation, The University of Waikato Hall MA Correlation-based feature selection for machine learning. Doctoral dissertation, The University of Waikato
35.
Zurück zum Zitat Alcalá-Fdez J, Fernandez A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2011) KEEL Data-Mining Software Tool: Data Set Repository, Integration of Algorithms and Experimental Analysis Framework. J Mult Valued Log Soft Comput 17(2-3):255–287 Alcalá-Fdez J, Fernandez A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2011) KEEL Data-Mining Software Tool: Data Set Repository, Integration of Algorithms and Experimental Analysis Framework. J Mult Valued Log Soft Comput 17(2-3):255–287
36.
Zurück zum Zitat R Core Team (2013) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/ R Core Team (2013) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://​www.​R-project.​org/
40.
Zurück zum Zitat Gutowski MW (2005) Biology, physics, small worlds and genetic algorithms. In: Shannon S (ed) Leading edge computer science research. Nova Science Publishers Inc, Hauppage, pp 165–218 Gutowski MW (2005) Biology, physics, small worlds and genetic algorithms. In: Shannon S (ed) Leading edge computer science research. Nova Science Publishers Inc, Hauppage, pp 165–218
41.
Zurück zum Zitat Therneau T, Atkinson B, Ripley B (2012) rpart: recursive partitioning. R package version 4.1-0 Therneau T, Atkinson B, Ripley B (2012) rpart: recursive partitioning. R package version 4.1-0
Metadaten
Titel
Filter-Based Feature Selection Methods Using Hill Climbing Approach
verfasst von
Saptarsi Goswami
Sanjay Chakraborty
Priyanka Guha
Arunabha Tarafdar
Aman Kedia
Copyright-Jahr
2019
DOI
https://doi.org/10.1007/978-3-319-98566-4_10

Neuer Inhalt