Skip to main content
Erschienen in: Granular Computing 3/2016

01.09.2016 | Original Paper

Semi-greedy heuristics for feature selection with test cost constraints

verfasst von: Fan Min, Juan Xu

Erschienen in: Granular Computing | Ausgabe 3/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In real-world applications, the test cost of data collection should not exceed a given budget. The problem of selecting an informative feature subset under this budget is referred to as feature selection with test cost constraints. Greedy heuristics are a natural and efficient method for this kind of combinatorial optimization problem. However, the recursive selection of locally optimal choices means that the global optimum is often missed. In this paper, we present a three-step semi-greedy heuristic method that directly forms a population of candidate solutions to obtain better results. In the first step, we design the heuristic function. The second step involves the random selection of a feature from the current best k features at each iteration. This is the major difference from conventional greedy heuristics. In the third step, we obtain p candidate solutions and select the best one. Through a series of experiments on four datasets, we compare our algorithm with a classic greedy heuristic approach and an information gain-based \(\lambda \)-weighted greedy heuristic method. The results show that the new approach is more likely to obtain optimal solutions.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Aarts E, Korst J (1988) Simulated annealing and Boltzmann machines: a stochastic approach to combinatorial optimization and neural computing. Wiley, New YorkMATH Aarts E, Korst J (1988) Simulated annealing and Boltzmann machines: a stochastic approach to combinatorial optimization and neural computing. Wiley, New YorkMATH
Zurück zum Zitat Al-Khatib W, Day YF, Ghafoor A, Berra PB (1999) Semantic modeling and knowledge representation in multimedia databases. IEEE Trans Knowl Data Eng 11(1):64–80CrossRef Al-Khatib W, Day YF, Ghafoor A, Berra PB (1999) Semantic modeling and knowledge representation in multimedia databases. IEEE Trans Knowl Data Eng 11(1):64–80CrossRef
Zurück zum Zitat Bargiela A, Pedrycz W (2012) Granular computing: an introduction. Springer Science and Business Media, BerlinMATH Bargiela A, Pedrycz W (2012) Granular computing: an introduction. Springer Science and Business Media, BerlinMATH
Zurück zum Zitat Blake C, Merz CJ (1998) UCI Repository of machine learning databases Blake C, Merz CJ (1998) UCI Repository of machine learning databases
Zurück zum Zitat Boehm O, Hardoon DR, Manevitz LM (2011) Classifying cognitive states of brain activity via one-class neural networks with feature selection by genetic algorithms. Int J Mach Learn Cybern 2(3):125–134CrossRef Boehm O, Hardoon DR, Manevitz LM (2011) Classifying cognitive states of brain activity via one-class neural networks with feature selection by genetic algorithms. Int J Mach Learn Cybern 2(3):125–134CrossRef
Zurück zum Zitat Burke EK, Newall JP, Weare RF (1995) A memetic algorithm for university exam timetabling. In: Practice and theory of automated timetabling, pp 241–250. Springer, Berlin Burke EK, Newall JP, Weare RF (1995) A memetic algorithm for university exam timetabling. In: Practice and theory of automated timetabling, pp 241–250. Springer, Berlin
Zurück zum Zitat Cai JL, Zhu W, Ding HJ, Min F (2014) An improved artificial bee colony algorithm for minimal time cost reduction. Int J Mach Learn Cybern 5(5):743–752CrossRef Cai JL, Zhu W, Ding HJ, Min F (2014) An improved artificial bee colony algorithm for minimal time cost reduction. Int J Mach Learn Cybern 5(5):743–752CrossRef
Zurück zum Zitat Chen DG, Wang CZ, Hu QH (2007) A new approach to attribute reduction of consistent and inconsistent covering decision systems with covering rough sets. Inform Sci 177(17):3500–3518MathSciNetCrossRefMATH Chen DG, Wang CZ, Hu QH (2007) A new approach to attribute reduction of consistent and inconsistent covering decision systems with covering rough sets. Inform Sci 177(17):3500–3518MathSciNetCrossRefMATH
Zurück zum Zitat Fan AJ, Zhao H, Zhu W (2015) Test-cost-sensitive attribute reduction on heterogeneous data for adaptive neighborhood model. Soft Comput 1–12 Fan AJ, Zhao H, Zhu W (2015) Test-cost-sensitive attribute reduction on heterogeneous data for adaptive neighborhood model. Soft Comput 1–12
Zurück zum Zitat Gu SM, Wu WZ (2013) On knowledge acquisition in multi-scale decision systems. Int J Mach Learn Cybern 4(5):477–486CrossRef Gu SM, Wu WZ (2013) On knowledge acquisition in multi-scale decision systems. Int J Mach Learn Cybern 4(5):477–486CrossRef
Zurück zum Zitat He X, Min F, Zhu W (2013) Parametric rough sets with application to granular association rule mining. Math Probl Eng 2013:1–13 He X, Min F, Zhu W (2013) Parametric rough sets with application to granular association rule mining. Math Probl Eng 2013:1–13
Zurück zum Zitat Hu QH, Xie ZX, Yu DR (2007) Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation. Pattern Recognit 40(12):3509–3521CrossRefMATH Hu QH, Xie ZX, Yu DR (2007) Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation. Pattern Recognit 40(12):3509–3521CrossRefMATH
Zurück zum Zitat Hu QH, Yu DR, Liu JF, Wu CX (2008) Neighborhood rough set based heterogeneous feature subset selection. Inform Sci 178(18):3577–3594MathSciNetCrossRefMATH Hu QH, Yu DR, Liu JF, Wu CX (2008) Neighborhood rough set based heterogeneous feature subset selection. Inform Sci 178(18):3577–3594MathSciNetCrossRefMATH
Zurück zum Zitat Janusz A, Ślȩzak D (2014) Rough set methods for attribute clustering and selection. Appl Artif Intell 28(3):220–242CrossRef Janusz A, Ślȩzak D (2014) Rough set methods for attribute clustering and selection. Appl Artif Intell 28(3):220–242CrossRef
Zurück zum Zitat Jensen R, Shen Q (2003) Finding rough set reducts with ant colony optimization. In: Proceedings of the 2003 UK workshop on computational intelligence, vol 1, issue 2 Jensen R, Shen Q (2003) Finding rough set reducts with ant colony optimization. In: Proceedings of the 2003 UK workshop on computational intelligence, vol 1, issue 2
Zurück zum Zitat Jia XY, Liao WH, Tang ZM, Shang L (2013) Minimum cost attribute reduction in decision-theoretic rough set model. Inform Sci 219:151–167MathSciNetCrossRefMATH Jia XY, Liao WH, Tang ZM, Shang L (2013) Minimum cost attribute reduction in decision-theoretic rough set model. Inform Sci 219:151–167MathSciNetCrossRefMATH
Zurück zum Zitat Kirkpatrick S (1984) Optimization by simulated annealing: quantitative studies. J Statis Phys 34(5–6):975–986MathSciNetCrossRef Kirkpatrick S (1984) Optimization by simulated annealing: quantitative studies. J Statis Phys 34(5–6):975–986MathSciNetCrossRef
Zurück zum Zitat Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1):273–324CrossRefMATH Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1):273–324CrossRefMATH
Zurück zum Zitat Li XJ, Zhao H, Zhu W (2014a) An exponent weighted algorithm for minimal cost feature selection. Int J Mach Learn Cybern 1–10 Li XJ, Zhao H, Zhu W (2014a) An exponent weighted algorithm for minimal cost feature selection. Int J Mach Learn Cybern 1–10
Zurück zum Zitat Li J, Zhao H, Zhu W (2014b) Fast randomized algorithm with restart strategy for minimal test cost feature selection. Int J Mach Learn Cybern 6(3):435–442CrossRef Li J, Zhao H, Zhu W (2014b) Fast randomized algorithm with restart strategy for minimal test cost feature selection. Int J Mach Learn Cybern 6(3):435–442CrossRef
Zurück zum Zitat Lin TY (1998) Granular computing on binary relations I: data mining and neighborhood systems. Rough Sets Knowl Discov 1:107–121MATH Lin TY (1998) Granular computing on binary relations I: data mining and neighborhood systems. Rough Sets Knowl Discov 1:107–121MATH
Zurück zum Zitat Lin TY, Cercone N (1996) Rough sets and data mining: analysis of imprecise data. Kluwer Academic Publishers, DordrechtCrossRef Lin TY, Cercone N (1996) Rough sets and data mining: analysis of imprecise data. Kluwer Academic Publishers, DordrechtCrossRef
Zurück zum Zitat Liu JB, Min F, Zhao H, Zhu W (2014) Feature selection with positive region constraint for test-cost-sensitive data. Trans Rough Sets XVIII:23–33 Liu JB, Min F, Zhao H, Zhu W (2014) Feature selection with positive region constraint for test-cost-sensitive data. Trans Rough Sets XVIII:23–33
Zurück zum Zitat Min F, He HP, Qian Y, Zhu W (2011) Test-cost-sensitive attribute reduction. Inform Sci 181(22):4928–4942CrossRef Min F, He HP, Qian Y, Zhu W (2011) Test-cost-sensitive attribute reduction. Inform Sci 181(22):4928–4942CrossRef
Zurück zum Zitat Min F, Zhu W (2011) Optimal sub-reducts with test cost constraint. Rough Sets Knowl Technol 57–62 Min F, Zhu W (2011) Optimal sub-reducts with test cost constraint. Rough Sets Knowl Technol 57–62
Zurück zum Zitat Modrzejewski M (1993) Feature selection using rough sets theory. In: Machine learning: ECML-93, pp 213–226. Springer, Berlin Modrzejewski M (1993) Feature selection using rough sets theory. In: Machine learning: ECML-93, pp 213–226. Springer, Berlin
Zurück zum Zitat Papadimitriou CH, Steiglitz K (1998) Combinatorial optimization: algorithms and complexity. Courier Dover Publications, USAMATH Papadimitriou CH, Steiglitz K (1998) Combinatorial optimization: algorithms and complexity. Courier Dover Publications, USAMATH
Zurück zum Zitat Pedrycz W (2001) Granular computing: an emerging paradigm, vol 70. Springer Science and Business Media, BerlinCrossRefMATH Pedrycz W (2001) Granular computing: an emerging paradigm, vol 70. Springer Science and Business Media, BerlinCrossRefMATH
Zurück zum Zitat Pedrycz W (2013) Granular computing: analysis and design of intelligent systems. CRC press, Boca RatonCrossRef Pedrycz W (2013) Granular computing: analysis and design of intelligent systems. CRC press, Boca RatonCrossRef
Zurück zum Zitat Qian YH, Liang JY, Pedrycz W, Dang CY (2010) Positive approximation: an accelerator for attribute reduction in rough set theory. Artif Intell 174(9–10):597–618MathSciNetCrossRefMATH Qian YH, Liang JY, Pedrycz W, Dang CY (2010) Positive approximation: an accelerator for attribute reduction in rough set theory. Artif Intell 174(9–10):597–618MathSciNetCrossRefMATH
Zurück zum Zitat Qian J, Miao DQ, Zhang ZH, Yue XD (2014) Parallel attribute reduction algorithms using mapreduce. Inform Sci 279:671–690MathSciNetCrossRef Qian J, Miao DQ, Zhang ZH, Yue XD (2014) Parallel attribute reduction algorithms using mapreduce. Inform Sci 279:671–690MathSciNetCrossRef
Zurück zum Zitat Qin YX, Zheng DQ, Zhao TJ (2012) Research on search results optimization technology with category features integration. Int J Mach Learn Cybern 3(1):71–76CrossRef Qin YX, Zheng DQ, Zhao TJ (2012) Research on search results optimization technology with category features integration. Int J Mach Learn Cybern 3(1):71–76CrossRef
Zurück zum Zitat Skowron A, Rauszer C (1992) The discernibility matrices and functions in information systems. Intell Decis Support 331–362 Skowron A, Rauszer C (1992) The discernibility matrices and functions in information systems. Intell Decis Support 331–362
Zurück zum Zitat Ślȩzak D, Wróblewski J (2003) Order based genetic algorithms for the search of approximate entropy reducts. In: Rough sets, fuzzy sets, data mining, and granular computing, pp 308–311. Springer, Berlin Ślȩzak D, Wróblewski J (2003) Order based genetic algorithms for the search of approximate entropy reducts. In: Rough sets, fuzzy sets, data mining, and granular computing, pp 308–311. Springer, Berlin
Zurück zum Zitat Słowiński R (1992) Intelligent decision support: handbook of applications and advances of the rough sets theory, vol 11. Springer Science and Business Media, BerlinMATH Słowiński R (1992) Intelligent decision support: handbook of applications and advances of the rough sets theory, vol 11. Springer Science and Business Media, BerlinMATH
Zurück zum Zitat Turney PD (1995) Cost-sensitive classification: empirical evaluation of a hybrid genetic decision tree induction algorithm. J Artif Intell Res 2:369–409 Turney PD (1995) Cost-sensitive classification: empirical evaluation of a hybrid genetic decision tree induction algorithm. J Artif Intell Res 2:369–409
Zurück zum Zitat Turney PD (2000) Types of cost in inductive concept learning. Proceedings of the workshop on cost-sensitive learning at the 17th ICML, pp 1–7 Turney PD (2000) Types of cost in inductive concept learning. Proceedings of the workshop on cost-sensitive learning at the 17th ICML, pp 1–7
Zurück zum Zitat Wang GY, Zhang QH (2009) Granular computing based cognitive computing. In: 8th IEEE international conference on cognitive informatics, IEEE, pp 155–161 Wang GY, Zhang QH (2009) Granular computing based cognitive computing. In: 8th IEEE international conference on cognitive informatics, IEEE, pp 155–161
Zurück zum Zitat Wang GY, Yu H, Yang DC (2002a) Decision table reduction based on conditional information entropy. Chin J Comput 2(7):759–766MathSciNet Wang GY, Yu H, Yang DC (2002a) Decision table reduction based on conditional information entropy. Chin J Comput 2(7):759–766MathSciNet
Zurück zum Zitat Wang GY, Yu H, Yang DC (2002b) Decision table reduction based on conditional information entropy. Chin J Comput 25(7):759–766MathSciNet Wang GY, Yu H, Yang DC (2002b) Decision table reduction based on conditional information entropy. Chin J Comput 25(7):759–766MathSciNet
Zurück zum Zitat Wang XZ, Zhai JH, Lu SX (2008) Induction of multiple fuzzy decision trees based on rough set technique. Inform Sci 178(16):3188–3202MathSciNetCrossRefMATH Wang XZ, Zhai JH, Lu SX (2008) Induction of multiple fuzzy decision trees based on rough set technique. Inform Sci 178(16):3188–3202MathSciNetCrossRefMATH
Zurück zum Zitat Wei P, Ma PJ, Hu QH, Su XH, Ma CQ (2014) Comparative analysis on margin based feature selection algorithms. Int J Mach Learn Cybern 5(3):339–367CrossRef Wei P, Ma PJ, Hu QH, Su XH, Ma CQ (2014) Comparative analysis on margin based feature selection algorithms. Int J Mach Learn Cybern 5(3):339–367CrossRef
Zurück zum Zitat Wu WZ, Leung Y (2011) Theory and applications of granular labelled partitions in multi-scale decision tables. Inform Sci 181(18):3878–3897CrossRefMATH Wu WZ, Leung Y (2011) Theory and applications of granular labelled partitions in multi-scale decision tables. Inform Sci 181(18):3878–3897CrossRefMATH
Zurück zum Zitat Yang XB, Qi YS, Song XN, Yang JY (2013) Test cost sensitive multigranulation rough set: model and minimal cost selection. Inform Sci 250:184–199MathSciNetCrossRefMATH Yang XB, Qi YS, Song XN, Yang JY (2013) Test cost sensitive multigranulation rough set: model and minimal cost selection. Inform Sci 250:184–199MathSciNetCrossRefMATH
Zurück zum Zitat Yao YY (2000) Granular computing: basic issues and possible solutions. In: Proceedings of the 5th joint conference on information sciences, vol 1, pp 186–189 Yao YY (2000) Granular computing: basic issues and possible solutions. In: Proceedings of the 5th joint conference on information sciences, vol 1, pp 186–189
Zurück zum Zitat Yao YY (2004) A partition model of granular computing. In: Transactions on rough sets I, pp 232–253. Springer, Berlin Yao YY (2004) A partition model of granular computing. In: Transactions on rough sets I, pp 232–253. Springer, Berlin
Zurück zum Zitat Yao YY, Zhao Y, Wang J (2006) On reduct construction algorithms. In: Rough sets and knowledge technology, pp 297–304. Springer, Berlin Yao YY, Zhao Y, Wang J (2006) On reduct construction algorithms. In: Rough sets and knowledge technology, pp 297–304. Springer, Berlin
Zurück zum Zitat Yao JT, Vasilakos AV, Pedrycz W (2013) Granular computing: perspectives and challenges. IEEE Trans Syst Man Cybern Part C Appl Rev 43(6):1977–1989 Yao JT, Vasilakos AV, Pedrycz W (2013) Granular computing: perspectives and challenges. IEEE Trans Syst Man Cybern Part C Appl Rev 43(6):1977–1989
Zurück zum Zitat Zhai JH, Zhai MY, Bai CY (2013) An improved algorithm for calculating fuzzy attribute reducts. J Intell Fuzzy Syst Appl Eng Technol 25(2):303–313MathSciNetMATH Zhai JH, Zhai MY, Bai CY (2013) An improved algorithm for calculating fuzzy attribute reducts. J Intell Fuzzy Syst Appl Eng Technol 25(2):303–313MathSciNetMATH
Zurück zum Zitat Zhang WX, Wei L, Qi JJ (2005) Attribute reduction in concept lattice based on discernibility matrix. In: Rough sets, fuzzy sets, data mining, and granular computing, pp 157–165. Springer, Berlin Zhang WX, Wei L, Qi JJ (2005) Attribute reduction in concept lattice based on discernibility matrix. In: Rough sets, fuzzy sets, data mining, and granular computing, pp 157–165. Springer, Berlin
Zurück zum Zitat Zhao H, Min F, Zhu W (2013) Test-cost-sensitive attribute reduction of data with normal distribution measurement errors. Math Probl Eng 2013:1–12MathSciNetMATH Zhao H, Min F, Zhu W (2013) Test-cost-sensitive attribute reduction of data with normal distribution measurement errors. Math Probl Eng 2013:1–12MathSciNetMATH
Zurück zum Zitat Zhao H, Zhu W (2014) Optimal cost-sensitive granularization based on rough sets for variable costs. Knowl Based Syst 65:72–82CrossRef Zhao H, Zhu W (2014) Optimal cost-sensitive granularization based on rough sets for variable costs. Knowl Based Syst 65:72–82CrossRef
Metadaten
Titel
Semi-greedy heuristics for feature selection with test cost constraints
verfasst von
Fan Min
Juan Xu
Publikationsdatum
01.09.2016
Verlag
Springer International Publishing
Erschienen in
Granular Computing / Ausgabe 3/2016
Print ISSN: 2364-4966
Elektronische ISSN: 2364-4974
DOI
https://doi.org/10.1007/s41066-016-0017-2

Premium Partner