2016 | OriginalPaper | Chapter

A Targeted Estimation of Distribution Algorithm Compared to Traditional Methods in Feature Selection

Authors: Geoffrey Neumann, David Cairns

Published in: Computational Intelligence

Publisher: Springer International Publishing

Abstract

The Targeted Estimation of Distribution Algorithm (TEDA) introduces a ‘Targeting’ process into an EDA/GA hybrid framework, whereby the number of active genes, or ‘control points’, in a solution is driven in an optimal direction. For larger feature selection problems with over a thousand features, traditional methods such as forward and backward selection are inefficient. Traditional evolutionary algorithms (EAs) may perform better, but they are slow to optimize when a problem is so noisy that most large solutions are equally ineffective and effective optimization can begin only once much smaller solutions are discovered. By using targeting, TEDA is able to drive down the feature set size quickly and so speeds up this process. All of these approaches were tested on feature selection problems with between 500 and 20,000 features, and the results confirmed that TEDA finds effective solutions significantly faster than the other approaches.
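
The abstract does not specify how targeting is computed, so the following Python sketch is only an illustration of the general idea and not the authors' TEDA. It assumes a solution is a bit list over the features, that the target number of active features is extrapolated from the sizes of the fitter and the less fit parents, and that features are then sampled in proportion to how often they appear among the parents. The names `target_size`, `sample_child` and the `floor` parameter are hypothetical.

```python
import random


def target_size(parents, n_features):
    """Estimate how many features a new solution should activate.

    parents: list of (fitness, bits) pairs, higher fitness is better.
    Assumption: the target is extrapolated past the mean size of the
    fitter half, in the direction leading away from the less fit half.
    """
    ranked = sorted(parents, key=lambda p: p[0], reverse=True)
    half = max(1, len(ranked) // 2)
    better = [sum(bits) for _, bits in ranked[:half]]
    worse = [sum(bits) for _, bits in ranked[half:]] or better
    mean_better = sum(better) / len(better)
    mean_worse = sum(worse) / len(worse)
    target = round(mean_better + (mean_better - mean_worse))
    # Keep the target within the valid range of feature counts.
    return max(1, min(n_features, target))


def sample_child(parents, n_features, rng, floor=0.01):
    """Build one new solution with exactly target_size active features."""
    k = target_size(parents, n_features)
    # Marginal frequency of each feature among the parents, plus a small
    # floor so that features absent from every parent can still be chosen.
    weights = [
        floor + sum(bits[i] for _, bits in parents) / len(parents)
        for i in range(n_features)
    ]
    # Weighted sampling without replacement: draw a random key per
    # feature (larger weights tend to give larger keys), keep the top k.
    keys = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    chosen = {i for _, i in sorted(keys, reverse=True)[:k]}
    return [1 if i in chosen else 0 for i in range(n_features)]


if __name__ == "__main__":
    parents = [
        (0.9, [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]),
        (0.8, [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]),
        (0.5, [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]),
        (0.4, [0, 1, 1, 1, 1, 1, 1, 0, 0, 0]),
    ]
    print(target_size(parents, 10))
    print(sample_child(parents, 10, random.Random(0)))
```

In this toy population the fitter parents use fewer features than the less fit ones, so the estimated target falls below both group means. This is the kind of quick reduction in feature set size that the abstract attributes to targeting.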

Metadata
Title
A Targeted Estimation of Distribution Algorithm Compared to Traditional Methods in Feature Selection
Authors
Geoffrey Neumann
David Cairns
Copyright year
2016
Publisher
Springer International Publishing
DOI
https://doi.org/10.1007/978-3-319-23392-5_5