Published in: Soft Computing, Issue 8/2011

1 August 2011 | Focus

Simulated annealing for supervised gene selection

Authors: Maurizio Filippone, Francesco Masulli, Stefano Rovetta

Abstract

Genomic data, and more generally biomedical data, are often characterized by high dimensionality. An input selection procedure can attain the two objectives of highlighting the relevant variables (genes) and possibly improving classification results. In this paper, we propose a wrapper approach to gene selection in classification of gene expression data using simulated annealing along with supervised classification. The proposed approach can perform global combinatorial searches through the space of all possible input subsets, can handle cases with numerical, categorical or mixed inputs, and is able to find (sub-)optimal subsets of inputs giving low classification errors. The method has been tested on publicly available bioinformatics data sets using support vector machines and on a mixed type data set using classification trees. We also propose some heuristics able to speed up the convergence. The experimental results highlight the ability of the method to select minimal sets of relevant features.
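
The wrapper idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: to stay dependency-free it swaps in a nearest-centroid classifier scored by leave-one-out error in place of the support vector machines and classification trees used in the paper. The single-feature flip move, the geometric cooling schedule, the subset-size penalty, and all function and parameter names (`loo_error`, `cost`, `anneal_select`, `penalty`, `alpha`) are assumptions made for illustration.

```python
import math
import random


def loo_error(X, y, subset):
    """Leave-one-out error of a nearest-centroid classifier on the chosen columns.

    Assumption: this simple classifier stands in for the SVM or tree
    that a real wrapper would train on each candidate subset.
    """
    if not subset:
        return 1.0  # an empty subset cannot classify anything
    cols = sorted(subset)
    mistakes = 0
    for i in range(len(X)):
        # Per-class sums and counts over all samples except the held-out one.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(cols))
            for k, c in enumerate(cols):
                acc[k] += row[c]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            # Squared Euclidean distance from the held-out sample to a centroid.
            return sum((X[i][c] - sums[label][k] / counts[label]) ** 2
                       for k, c in enumerate(cols))

        predicted = min(sums, key=dist)
        mistakes += predicted != y[i]
    return mistakes / len(X)


def cost(X, y, subset, penalty=0.01):
    """Classification error plus a small (assumed) penalty per selected feature."""
    return loo_error(X, y, subset) + penalty * len(subset)


def anneal_select(X, y, n_steps=2000, t0=1.0, alpha=0.995, penalty=0.01, seed=0):
    """Simulated annealing over feature subsets with a Metropolis acceptance rule."""
    rng = random.Random(seed)
    n_features = len(X[0])
    current = {f for f in range(n_features) if rng.random() < 0.5}
    current_cost = cost(X, y, current, penalty)
    best, best_cost = set(current), current_cost
    t = t0
    for _ in range(n_steps):
        candidate = set(current)
        candidate ^= {rng.randrange(n_features)}  # add or drop one feature
        cand_cost = cost(X, y, candidate, penalty)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with prob. exp(-delta/t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = set(current), current_cost
        t *= alpha  # geometric cooling (an assumed schedule)
    return best, best_cost


if __name__ == "__main__":
    # Toy data: column 0 separates the two classes, columns 1-5 are noise.
    noise = random.Random(1)
    y = [i % 2 for i in range(20)]
    X = [[10.0 * label] + [noise.random() for _ in range(5)] for label in y]
    subset, value = anneal_select(X, y)
    print("selected features:", sorted(subset), "cost:", value)
```

Because the cost adds a per-feature penalty to the error estimate, the search is pushed toward small subsets, which mirrors the abstract's stated goal of selecting minimal sets of relevant features. In a real study, the error estimate used during selection should be kept separate from the one used to report final performance, to avoid selection bias.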

Metadata
Title
Simulated annealing for supervised gene selection
Authors
Maurizio Filippone
Francesco Masulli
Stefano Rovetta
Publication date
1 August 2011
Publisher
Springer-Verlag
Published in
Soft Computing / Issue 8/2011
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-010-0597-8
