Published in: Soft Computing 2/2010

01-01-2010 | Focus

k-Top Scoring Pair Algorithm for feature selection in SVM with applications to microarray data classification

Authors: Sejong Yoon, Saejoon Kim



Abstract

Top Scoring Pair (TSP) and its ensemble counterpart, k-Top Scoring Pair (k-TSP), were recently introduced as competitive options for classifying microarray data. However, the support vector machine (SVM), with which these approaches were compared, is not equipped with a feature or variable selection mechanism, whereas TSP is itself a kind of variable selection algorithm. Moreover, an ensemble of SVMs should also be considered as a possible competitor to k-TSP. In this work, we conducted a fair comparison between TSP and SVM-recursive feature elimination (SVM-RFE) as feature selection methods for SVM. We also compared k-TSP with two ensemble methods that use SVM as their base classifier. Results on ten public-domain microarray data sets indicate that TSP-family classifiers serve as good feature selection schemes that may be combined effectively with other classification methods.
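The abstract does not spell out how TSP and k-TSP work. The following is a rough Python sketch of the general idea, not the authors' implementation. Each gene pair (i, j) is scored by how differently the ordering X_i < X_j behaves in the two classes. The top k disjoint pairs are then kept, and a sample is classified by majority vote over the pairwise rules. The genes in the chosen pairs can then serve as a feature subset for another classifier such as an SVM. The sketch assumes a samples-by-genes matrix and binary labels 0/1. The function names are hypothetical. Ties in the pair score are broken by index order rather than by any secondary score. How k is chosen, and how the selected genes are passed to the SVM in this paper, is not specified here.

```python
import numpy as np


def pair_scores(X, y):
    """Return Delta[i, j] = |P(X_i < X_j | y=0) - P(X_i < X_j | y=1)|.

    X is an (n_samples, n_genes) expression matrix and y holds binary labels
    in {0, 1}. Also returns the two per-class probability matrices.
    Builds an (n_samples, n_genes, n_genes) boolean array, so this is only
    meant for small illustrative inputs.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # less[s, i, j] is True when gene i is expressed below gene j in sample s
    less = X[:, :, None] < X[:, None, :]
    p0 = less[y == 0].mean(axis=0)
    p1 = less[y == 1].mean(axis=0)
    return np.abs(p0 - p1), p0, p1


def k_tsp(X, y, k=3):
    """Greedily pick up to k disjoint gene pairs with the largest Delta.

    Each pair is (i, j, lt_means_0), where lt_means_0 is True when X_i < X_j
    is more frequent in class 0 than in class 1. Ties in Delta are broken by
    index order (a simplification; no secondary score is used).
    """
    delta, p0, p1 = pair_scores(X, y)
    iu, ju = np.triu_indices(delta.shape[0], k=1)  # each unordered pair once
    order = np.argsort(-delta[iu, ju], kind="stable")
    used, pairs = set(), []
    for idx in order:
        i, j = int(iu[idx]), int(ju[idx])
        if i in used or j in used:
            continue  # a gene may appear in at most one selected pair
        pairs.append((i, j, bool(p0[i, j] > p1[i, j])))
        used.update((i, j))
        if len(pairs) == k:
            break
    return pairs


def predict(pairs, x):
    """Majority vote of the pairwise rules; an even split goes to class 1."""
    votes = 0
    for i, j, lt_means_0 in pairs:
        votes += 1 if (x[i] < x[j]) == lt_means_0 else -1
    return 0 if votes > 0 else 1


def selected_genes(pairs):
    """Genes appearing in the chosen pairs, usable as a feature subset."""
    return sorted({g for i, j, _ in pairs for g in (i, j)})
```

For example, `genes = selected_genes(k_tsp(X_train, y_train, k=3))` yields column indices. An SVM could then be trained on `X_train[:, genes]`. Keeping k odd avoids voting ties in `predict`. Because the pairs are disjoint, fewer than k pairs may be returned when there are few genes.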

Metadata
Title: k-Top Scoring Pair Algorithm for feature selection in SVM with applications to microarray data classification
Authors: Sejong Yoon, Saejoon Kim
Publication date: 01-01-2010
Publisher: Springer-Verlag
Published in: Soft Computing, Issue 2/2010
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI: https://doi.org/10.1007/s00500-009-0437-x
