Published in: Annals of Data Science 3/2017

11 March 2017

A Rough Based Hybrid Binary PSO Algorithm for Flat Feature Selection and Classification in Gene Expression Data

Authors: Suresh Dara, Haider Banka, Chandra Sekhara Rao Annavarapu


Abstract

Feature selection in high-dimensional data, particularly gene expression data, is one of the most challenging tasks in bioinformatics because of the curse of dimensionality, data redundancy and noisy values. In gene expression data, insignificant features cause poor classification; feature selection therefore reduces the feature subset and improves classification accuracy. Existing feature selection algorithms for gene expression data (filter-based, wrapper-based and hybrid methods) yield poor accuracy, whereas a few methods take too much time to converge to acceptable results. For example, NSGA-II requires, on average, over 10,000 generations to converge in the search space, which increases computational time. We propose a rough-based hybrid binary PSO algorithm that uses a fast heuristic processing strategy to reduce the crude domain features by statistically eliminating redundant features; the remaining features are then discretized into a binary table, known in rough set theory as the distinction table. This distinction table is used as input to evaluate and optimize the objective functions, i.e., to generate a reduct in rough set theory. The proposed hybrid binary PSO then tunes the objective functions to choose the most important features (i.e., the reduct). The fitness function is designed to reduce the cardinality of the selected feature subset while at the same time improving classification performance. Results on three existing benchmark datasets from the literature (colon cancer, lymphoma and leukemia) demonstrate the effectiveness of the proposed method.
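The preprocessing described in the abstract ends in a binary distinction table. The sketch below shows one way such a table can be built for a two-class problem, assuming the features have already been filtered and are then binarized by thresholding each column at its mean. That threshold rule, the toy data and the function names `discretize_binary` and `distinction_table` are hypothetical choices for illustration; the paper's own statistical elimination and discretization are not reproduced here. Each row of the table corresponds to a pair of samples from different classes, and an entry is 1 when that feature takes different discretized values on the pair.

```python
from itertools import combinations

import numpy as np


def discretize_binary(X):
    """Binarize each feature (column) by thresholding at its mean.

    Hypothetical rule for illustration; not the paper's discretization.
    """
    return (X > X.mean(axis=0)).astype(np.uint8)


def distinction_table(Xb, y):
    """Rough-set distinction table for discretized data Xb and labels y.

    One row per pair of samples with different class labels; entry (r, j)
    is 1 if feature j takes different values on that pair (i.e. discerns it).
    """
    rows = [Xb[i] != Xb[k]
            for i, k in combinations(range(len(y)), 2)
            if y[i] != y[k]]
    # reshape keeps a (0, n_features) shape when no cross-class pair exists
    return np.array(rows, dtype=np.uint8).reshape(-1, Xb.shape[1])


# Toy data: 4 samples x 3 features, two classes (hypothetical values).
X = np.array([[1.0, 5.0, 2.0],
              [2.0, 1.0, 2.0],
              [9.0, 4.0, 3.0],
              [8.0, 2.0, 1.0]])
y = [0, 0, 1, 1]

Xb = discretize_binary(X)
table = distinction_table(Xb, y)
print(table.tolist())  # → [[1, 0, 1], [1, 1, 0], [1, 1, 1], [1, 0, 0]]
```

In this toy table, column 0 is 1 in every row, so feature 0 alone discerns every cross-class pair. In distinction-table terms, a reduct is a minimal set of columns that together cover every row.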
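The reduct search can be illustrated with a standard binary PSO in the style of Kennedy and Eberhart's sigmoid-transfer variant. It is used here only as a stand-in, because the abstract does not specify the hybrid update rules of the proposed algorithm. The `fitness` function is a hypothetical weighted sum of two terms that mirror the two stated goals: a size term that rewards selecting few features, and a coverage term that rewards the fraction of distinction-table rows discerned by the selected features. The weight `alpha`, the swarm parameters, the toy table `D` and all function names are assumptions for illustration.

```python
import numpy as np


def fitness(mask, D, alpha=0.5):
    """Hypothetical weighted fitness over a distinction table D (pairs x features).

    alpha weights the size term (fewer features is better); 1 - alpha weights
    the fraction of distinction-table rows discerned by the selected features.
    """
    n_features = D.shape[1]
    size_term = 1.0 - mask.sum() / n_features
    covered = (D[:, mask.astype(bool)].sum(axis=1) > 0).mean()
    return alpha * size_term + (1.0 - alpha) * covered


def binary_pso(D, n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5,
               v_max=4.0, seed=0):
    """Plain sigmoid-transfer binary PSO maximizing `fitness` (not the paper's hybrid)."""
    rng = np.random.default_rng(seed)
    n_features = D.shape[1]
    X = rng.integers(0, 2, size=(n_particles, n_features))      # bit-string positions
    V = rng.uniform(-1.0, 1.0, size=(n_particles, n_features))  # real-valued velocities
    pbest = X.copy()
    pbest_fit = np.array([fitness(x, D) for x in X])
    g = pbest_fit.argmax()
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for _ in range(n_iter):
        r1 = rng.random(X.shape)
        r2 = rng.random(X.shape)
        # Velocity update pulls each bit toward the personal and global bests.
        V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
        V = np.clip(V, -v_max, v_max)
        # Sigmoid transfer: each bit is set to 1 with probability sigmoid(V).
        X = (rng.random(X.shape) < 1.0 / (1.0 + np.exp(-V))).astype(int)
        fit = np.array([fitness(x, D) for x in X])
        improved = fit > pbest_fit
        pbest[improved] = X[improved]
        pbest_fit[improved] = fit[improved]
        g = pbest_fit.argmax()
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest, gbest_fit


# Hypothetical toy distinction table: 4 cross-class pairs x 3 features.
D = np.array([[1, 0, 1],
              [1, 1, 0],
              [1, 1, 1],
              [1, 0, 0]], dtype=np.uint8)
best, best_fit = binary_pso(D)
print("selected features:", np.flatnonzero(best), "fitness:", round(float(best_fit), 4))
```

With `alpha = 0.5`, selecting only feature 0 scores 0.5 × (1 − 1/3) + 0.5 × 1 = 5/6, the highest value any subset of this toy table can reach. Note that once a particle sits on both its personal and the global best, its velocity only decays by `w`, so the sigmoid drifts toward 0.5 and the particle keeps sampling nearby bit strings.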

Metadata
Publisher
Springer Berlin Heidelberg
Print ISSN: 2198-5804
Electronic ISSN: 2198-5812
DOI
https://doi.org/10.1007/s40745-017-0106-3
