Published in: Neural Computing and Applications, Issue 1/2016

01.01.2016 | Extreme Learning Machine and Applications

Improving ELM-based microarray data classification by diversified sequence features selection

Authors: Yuhai Zhao, Guoren Wang, Ying Yin, Yuan Li, Zhanghui Wang


Abstract

In this paper, we focus on the problem of extreme learning machine (ELM)-based microarray data classification. Unlike traditional classification, the goal here is not only to predict the class labels of unseen samples but also to explain what led to the results, i.e., to identify the genes involved in a specific disease. This is especially important for biologists, who need to decipher the causes of disease. As a black-box method, ELM cannot accomplish this task on its own. In this work, we propose a framework based on diversified sequence feature selection to address the problem. In this framework, (1) a sequence model, EWave, is introduced to make the structural ordering information among genes exploitable; (2) the concept of an irreducible sequence is proposed, in which the genes act as an ordered whole that maintains high confidence for a specific class, such that removing any of the genes substantially decreases the confidence. An efficient sequence mining algorithm, together with several effective pruning rules, is developed to mine such sequences; and (3) we study how to extract a set of diversified sequence features that represents all mined results. This problem is proved to be NP-hard, and a greedy algorithm is presented to approximate the optimal solution. Experimental results show that the proposed approach significantly improves both the efficiency and the effectiveness of ELM relative to several widely used feature selection techniques.
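The abstract does not define EWave, the exact confidence drop that makes a sequence irreducible, or the diversity objective, so the following is only a toy sketch under stated assumptions. `to_sequence` is a hypothetical stand-in for EWave that simply orders genes by ascending expression; `confidence` is the usual rule confidence (the fraction of samples containing an ordered gene pattern that belong to the target class); `min_drop` is a hypothetical threshold for "substantially decreases the confidence"; and `greedy_diverse` uses greedy maximum sample coverage as a substitute diversity objective. None of these names come from the paper.

```python
# Toy sketch of irreducible sequence features and greedy diversification.
# ASSUMPTIONS (not the paper's definitions): to_sequence orders genes by
# ascending expression as a stand-in for EWave; min_drop is a hypothetical
# threshold for "substantially decreases the confidence"; diversity is
# approximated by how many samples a set of patterns covers.


def to_sequence(profile):
    """Hypothetical EWave stand-in: gene names sorted by ascending expression."""
    return [gene for gene, _ in sorted(profile.items(), key=lambda kv: (kv[1], kv[0]))]


def is_subsequence(pattern, seq):
    """True if the genes of `pattern` occur in `seq` in the same order."""
    it = iter(seq)
    return all(gene in it for gene in pattern)  # `in` consumes the iterator


def confidence(pattern, sequences, labels, target):
    """Fraction of samples containing `pattern` that belong to class `target`."""
    hits = [lab for seq, lab in zip(sequences, labels) if is_subsequence(pattern, seq)]
    if not hits:
        return 0.0
    return sum(lab == target for lab in hits) / len(hits)


def is_irreducible(pattern, sequences, labels, target, min_conf, min_drop):
    """High-confidence pattern where dropping any single gene lowers the
    confidence by at least `min_drop` (hypothetical threshold)."""
    conf = confidence(pattern, sequences, labels, target)
    if conf < min_conf:
        return False
    for i in range(len(pattern)):
        reduced = pattern[:i] + pattern[i + 1:]
        if conf - confidence(reduced, sequences, labels, target) < min_drop:
            return False
    return True


def greedy_diverse(patterns, sequences, k):
    """Greedy max coverage: repeatedly pick the pattern covering the most
    samples not yet covered by earlier picks (a stand-in objective)."""
    cover = [{i for i, s in enumerate(sequences) if is_subsequence(p, s)} for p in patterns]
    covered, chosen = set(), []
    candidates = list(range(len(patterns)))
    while candidates and len(chosen) < k:
        best = max(candidates, key=lambda j: len(cover[j] - covered))
        if not cover[best] - covered:
            break  # no remaining pattern adds new samples
        chosen.append(patterns[best])
        covered |= cover[best]
        candidates.remove(best)
    return chosen


# Toy data: four samples, three genes (invented for illustration only).
PROFILES = [
    {"g1": 0.1, "g2": 0.5, "g3": 0.9},
    {"g1": 0.2, "g2": 0.6, "g3": 0.4},
    {"g1": 0.8, "g2": 0.3, "g3": 0.5},
    {"g1": 0.7, "g2": 0.9, "g3": 0.1},
]
LABELS = ["tumor", "tumor", "normal", "normal"]
SEQUENCES = [to_sequence(p) for p in PROFILES]
```

On this toy data, `["g1", "g3"]` passes `is_irreducible(..., "tumor", 0.9, 0.2)`, while `["g1", "g2", "g3"]` fails it because removing `g2` leaves the confidence unchanged. Greedy selection is used here because it is the standard approximation for coverage-style objectives. It only mirrors the general shape of the paper's greedy step, not its actual objective.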


Footnotes
1
That is, IG (information gain), TR (twoing rule), SM (sum minority), MM (max minority), GI (Gini index) and SV (sum of variance).
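Two of the six criteria in this footnote, information gain (IG) and the Gini index (GI), can be illustrated as impurity reductions when a single gene's expression values are split at a threshold. The sketch below uses textbook definitions and tries midpoints between distinct values as candidate thresholds. The function names are hypothetical, and the exact variants (discretization, normalization) used in the paper's experiments may differ.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(values, labels, threshold, impurity):
    """Impurity reduction from splitting one gene's values at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = sum(len(part) / n * impurity(part) for part in (left, right) if part)
    return impurity(labels) - weighted


def best_split(values, labels, impurity):
    """Best (gain, threshold) over midpoints between distinct values;
    returns (0.0, None) when the gene has a single distinct value."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(((split_gain(values, labels, t, impurity), t) for t in candidates),
               default=(0.0, None))
```

Passing `impurity=entropy` scores a gene by information gain, and `impurity=gini` scores it by Gini-index reduction. Ranking all genes by either score and keeping the top ones is the kind of single-gene baseline that the sequence features are compared against.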
 
Metadata
Publisher
Springer London
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-014-1571-7
