Published in: Neural Computing and Applications 5/2018

19 December 2016 | Original Article

Cloud computing-based parallel genetic algorithm for gene selection in cancer classification

Authors: Dino Kečo, Abdulhamit Subasi, Jasmin Kevric

Abstract

Cancer classification is one of the main steps in a patient's treatment process, which drives modern clinical researchers to use advanced bioinformatics methods for it. Cancer classification is usually performed using gene expression data obtained from microarray experiments together with advanced machine learning methods. A microarray experiment generates a huge amount of data, and processing it with machine learning methods is a big challenge. In this study, a two-step classification paradigm is used that combines genetic algorithm feature selection with machine learning classifiers. The genetic algorithm is built in the spirit of the MapReduce programming model, which makes it highly scalable on a Hadoop cluster. To improve its performance, the proposed algorithm is extended into a parallel algorithm that processes microarray data in a distributed manner using the Hadoop MapReduce framework. The algorithm was tested on eleven GEMS data sets (9 tumors, 11 tumors, 14 tumors, brain tumor 1, lung cancer, brain tumor 2, leukemia 1, DLBCL, leukemia 2, SRBCT, and prostate tumor), and its accuracy reached 100% with fewer than 25 selected features. The proposed cloud computing-based MapReduce parallel genetic algorithm performed well on gene expression data. In addition, the scalability of the suggested algorithm is unlimited in principle because of the underlying Hadoop MapReduce platform. The presented results indicate that the proposed method can be effectively applied to real-world microarray data in a cloud environment, and that the Hadoop MapReduce framework brings a substantial decrease in computation time.
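The two-step idea in the abstract (a genetic algorithm picks a small gene subset, a classifier scores it) can be illustrated with a minimal single-process sketch. This is not the authors' implementation and does not use Hadoop: the "map" step here simply evaluates the fitness of each chromosome independently, and the "reduce" step breeds the next generation, which is one common way to split a genetic algorithm into MapReduce-like phases. The synthetic data, the nearest-centroid classifier, the size penalty, and all function names (`make_data`, `accuracy`, `map_fitness`, `reduce_breed`, `run_ga`) are assumptions made for illustration only.

```python
import random


def make_data(rng, n_samples=40, n_genes=30, informative=(0, 1, 2)):
    """Synthetic two-class 'expression' data (assumption, not GEMS data).
    Only the genes listed in `informative` differ between the classes."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) + (3.0 * label if g in informative else 0.0)
               for g in range(n_genes)]
        X.append(row)
        y.append(label)
    return X, y


def accuracy(mask, X, y):
    """Nearest-centroid accuracy on the selected genes.
    The first half of the samples is used for training, the second for testing."""
    genes = [g for g, bit in enumerate(mask) if bit]
    if not genes:
        return 0.0
    half = len(X) // 2
    centroids = {}
    for label in (0, 1):
        rows = [X[i] for i in range(half) if y[i] == label]
        centroids[label] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    correct = 0
    for i in range(half, len(X)):
        dist = {label: sum((X[i][g] - c) ** 2 for g, c in zip(genes, cen))
                for label, cen in centroids.items()}
        correct += min(dist, key=dist.get) == y[i]
    return correct / (len(X) - half)


def map_fitness(mask, X, y, penalty=0.005):
    """'Map' step: score one chromosome independently of all others.
    The penalty (hypothetical value) favours smaller gene subsets."""
    return mask, accuracy(mask, X, y) - penalty * sum(mask)


def reduce_breed(scored, rng, mutation_rate=0.02):
    """'Reduce' step: tournament selection, one-point crossover, bit-flip mutation.
    The best chromosome is copied unchanged (elitism)."""
    def pick():
        a, b = rng.sample(scored, 2)
        return a[0] if a[1] >= b[1] else b[0]

    elite = max(scored, key=lambda s: s[1])[0]
    children = [list(elite)]
    while len(children) < len(scored):
        p1, p2 = pick(), pick()
        cut = rng.randrange(1, len(p1))
        child = p1[:cut] + p2[cut:]
        children.append([bit ^ (rng.random() < mutation_rate) for bit in child])
    return children


def run_ga(seed=0, pop_size=20, generations=25, n_genes=30):
    rng = random.Random(seed)
    X, y = make_data(rng, n_genes=n_genes)
    population = [[int(rng.random() < 0.2) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scored = [map_fitness(m, X, y) for m in population]  # map phase
        population = reduce_breed(scored, rng)               # reduce phase
    best = max(population, key=lambda m: map_fitness(m, X, y)[1])
    return best, accuracy(best, X, y)
```

Calling `run_ga()` returns the best gene mask found and its held-out accuracy on the synthetic data. Because each `map_fitness` call depends only on its own chromosome, the list comprehension in the map phase is the part that a distributed framework could spread across worker nodes.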

Metadata
Title
Cloud computing-based parallel genetic algorithm for gene selection in cancer classification
Authors
Dino Kečo
Abdulhamit Subasi
Jasmin Kevric
Publication date
19 December 2016
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 5/2018
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-016-2780-z
