2017 | OriginalPaper | Book chapter

A Comparative Study of Feature Selection and Classification Techniques for High-Throughput DNA Methylation Data

Authors: Alhasan Alkuhlani, Mohammad Nassef, Ibrahim Farag

Published in: Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2016

Publisher: Springer International Publishing

Abstract

The high dimensionality of data is a common problem in classification. In this work, a small number of significant features is investigated to classify data from two sample groups. Various feature selection and classification techniques are applied to a collection of four high-throughput DNA methylation microarray data sets. With accuracy as the performance metric, repeated 10-fold cross-validation is used to evaluate the proposed techniques. Combining the Signal-to-Noise Ratio (SNR) and Wilcoxon rank-sum test filter methods with Support Vector Machine-Recursive Feature Elimination (SVM-RFE) as an embedded method resulted in perfect performance. In addition, linear classifiers showed excellent results compared to other classifiers when applied to such data sets.
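
The evaluation scheme described in the abstract can be illustrated with a minimal sketch, under several assumptions. It ranks features with an SNR filter, taken here as |mean1 - mean0| / (sd1 + sd0), which is one common definition and may differ from the one used in the chapter, and it estimates accuracy with repeated 10-fold cross-validation. A nearest-centroid classifier stands in for the linear classifiers. The Wilcoxon filter and SVM-RFE are not implemented. All function names, the toy data, and the choice to rerun feature selection inside every training fold are illustrative and not taken from the chapter.

```python
import random
import statistics


def snr_scores(X, y):
    """SNR per feature for two groups: |mean1 - mean0| / (sd1 + sd0)."""
    scores = []
    for j in range(len(X[0])):
        g0 = [row[j] for row, label in zip(X, y) if label == 0]
        g1 = [row[j] for row, label in zip(X, y) if label == 1]
        denom = statistics.stdev(g0) + statistics.stdev(g1)
        diff = abs(statistics.mean(g1) - statistics.mean(g0))
        scores.append(diff / denom if denom > 0 else 0.0)
    return scores


def top_k_features(X, y, k):
    """Indices of the k features with the highest SNR score."""
    scores = snr_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def nearest_centroid_predict(X_train, y_train, X_test, features):
    """Stand-in linear classifier: assign each sample to the closer class centroid."""
    centroids = {}
    for label in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        centroids[label] = [statistics.mean(r[j] for r in rows) for j in features]
    preds = []
    for row in X_test:
        dist = {
            label: sum((row[j] - c) ** 2 for j, c in zip(features, cen))
            for label, cen in centroids.items()
        }
        preds.append(min(dist, key=dist.get))
    return preds


def repeated_kfold_accuracy(X, y, k_features, n_splits=10, n_repeats=3, seed=0):
    """Pooled accuracy over n_repeats reshuffled n_splits-fold partitions.

    Features are selected from the training part of each fold only, so the
    held-out samples never influence the ranking.
    """
    rng = random.Random(seed)
    correct = total = 0
    for _ in range(n_repeats):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        folds = [idx[i::n_splits] for i in range(n_splits)]
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            X_train = [X[i] for i in train]
            y_train = [y[i] for i in train]
            feats = top_k_features(X_train, y_train, k_features)
            preds = nearest_centroid_predict(
                X_train, y_train, [X[i] for i in fold], feats
            )
            correct += sum(p == y[i] for p, i in zip(preds, fold))
            total += len(fold)
    return correct / total


def make_toy_data(n_per_group=20, n_features=50, n_informative=5, seed=1):
    """Synthetic two-group data; only the first n_informative features differ."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_group):
            row = [rng.gauss(0.3, 0.1) for _ in range(n_features)]
            if label == 1:
                for j in range(n_informative):
                    row[j] += 0.4  # mean shift in the informative features
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("selected features:", sorted(top_k_features(X, y, 5)))
    print("repeated 10-fold accuracy:", repeated_kfold_accuracy(X, y, k_features=5))
```

The folds here are not stratified, which is tolerable for the balanced toy data but may leave a class under-represented in a training fold on small or imbalanced real data sets; a stratified split would be the safer choice there.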

Metadata
Title
A Comparative Study of Feature Selection and Classification Techniques for High-Throughput DNA Methylation Data
Authors
Alhasan Alkuhlani
Mohammad Nassef
Ibrahim Farag
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-48308-5_76