Adaptive Machine Learning Algorithm and Analytics of Big Genomic Data for Gene Prediction

Chapter in: Tracking and Preventing Diseases with Artificial Intelligence

Abstract

Artificial intelligence helps in tracking and preventing diseases. For instance, machine learning algorithms can analyze big genomic data and predict genes, which helps researchers and scientists gain deep insights into protein-coding genes in viruses that cause certain diseases. Specifically, prediction of protein-coding genes from the genome of organisms is important to the synthesis of protein and the understanding of the regulatory function of the non-coding region. Over the past few years, researchers have developed methods for finding protein-coding genes. Nevertheless, the recent data explosion in genomics accentuates the need for efficient gene prediction algorithms. This book chapter presents an adaptive naive Bayes-based machine learning (NBML) algorithm, deployed over a cluster running the Apache Spark framework, for efficient prediction of genes in the genomes of eukaryotic organisms. To evaluate how well the NBML algorithm discovers protein-coding genes in the human genome chromosome GRCh37, a confusion matrix was constructed; its results show that NBML achieved high specificity, precision and accuracy of 94.01%, 95.04% and 96.02%, respectively. Moreover, the algorithm can be effective for transferring knowledge to new genomic datasets.
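The three evaluation measures named in the abstract follow directly from the four cells of a binary confusion matrix. The sketch below is only an illustration of those standard definitions, not the chapter's evaluation code: the class name `ConfusionMatrix`, the helper `confusion_from_labels`, the encoding of a coding region as `1` and a non-coding region as `0`, and the toy label lists are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Sequence


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical encoding: 1 = coding, 0 = non-coding)."""
    tp: int  # coding regions predicted as coding
    fp: int  # non-coding regions predicted as coding
    tn: int  # non-coding regions predicted as non-coding
    fn: int  # coding regions predicted as non-coding

    def specificity(self) -> float:
        # Share of truly non-coding regions that were rejected: TN / (TN + FP)
        return _ratio(self.tn, self.tn + self.fp)

    def precision(self) -> float:
        # Share of predicted coding regions that are truly coding: TP / (TP + FP)
        return _ratio(self.tp, self.tp + self.fp)

    def accuracy(self) -> float:
        # Share of all regions classified correctly: (TP + TN) / total
        return _ratio(self.tp + self.tn, self.tp + self.fp + self.tn + self.fn)


def confusion_from_labels(actual: Sequence[int], predicted: Sequence[int]) -> ConfusionMatrix:
    """Tally the four confusion-matrix cells from paired 0/1 labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 1:
            fp += 1
        elif a == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionMatrix(tp=tp, fp=fp, tn=tn, fn=fn)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the chapter.
    cm = confusion_from_labels([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
    print(cm, cm.specificity(), cm.precision(), cm.accuracy())
```

Reporting specificity alongside precision and accuracy matters for gene prediction because non-coding regions typically far outnumber coding ones, so accuracy alone can look high even when many non-coding regions are mislabeled as genes.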

Acknowledgements

This project is partially supported by (a) Association of Commonwealth Universities (ACU), (b) Natural Sciences and Engineering Research Council of Canada (NSERC), and (c) University of Manitoba.

Author information

Corresponding author

Correspondence to Carson K. Leung.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Sarumi, O.A., Leung, C.K. (2022). Adaptive Machine Learning Algorithm and Analytics of Big Genomic Data for Gene Prediction. In: Mehta, M., Fournier-Viger, P., Patel, M., Lin, J.CW. (eds) Tracking and Preventing Diseases with Artificial Intelligence. Intelligent Systems Reference Library, vol 206. Springer, Cham. https://doi.org/10.1007/978-3-030-76732-7_5
