Abstract
Artificial intelligence helps in tracking and preventing diseases. For instance, machine learning algorithms can analyze big genomic data and predict genes, which helps researchers gain deeper insight into the protein-coding genes of viruses that cause certain diseases. Predicting protein-coding genes from the genome of an organism is important to protein synthesis and to understanding the regulatory function of the non-coding region. Over the past few years, researchers have developed methods for finding protein-coding genes. Nevertheless, the recent data explosion in genomics accentuates the need for efficient gene prediction algorithms. This book chapter presents an adaptive naive Bayes-based machine learning (NBML) algorithm, designed for deployment over an Apache Spark cluster, for efficient prediction of genes in the genomes of eukaryotic organisms. To evaluate how well the NBML algorithm discovers protein-coding genes from the human genome chromosome GRCh37, a confusion matrix was constructed; the results show that NBML achieved high specificity, precision and accuracy of 94.01%, 95.04% and 96.02%, respectively. Moreover, the algorithm can be effective for knowledge transfer to new genomic datasets.
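The three measures reported in the abstract are all read off a confusion matrix. The following is a minimal sketch of how they are conventionally computed, not the chapter's own evaluation code. The class name `ConfusionMatrix` and the counts in the example are hypothetical and chosen only for illustration; they are not the chapter's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary coding / non-coding classifier (hypothetical helper)."""
    tp: int  # coding regions correctly predicted as coding
    fp: int  # non-coding regions wrongly predicted as coding
    tn: int  # non-coding regions correctly predicted as non-coding
    fn: int  # coding regions wrongly predicted as non-coding

    def specificity(self) -> float:
        # Fraction of true non-coding regions that were rejected: TN / (TN + FP)
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        # Fraction of predicted coding regions that are truly coding: TP / (TP + FP)
        return self.tp / (self.tp + self.fp)

    def accuracy(self) -> float:
        # Fraction of all predictions that are correct
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total


# Illustrative counts only (assumed, not taken from the chapter).
cm = ConfusionMatrix(tp=90, fp=10, tn=80, fn=20)
print(f"specificity={cm.specificity():.4f}")
print(f"precision={cm.precision():.4f}")
print(f"accuracy={cm.accuracy():.4f}")
```

Reporting specificity and precision alongside accuracy is useful for gene prediction because coding and non-coding regions are often imbalanced, so accuracy alone can hide a high false-positive rate.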
Acknowledgements
This project is partially supported by (a) Association of Commonwealth Universities (ACU), (b) Natural Sciences and Engineering Research Council of Canada (NSERC), and (c) University of Manitoba.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Sarumi, O.A., Leung, C.K. (2022). Adaptive Machine Learning Algorithm and Analytics of Big Genomic Data for Gene Prediction. In: Mehta, M., Fournier-Viger, P., Patel, M., Lin, J.CW. (eds) Tracking and Preventing Diseases with Artificial Intelligence. Intelligent Systems Reference Library, vol 206. Springer, Cham. https://doi.org/10.1007/978-3-030-76732-7_5
DOI: https://doi.org/10.1007/978-3-030-76732-7_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-76731-0
Online ISBN: 978-3-030-76732-7
eBook Packages: Intelligent Technologies and Robotics (R0)