Published in: Soft Computing 12/2015

16.12.2014 | Focus

A novel approach for predicting DNA splice junctions using hybrid machine learning algorithms

Written by: Indrajit Mandal

Abstract

Accurate identification of splice junctions in DNA sequences is an active area of research. Knowing where splice junctions occur provides valuable information about the internal structure of a genome and aids its deeper analysis and interpretation. The major problems faced during gene analysis are the diversity, complexity and uncertain nature of DNA sequences. The application of machine learning algorithms to this problem has attracted enormous attention in the last few decades. This study presents hybrid machine learning ensemble approaches that address the splice junction problem more effectively. Multiple classifier systems consisting of random subspace, rotation forest and boosting methods are implemented and validated on a real genome sequence dataset. A novel feature selection technique based on attribute correlation estimation with a best-first search strategy is proposed. The average prediction accuracy achieved in identifying splice junctions is more than 98 %. All computations are performed with a 95 % confidence interval. The results presented in this study are superior to those of the state-of-the-art approaches in the literature. This work strengthens the case for extending and applying machine learning models to similar problems.
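The abstract names correlation-based feature selection with a best-first strategy but does not spell out the formulation. As a rough illustration only, the following sketch assumes the generic correlation-based merit of a feature subset, k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean absolute feature-label correlation and r_ff the mean absolute feature-feature correlation. It replaces best-first search with a simpler greedy forward search (no backtracking). The toy sequence windows, the one-hot encoding and all function names are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations

BASES = "ACGT"

def one_hot_columns(windows):
    """Encode equal-length DNA windows as one 0/1 column per (position, base)."""
    length = len(windows[0])
    names = [(pos, base) for pos in range(length) for base in BASES]
    columns = [[1.0 if w[pos] == base else 0.0 for w in windows]
               for pos, base in names]
    return names, columns

def pearson(x, y):
    """Pearson correlation of two numeric lists; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def cfs_merit(columns, labels, subset):
    """Assumed merit: k*r_cf / sqrt(k + k*(k-1)*r_ff) with mean |correlations|."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(columns[j], labels)) for j in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(columns[a], columns[b])) for a, b in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def greedy_forward_select(columns, labels):
    """Greedy simplification of best-first search: add the feature that raises
    the merit most, stop when no single addition improves it."""
    selected, best = [], 0.0
    remaining = set(range(len(columns)))
    while remaining:
        cand, cand_merit = None, best
        for j in sorted(remaining):
            m = cfs_merit(columns, labels, selected + [j])
            if m > cand_merit + 1e-12:
                cand, cand_merit = j, m
        if cand is None:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = cand_merit
    return selected, best

# Hypothetical toy data: positives carry "GT" at positions 2-3 (0-based).
windows = ["ACGTA", "TTGTC", "GAGTT", "CCGTG",
           "ACCAA", "TTACC", "GATGT", "CCAAG"]
labels = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]

names, columns = one_hot_columns(windows)
selected, merit = greedy_forward_select(columns, labels)
print("selected:", [names[j] for j in selected], "merit:", round(merit, 3))
```

In this toy set the columns for G at position 2 and T at position 3 are identical, so the second one adds redundancy without adding information and the search does not include it. A full best-first search would additionally keep a queue of unexpanded subsets and backtrack to them when the current path stops improving.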

Metadata
Title
A novel approach for predicting DNA splice junctions using hybrid machine learning algorithms
Written by
Indrajit Mandal
Publication date
16.12.2014
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 12/2015
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-014-1550-z
