Published in: Neural Computing and Applications 8/2018

5 April 2017 | New Trends in data pre-processing methods for signal and image classification

Classification of nucleotide sequences for quality assessment using logistic regression and decision tree approaches

Authors: Serkan Kurt, Ersoy Öz, Öyküm Esra Aşkın, Yeliz Yücel Öz

Abstract

Knowledge of DNA sequences is indispensable for basic biological research. Researchers use DNA sequencing for purposes ranging from molecular biology research to sequence comparison for individual identification. Automated DNA sequencing devices use four-color chromatograms, or base-calling signals, to indicate the strength of hybridization for each base channel. Typically, the relative strengths of the peaks at each base location are used to quantify the quality and reliability of individual readings. However, assessing the overall quality of a whole DNA trace file remains an open problem. Classifying raw DNA trace files as high or low quality is therefore important for the efficient use of resources. In this study, we used several supervised machine learning approaches, including logistic regression and ensemble decision trees, to identify chromatogram files of high or acceptable quality, and we compared their prediction performance. To develop and test our ideas, we used a public DNA trace repository consisting of 1626 high-quality and 631 low-quality files, labelled by our expert molecular biologist. Our results indicate that, although all of the methods tried offer comparable and acceptable performance, the random forest decision tree algorithm combined with adaptive boosting ensemble learning shows slightly higher prediction accuracy with as few as four features.
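Because the two classes are unequal in size (1626 versus 631 files), a reported prediction accuracy is easier to read next to a trivial baseline. The sketch below is an illustration only, not the authors' evaluation code: it uses nothing but the class counts given in the abstract, treats "high quality" as the positive class (an assumption), and scores a hypothetical classifier that labels every file as high quality. The names `Confusion`, `_ratio`, and `baseline` are made up for this example.

```python
from dataclasses import dataclass

# Class counts stated in the abstract.
N_HIGH = 1626  # trace files labelled high quality (assumed positive class)
N_LOW = 631    # trace files labelled low quality


def _ratio(num: int, den: int) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


@dataclass(frozen=True)
class Confusion:
    """Binary confusion matrix with 'high quality' as the positive class."""
    tp: int  # high-quality files predicted high
    fn: int  # high-quality files predicted low
    fp: int  # low-quality files predicted high
    tn: int  # low-quality files predicted low

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fn + self.fp + self.tn
        return _ratio(self.tp + self.tn, total)

    @property
    def sensitivity(self) -> float:
        return _ratio(self.tp, self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return _ratio(self.tn, self.tn + self.fp)

    @property
    def balanced_accuracy(self) -> float:
        return (self.sensitivity + self.specificity) / 2


# Hypothetical trivial baseline: label every file "high quality".
baseline = Confusion(tp=N_HIGH, fn=0, fp=N_LOW, tn=0)

print(f"majority-class accuracy: {baseline.accuracy:.4f}")          # 0.7204
print(f"balanced accuracy:       {baseline.balanced_accuracy:.4f}")  # 0.5000
```

On this data set, any of the compared classifiers therefore needs to exceed roughly 72% accuracy to beat always predicting the majority class, and sensitivity and specificity together give a fuller picture than accuracy alone.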

Metadata
Title
Classification of nucleotide sequences for quality assessment using logistic regression and decision tree approaches
Authors
Serkan Kurt
Ersoy Öz
Öyküm Esra Aşkın
Yeliz Yücel Öz
Publication date
5 April 2017
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 8/2018
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-017-2960-5
