Comparative Study of Machine Learning Techniques for Genome Scale Discrimination of Recombinant HIV-1 Strains

Whole genomes of HIV-1 strains were analyzed to discriminate genomes of circulating recombinant forms from non-recombinant genomes using naïve Bayes, logistic regression, support vector machine, k-nearest neighbor, and classification tree, with codon frequencies as sequence attributes. The performance of all five techniques was compared on several indices: classification accuracy, sensitivity, specificity, Matthews correlation coefficient, and Brier score. The techniques were also compared using receiver operating characteristic (ROC) curves and, for their calibration ability, calibration graphs. All techniques were validated using tenfold cross-validation and evaluated on a training data set of 4215 genomes, comprising 3004 non-recombinant strains and 1211 circulating recombinant strains. The highest classification accuracy, 94.47%, was achieved with k-nearest neighbor under tenfold cross-validation. Naïve Bayes, logistic regression, support vector machine, and classification tree achieved classification accuracies of 84.49%, 88.28%, 92.22%, and 86.31%, respectively, under tenfold cross-validation. On the ROC curve, k-nearest neighbor also performed best, with an area under the curve of 0.9754. Our results indicate that supervised machine learning techniques can be effectively applied to discriminate recombinant strains of HIV-1 from non-recombinant strains at genome scale using codon frequencies.
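As a rough illustration of the kind of features and indices named in the abstract, the following Python sketch computes a 64-dimensional codon-frequency vector for a nucleotide sequence, along with the Matthews correlation coefficient and the Brier score. It is a sketch under stated assumptions, not the authors' pipeline: the abstract does not say how codons were counted across a whole genome, so the single fixed reading frame, the skipping of ambiguous triplets, and all function names here (`codon_frequencies`, `matthews_corrcoef`, `brier_score`) are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt

# All 64 codons in a fixed order, so every genome maps to the same feature layout.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_frequencies(sequence, frame=0):
    """Relative frequency of each of the 64 codons in one reading frame.

    Assumption: non-overlapping triplets are read from `frame`; triplets with
    characters other than A, C, G, T (for example N or gaps) are ignored.
    """
    seq = sequence.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    total = sum(counts[c] for c in CODONS)
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


if __name__ == "__main__":
    # Toy sequence, not real HIV-1 data.
    features = codon_frequencies("ATGATGAAA")
    print(features[CODONS.index("ATG")], features[CODONS.index("AAA")])
```

Vectors of this form, one per genome with a recombinant or non-recombinant label, could then be passed to any standard classifier implementation and scored with tenfold cross-validation.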

Keywords: CLASSIFICATION; HIV-1; MACHINE LEARNING; NON RECOMBINANT; RECOMBINANT

Document Type: Research Article

Publication date: 01 April 2016

More about this publication?
  • Journal of Medical Imaging and Health Informatics (JMIHI) is a medium to disseminate novel experimental and theoretical research results in the field of biomedicine, biology, clinical, rehabilitation engineering, medical image processing, bio-computing, D2H2, and other health related areas.