Hidden Markov Models in Bioinformatics: SNV Inference from Next Generation Sequence

  • Protocol
Hidden Markov Models

Part of the book series: Methods in Molecular Biology (MIMB, volume 1552)

Abstract

The rapid development of next generation sequencing (NGS) technology provides a novel avenue for genomic exploration and research. Hidden Markov models (HMMs) are widely applied in pattern recognition and in bioinformatics, for example in the detection of transcription factor binding sites and cis-regulatory modules. This chapter introduces an application of HMMs motivated by the ongoing development of NGS. Single nucleotide variants (SNVs) inferred from NGS data are expected to reveal gene mutations in cancer. However, NGS has lower sequence coverage and poorer SNV detection capability in the regulatory regions of the genome. To address this, a dedicated HMM is developed that infers the genotype at each position on the genome by incorporating the mapping quality of each read and the corresponding base quality into the emission probability of the HMM. The procedure and the implementation of the algorithm are presented in detail to support both understanding and programming.
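To make the idea of quality-aware emission probabilities concrete, the following is a minimal sketch in Python and not the chapter's actual algorithm. It assumes three hidden genotype states (`aa`, `ab`, `bb`), Phred-scaled base and mapping qualities, and a simple mixture in which an unreliable read contributes an uninformative probability of 0.5. The names `P_REF`, `log_emission`, and `viterbi`, as well as all numeric parameters, are hypothetical choices made for illustration.

```python
import math

# Assumed hidden states: homozygous reference, heterozygous, homozygous variant.
STATES = ("aa", "ab", "bb")

# Hypothetical probability that a reliable read shows the reference allele,
# given each genotype. These values are illustrative, not from the chapter.
P_REF = {"aa": 0.99, "ab": 0.5, "bb": 0.01}


def phred_to_prob_correct(q):
    """Convert a Phred-scaled quality into the probability of being correct."""
    return 1.0 - 10.0 ** (-q / 10.0)


def log_emission(reads, state):
    """Log-probability of the reads at one position under a genotype state.

    reads: list of (is_ref, base_quality, mapping_quality) tuples.
    A read is trusted with weight w = P(base correct) * P(mapping correct);
    otherwise it contributes an uninformative 0.5.
    """
    total = 0.0
    for is_ref, base_q, map_q in reads:
        w = phred_to_prob_correct(base_q) * phred_to_prob_correct(map_q)
        p_allele = P_REF[state] if is_ref else 1.0 - P_REF[state]
        total += math.log(w * p_allele + (1.0 - w) * 0.5)
    return total


def viterbi(positions, log_start, log_trans):
    """Most likely genotype path over consecutive positions (log space)."""
    scores = {s: log_start[s] + log_emission(positions[0], s) for s in STATES}
    backpointers = []
    for reads in positions[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: scores[r] + log_trans[r][s])
            pointers[s] = best
            new_scores[s] = (scores[best] + log_trans[best][s]
                             + log_emission(reads, s))
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical start and transition probabilities favouring the reference
# genotype and persistence between neighbouring positions.
LOG_START = {"aa": math.log(0.98), "ab": math.log(0.01), "bb": math.log(0.01)}
LOG_TRANS = {
    r: {s: math.log(0.98 if r == s else 0.01) for s in STATES} for r in STATES
}
```

Calling `viterbi(positions, LOG_START, LOG_TRANS)` with one list of `(is_ref, base_quality, mapping_quality)` tuples per position returns one genotype label per position. In practice the start, transition, and allele parameters would be estimated from data rather than fixed by hand as they are here.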


Author information

Corresponding author

Correspondence to Xiaobo Zhou.

Copyright information

© 2017 Springer Science+Business Media LLC

About this protocol

Cite this protocol

Bian, J., Zhou, X. (2017). Hidden Markov Models in Bioinformatics: SNV Inference from Next Generation Sequence. In: Westhead, D., Vijayabaskar, M. (eds) Hidden Markov Models. Methods in Molecular Biology, vol 1552. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6753-7_9

  • DOI: https://doi.org/10.1007/978-1-4939-6753-7_9

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-6751-3

  • Online ISBN: 978-1-4939-6753-7

  • eBook Packages: Springer Protocols
