Abstract
The rapid development of next-generation sequencing (NGS) technology provides a novel avenue for genomic exploration and research. Hidden Markov models (HMMs) are widely applied in pattern recognition and in bioinformatics, for example in the detection of transcription factor binding sites and cis-regulatory modules. This chapter introduces an application of HMMs that has emerged with the continued development of NGS. Single nucleotide variants (SNVs) inferred from NGS data are expected to reveal gene mutations in cancer. However, NGS has lower sequence coverage and poorer SNV detection capability in the regulatory regions of the genome. To address this, a specific HMM is developed that infers the genotype at each position of the genome by incorporating the mapping quality of each read and the corresponding base qualities into the emission probability of the HMM. The procedure and the implementation of the algorithm are presented in detail to support understanding and programming.
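The idea of folding read-level quality scores into the emission probability can be illustrated with a minimal Python sketch. It is not the chapter's algorithm. The three genotype states (`aa`, `ab`, `bb`), the `REF_FRACTION` values, the mixture form of the per-read likelihood, and the `stay` and `prior` parameters are all assumptions chosen for illustration. Each read is a hypothetical tuple `(is_ref, base_q, map_q)` with Phred-scaled qualities, and the most likely genotype path is found with a standard log-space Viterbi pass.

```python
import math

# Hidden states: homozygous reference, heterozygous, homozygous variant.
GENOTYPES = ("aa", "ab", "bb")
# Assumed probability that a correctly sequenced read shows the reference allele.
REF_FRACTION = {"aa": 0.999, "ab": 0.5, "bb": 0.001}


def phred_to_prob_correct(q):
    """Convert a Phred-scaled quality into the probability of being correct."""
    return 1.0 - 10.0 ** (-q / 10.0)


def log_emission(reads, genotype):
    """Log-likelihood of one pileup column given a genotype.

    Each read is (is_ref, base_q, map_q). A read that is probably
    mismapped (low map_q) contributes about 0.5 under every genotype,
    so it carries almost no evidence.
    """
    mu = REF_FRACTION[genotype]
    total = 0.0
    for is_ref, base_q, map_q in reads:
        q = phred_to_prob_correct(base_q)  # base call is correct
        r = phred_to_prob_correct(map_q)   # read is mapped correctly
        p_ref = q * mu + (1.0 - q) * (1.0 - mu)
        p_obs = p_ref if is_ref else 1.0 - p_ref
        total += math.log(r * p_obs + 0.5 * (1.0 - r))
    return total


def viterbi(pileups, stay=0.98, prior=(0.98, 0.01, 0.01)):
    """Most likely genotype at each position (log-space Viterbi)."""
    if not pileups:
        return []
    n = len(GENOTYPES)
    switch = (1.0 - stay) / (n - 1)
    log_trans = [[math.log(stay if i == j else switch) for j in range(n)]
                 for i in range(n)]
    score = [math.log(prior[k]) + log_emission(pileups[0], g)
             for k, g in enumerate(GENOTYPES)]
    back = []
    for reads in pileups[1:]:
        new_score, pointers = [], []
        for j, g in enumerate(GENOTYPES):
            best_i = max(range(n), key=lambda i: score[i] + log_trans[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + log_trans[best_i][j]
                             + log_emission(reads, g))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [GENOTYPES[k] for k in path]


if __name__ == "__main__":
    ref_read = (True, 30, 40)
    alt_read = (False, 30, 40)
    pileups = [[ref_read] * 10,
               [ref_read] * 5 + [alt_read] * 5,
               [alt_read] * 10]
    print(viterbi(pileups))
```

In this sketch a read with mapping quality 0 has likelihood 0.5 under every genotype, so the call at that position falls back on the prior and the transition probabilities. A real implementation would estimate the parameters from data, for instance with the EM algorithm, rather than fixing them by hand.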
© 2017 Springer Science+Business Media LLC
About this protocol
Cite this protocol
Bian, J., Zhou, X. (2017). Hidden Markov Models in Bioinformatics: SNV Inference from Next Generation Sequence. In: Westhead, D., Vijayabaskar, M. (eds) Hidden Markov Models. Methods in Molecular Biology, vol 1552. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6753-7_9
DOI: https://doi.org/10.1007/978-1-4939-6753-7_9
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-6751-3
Online ISBN: 978-1-4939-6753-7
eBook Packages: Springer Protocols