
2018 | OriginalPaper | Chapter

CIGenotyper: A Machine Learning Approach for Genotyping Complex Indel Calls

Authors: Tian Zheng, Yang Li, Yu Geng, Zhongmeng Zhao, Xuanping Zhang, Xiao Xiao, Jiayin Wang

Published in: Bioinformatics and Biomedical Engineering

Publisher: Springer International Publishing


Abstract

Complex insertions and deletions (complex indels) are a rare category of genomic structural variation. A complex indel presents as one or more DNA fragments inserted at the genomic location where a deletion occurs. Several studies emphasize the importance of complex indels, and some state-of-the-art approaches have been proposed to detect them from sequencing data. However, genotyping complex indel calls is a separate and challenging computational problem, because some features commonly used to genotype indel calls from sequencing data can be invalid due to the components of complex indels. In this article, we therefore propose a machine learning approach, CIGenotyper, to estimate the genotypes of complex indel calls. CIGenotyper adopts a relevance vector machine (RVM) framework. For each candidate call, it first extracts a set of features from the candidate region, which usually includes the read depth, the variant allelic frequency for aligned contigs, the numbers of splitting and discordant paired-end reads, etc. The features of a complex indel call are then passed to a trained RVM, which outputs the genotype with the highest likelihood. An algorithm is also proposed to train the RVM. We compare our approach to two popular approaches, Gindel and Pindel, on multiple groups of artificial datasets. Our model outperforms both on average success rate in most cases as we vary the coverage of the given data, the read lengths, and the distribution of the lengths of the pre-set complex indels.
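The two steps described in the abstract, building a feature vector for a candidate call and then reporting the genotype with the highest likelihood, can be sketched roughly as follows. This is not the CIGenotyper implementation. The class `CallFeatures`, the function names, the feature normalization, and the hand-set weights are all hypothetical. A softmax over linear scores stands in for the trained RVM, and the variant allelic frequency is computed from read counts here rather than from aligned contigs.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

# Genotype labels: homozygous reference, heterozygous, homozygous variant.
GENOTYPES = ("0/0", "0/1", "1/1")


@dataclass
class CallFeatures:
    """Features for one candidate complex indel call (hypothetical names)."""
    read_depth: float      # mean read depth over the candidate region
    variant_reads: int     # reads supporting the variant allele
    total_reads: int       # all reads covering the candidate region
    split_reads: int       # splitting reads
    discordant_pairs: int  # discordant paired-end reads

    def vaf(self) -> float:
        """Variant allelic frequency; 0.0 when nothing covers the region."""
        return self.variant_reads / self.total_reads if self.total_reads else 0.0

    def as_vector(self) -> List[float]:
        """Bias term, VAF, and depth-normalized split/discordant counts."""
        depth = self.read_depth if self.read_depth > 0 else 1.0
        return [1.0, self.vaf(), self.split_reads / depth,
                self.discordant_pairs / depth]


# Hand-set illustrative weights, NOT learned values. A trained RVM would
# replace this table and the linear scoring below.
WEIGHTS: Dict[str, List[float]] = {
    "0/0": [2.0, -8.0, -1.0, -1.0],
    "0/1": [0.0, 0.0, 0.5, 0.5],
    "1/1": [-6.0, 8.0, 0.5, 0.5],
}


def genotype_probabilities(features: CallFeatures) -> Dict[str, float]:
    """Softmax over per-genotype linear scores."""
    x = features.as_vector()
    scores = {g: sum(w * v for w, v in zip(WEIGHTS[g], x)) for g in GENOTYPES}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {g: math.exp(s - top) for g, s in scores.items()}
    total = sum(exps.values())
    return {g: e / total for g, e in exps.items()}


def call_genotype(features: CallFeatures) -> str:
    """Return the genotype with the highest probability."""
    probs = genotype_probabilities(features)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    example = CallFeatures(read_depth=30.0, variant_reads=15, total_reads=30,
                           split_reads=6, discordant_pairs=4)
    print(call_genotype(example), genotype_probabilities(example))
```

Keeping feature extraction separate from scoring means the stand-in model can be replaced by any trained classifier that returns per-genotype probabilities, without touching the feature code.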

Metadata
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-78723-7_41
