nach oben

Erschienen in:

2016 | OriginalPaper | Buchkapitel

SPRITE: A Fast Parallel SNP Detection Pipeline

verfasst von : Vasudevan Rengasamy, Kamesh Madduri

Erschienen in: High Performance Computing

Verlag: Springer International Publishing

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

We present Sprite, a new high-performance data analysis pipeline for detecting single nucleotide polymorphisms (SNPs) in the human genome. A SNP detection pipeline for next-generation sequencing data uses several software tools, including tools for read alignment, processing alignment output, and SNP identification. We target end-to-end scalability and I/O efficiency in Sprite by merging tools in this pipeline and eliminating redundancies. For a benchmark human whole-genome sequencing data set, Sprite takes less than 50 min on 16 nodes of the TACC Stampede supercomputer. A key component of our optimized pipeline is parsnip, a new parallel method and software tool for SNP detection. We find that the quality of results obtained by parsnip (sensitivity and precision using high-confidence variant calls as ground truth) is comparable to state-of-the-art SNP-calling software. A prototype implementation of Sprite is available at sprite-psu.sourceforge.net.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Vorheriges Kapitel OpenAtom: Scalable Ab-Initio Molecular Dynamics with Diverse Capabilities

Nächstes Kapitel Predictive Modeling for Job Power Consumption in HPC Systems

Adjeroh, D., Bell, T.C., Mukherjee, A.: The Burrows-Wheeler Transform: Data Compression, Suffix Arrays, and Pattern Matching. Springer, Heidelberg (2008)CrossRef

Burrows, M., Wheeler, D.J.: A block sorting lossless data compression algorithm. Technical report 124, Digital Equipment Corporation, Palo Alto, CA (1994)

Challis, D., Yu, J., Evani, U.S., Jackson, A.R., Paithankar, S., Coarfa, C., Milosavljevic, A., Gibbs, R.A., Yu, F.: An integrative variant analysis suite for whole exome next-generation sequencing data. BMC Bioinformatics 13(1), 8 (2012)CrossRef

Chiang, C., Layer, R.M., Faust, G.G., Lindberg, M.R., Rose, D.B., Garrison, E.P., Marth, G.T., Quinlan, A.R., Hall, I.M.: SpeedSeq: ultra-fast personal genome analysis and interpretation. Nat. Methods 12, 966–968 (2015)CrossRef

Depristo, M., Banks, E., Poplin, R., Garimella, K., Maguire, J., Hartl, C.: A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 43, 491–8 (2011)CrossRef

Faust, G., Hall, I.: SAMBLASTER: fast duplicate marking and structural variant read extraction. Bioinformatics 30, 2503–5 (2014)CrossRef

Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proceedings Symposium on Foundations of Computer Science, pp. 390–398 (2000)

Garrison, E., Marth, G.: Haplotype-based variant detection from short-read sequencing (2012). http://arxiv.org/abs/1207.3907

GATK best practices. https://www.broadinstitute.org/gatk/guide/best-practices.php. Accessed May 2016

10.

Abecasis Lab GLF tools. http://www.sph.umich.edu/csg/abecasis/glfTools. Accessed May 2016

11.

Kathiresan, N., Temanni, M.R., Al-Ali, R.: Performance improvement of BWA MEM algorithm using data-parallel with concurrent parallelization. In: Proceedings of the International Conference on Parallel, Distributed and Grid Computing (PDGC) (2014)

12.

Kelly, B., Fitch, J., Hu, Y., Corsmeier, D., Zhong, H., Wetzel, A., Nordquist, R., Newsom, D., White, P.: Churchill: an ultra-fast, deterministic, highly scalable and balanced parallelization strategy for the discovery of human genetic variation in clinical and population-scale genomics. Genome Biol. 16(1), 6 (2015)CrossRef

13.

Langmead, B., Salzberg, S.L.: Fast gapped-read alignment with Bowtie 2. Nat. Methods 9(4), 357–359 (2012)CrossRef

14.

Li, H.: A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27(21), 2987–2993 (2011)CrossRef

15.

Li, H.: Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM (2013). http://arxiv.org/abs/1303.3997v2

16.

Li, H., Durbin, R.: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25(14), 1754–1760 (2009)CrossRef

17.

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R.: 1000 Genome Project Data Processing Subgroup: The aequence alignment/map format and SAMtools. Bioinformatics 25(16), 2078–2079 (2009)

18.

Li, H., Ruan, J., Durbin, R.: Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18(11), 1851–1858 (2008)CrossRef

19.

Li, R., Yu, C., Li, Y., Lam, T.-W., Yiu, S.-M., Kristiansen, K., Wang, J.: SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25(15), 1966–1967 (2009)CrossRef

20.

Liu, C., Wong, T., Wu, E., Luo, R., Yiu, S., Li, Y., Wang, B., Yu, C., Chu, X., Zhao, K., Li, R., Lam, T.: SOAP3: ultra-fast GPU-based parallel alignment tool for short reads. Bioinformatics 28(6), 878–879 (2012)CrossRef

21.

Liu, X., Han, S., Wang, Z., Gelernter, J., Yang, B.-Z.: Variant callers for next-generation sequencing data: a comparison study. PLoS ONE 8(9), e75619 (2013)CrossRef

22.

Luo, R., Wong, Y.-L., Law, W.-C., Lee, L.-K., Cheung, J., Liu, C.-M., Lam, T.-W.: BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU. PeerJ 2, e421 (2014)CrossRef

23.

Nielsen, R., Paul, J., Albrechtsen, A., Song, Y.: Genotype and SNP calling from next-generation sequencing data. Nat. Rev. Genet. 12, 443–451 (2011)CrossRef

24.

Peters, D., Luo, X., Qiu, K., Liang, P.: Speeding up large-scale next generation sequencing data analysis with pBWA. J. Appl. Bioinform. Comput. Biol. 1(1), 1–6 (2012)

25.

Picard tools. http://broadinstitute.github.io/picard. Accessed Dec 2015

26.

pMap: Parallel sequence mapping tool. http://bmi.osu.edu/hpc/software/pmap/pmap.html. Accessed May 2016

27.

Raczy, C., Petrovski, R., Saunders, C.T., Chorny, I., Kruglyak, S., Margulies, E.H., Chuang, H.-Y., Kllberg, M., Kumar, S.A., Liao, A., Little, K.M., Strömberg, M.P., Tanner, S.W.: Isaac: ultra-fast whole-genome secondary analysis on Illumina sequencing platforms. Bioinformatics 29(16), 2041–2043 (2013)CrossRef

28.

Rengasamy, V., Madduri, K.: Engineering a high-performance SNP detection pipeline. Technical report, The Pennsylvania State University (2015)

29.

Rumble, S.M., Lacroute, P., Dalca, A.V., Fiume, M., Sidow, A., Brudno, M.: Shrimp: accurate mapping of short color-space reads. PLoS Comput. Biol. 5(5), e1000386 (2009)CrossRef

30.

Sambamba: process your BAM data faster! http://lomereiter.github.io/sambamba/. Accessed May 2016

31.

Single Nucleotide Polymorphism - SNPedia. http://www.snpedia.com/index.php/Single_Nucleotide_Polymorphism. Accessed May 2016

32.

Talwalkar, A., Liptrap, J., Newcomb, J., Hartl, C., Terhorst, J., Curtis, K., Bresler, M., Song, Y.S., Jordan, M.I., Patterson, D.: SMaSH: a benchmarking toolkit for human genome variant calling. Bioinformatics 30(19), 2787–2795 (2014)CrossRef

33.

Zook, J., Chapman, B., Wang, J., Mittelman, D., Hofmann, O., Hide, W.: Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat. Biotechnol. 32, 246–251 (2014)CrossRef

Titel: SPRITE: A Fast Parallel SNP Detection Pipeline
verfasst von: Vasudevan Rengasamy
Kamesh Madduri
Verlag: Springer International Publishing
Buch: High Performance Computing
Print ISBN: 978-3-319-41320-4

Electronic ISBN: 978-3-319-41321-1

Copyright-Jahr: 2016
DOI: https://doi.org/10.1007/978-3-319-41321-1_9

Neuer Inhalt

Bildnachweise

VDI-Icon, Profil Icon, inhalt2, Springer Professional Modul/© Springer Fachmedien Wiesbaden GmbH, Nachhaltigkeitsaward Key Visual/© Cometis AG/Global ESG Monitor | Daniel Rupp | Generiert mit KI, Search Icon, Banner Hanser, Jonas Klose/© Pine Valley Capital GmbH, Carina Kießling von der Strategieberatung Roland Berger/© Monika Walther Fotografie | ATZ, Beijing Auto Show 2024: Deutsche Hersteller wollen angreifen./© EKH-Pictures / Generated with AI / Stock.adobe.com, Zeitschrift Wissensmanagement Cover, PatentFit-Logo/© Springer Fachmedien Wiesbaden GmbH, Zukunftswerkstatt Sales Excellence 2024/© AndreyPopov / Getty Images / iStock, 2023_Antrieb/© supervisuell, ATZ-Webinar: Prototypenfreie Entwicklung durch Offline- und Driver-in-the-Loop-HiL-Tests /© (c) VI-grade

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Neuer Inhalt

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.