Skip to main content
Top

2015 | OriginalPaper | Chapter

BWTCP: A Parallel Method for Constructing BWT in Large Collection of Genomic Reads

Authors : Heng Wang, Shaoliang Peng, Yutong Lu, Chengkun Wu, Jiajun Wen, Jie Liu, Xiaoqian Zhu

Published in: High Performance Computing

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Short-read alignment and assembly are fundamental procedures for analyses of DNA sequencing data. Many state-of-the-art short-read aligners employ Burrows-Wheeler transform (BWT) as an in-memory index for the reference genome. BWT has also found its use in genome assembly, for indexing the reads. In a typical data set, the volume of reads can be as large as several hundred Gigabases. Consequently, fast construction of the BWT index for reads is essential for an efficient sequence processing. In this paper, we present a parallel method called BWTCP for BWT construction at a large scale. BWTCP is characterized by its ability to harness heterogeneous computing power including multi-core CPU, multiple CPUs, and accelerators like GPU or Intel Xeon Phi. BWTCP is also featured by its novel pruning strategy. Using BWTCP, we managed to construct the BWT for 1 billion 100bp reads within 30 m using 16 compute nodes (2 CPUs per node) on Tianhe-2 Supercomputer. It significantly outperforms the baseline tool BCR, which would need 13 h to finish all processing for the same dataset. BWTCP is freely available at https://​github.​com/​hwang91/​BWTCP.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Deshpande, V., Fung, E.D.K., Pham, S., Bafna, V.: Cerulean: a hybrid assembly using high throughput short and long reads. In: Darling, A., Stoye, J. (eds.) WABI 2013. LNCS, vol. 8126, pp. 349–363. Springer, Heidelberg (2013) CrossRef Deshpande, V., Fung, E.D.K., Pham, S., Bafna, V.: Cerulean: a hybrid assembly using high throughput short and long reads. In: Darling, A., Stoye, J. (eds.) WABI 2013. LNCS, vol. 8126, pp. 349–363. Springer, Heidelberg (2013) CrossRef
2.
go back to reference Deshpande, V.: Sequencing, assembling, and annotating a mid-sized genome. In: Plant and Animal Genome XXII Conference. Plant and Animal Genome (2014) Deshpande, V.: Sequencing, assembling, and annotating a mid-sized genome. In: Plant and Animal Genome XXII Conference. Plant and Animal Genome (2014)
3.
go back to reference Huang, L., Popic, V., Batzoglou, S.: Short read alignment with populations of genomes. Bioinformatics 29(13), i361–i370 (2013)CrossRef Huang, L., Popic, V., Batzoglou, S.: Short read alignment with populations of genomes. Bioinformatics 29(13), i361–i370 (2013)CrossRef
4.
go back to reference Blazewicz, J., Frohmberg, W., Gawron, P., et al.: DNA sequence assembly involving an acyclic graph model. Found. Comput. Decis. Sci. 38(1), 25–34 (2013) Blazewicz, J., Frohmberg, W., Gawron, P., et al.: DNA sequence assembly involving an acyclic graph model. Found. Comput. Decis. Sci. 38(1), 25–34 (2013)
5.
go back to reference Li, B., Fillmore, N., Bai, Y., et al.: Evaluation of de novo transcriptome assemblies from RNASeqdata. bioRxiv (2014) Li, B., Fillmore, N., Bai, Y., et al.: Evaluation of de novo transcriptome assemblies from RNASeqdata. bioRxiv (2014)
6.
go back to reference Burrows, M., Wheeler, D.J.: A block-sorting lossless data compression algorithm (1994) Burrows, M., Wheeler, D.J.: A block-sorting lossless data compression algorithm (1994)
7.
go back to reference Li, H., Durbin, R.: Fast and accurate short read alignment with Burrows Wheeler transform. Bioinformatics 25(14), 1754–1760 (2009)CrossRef Li, H., Durbin, R.: Fast and accurate short read alignment with Burrows Wheeler transform. Bioinformatics 25(14), 1754–1760 (2009)CrossRef
8.
go back to reference Langmead, B., Salzberg, S.L.: Fast gapped-read alignment with Bowtie 2. Nat. Methods 9(4), 357–359 (2012)CrossRef Langmead, B., Salzberg, S.L.: Fast gapped-read alignment with Bowtie 2. Nat. Methods 9(4), 357–359 (2012)CrossRef
9.
go back to reference Chan, S.H., Cheung, J., Wu, E., et al.: MICA: a fast short-read aligner that takes full advantage of intel many integrated core architecture (MIC) (2014). arXiv preprint arXiv:1402.4876 Chan, S.H., Cheung, J., Wu, E., et al.: MICA: a fast short-read aligner that takes full advantage of intel many integrated core architecture (MIC) (2014). arXiv preprint arXiv:​1402.​4876
10.
go back to reference Simpson, J.T., Durbin, R.: Efficient de novo assembly of large genomes using compressed data structures. Genome Res. 22(3), 549–556 (2012)CrossRef Simpson, J.T., Durbin, R.: Efficient de novo assembly of large genomes using compressed data structures. Genome Res. 22(3), 549–556 (2012)CrossRef
11.
go back to reference Bauer, M.J., Cox, A.J., Rosone, G.: Lightweight algorithms for constructing and inverting the BWT of string collections. Theoret. Comput. Sci. 483, 134–148 (2013)MATHMathSciNetCrossRef Bauer, M.J., Cox, A.J., Rosone, G.: Lightweight algorithms for constructing and inverting the BWT of string collections. Theoret. Comput. Sci. 483, 134–148 (2013)MATHMathSciNetCrossRef
12.
go back to reference Liu, C.M., Luo, R., Lam, T.W.: GPU-accelerated BWT construction for large collection of short reads (2014). arXiv preprint arXiv:1401.7457 Liu, C.M., Luo, R., Lam, T.W.: GPU-accelerated BWT construction for large collection of short reads (2014). arXiv preprint arXiv:​1401.​7457
13.
go back to reference Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proceedings of the 41st Annual Symposium on Foundations of Computer Science, p. 390–398. IEEE (2000) Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proceedings of the 41st Annual Symposium on Foundations of Computer Science, p. 390–398. IEEE (2000)
15.
go back to reference Mcllroy, P.M., Bostic, K., Mcllroy, M.D.: Engineering radix sort. Comput. Syst. 6(1), 5–27 (1993) Mcllroy, P.M., Bostic, K., Mcllroy, M.D.: Engineering radix sort. Comput. Syst. 6(1), 5–27 (1993)
16.
go back to reference Luo, R., Liu, B., Xie, Y., et al.: SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1(1), 18 (2012)CrossRef Luo, R., Liu, B., Xie, Y., et al.: SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1(1), 18 (2012)CrossRef
Metadata
Title
BWTCP: A Parallel Method for Constructing BWT in Large Collection of Genomic Reads
Authors
Heng Wang
Shaoliang Peng
Yutong Lu
Chengkun Wu
Jiajun Wen
Jie Liu
Xiaoqian Zhu
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-20119-1_13