Published in: The Journal of Supercomputing 1/2015

01.01.2015

A survey of genome sequence assembly techniques and algorithms using high-performance computing

Authors: Munib Ahmed, Ishfaq Ahmad, Mohammad Saad Ahmad

Abstract

Genome assembly has been an area of active research since the structure of DNA was discovered, and it gained further momentum after the Human Genome Project was launched. A large number of genomes have been assembled and many more are in the pipeline. A number of full-scale assemblers and special-purpose modules have been reported. Because the volume of data involved in genome assembly is extraordinarily large and demands substantial computational power and processing time, many assemblers use parallel computing to reconstruct the DNA faster and more efficiently. Genome assembly is a multi-step process whose components may be partly or fully parallelized. Although several assemblers and individual modules that perform tasks such as pairwise alignment, multiple sequence alignment, and repeat finding have been analyzed and documented before, this paper provides a holistic view of the assembly process in the realm of parallel and distributed computing. It organizes the individual tasks related to, but not limited to, whole genome shotgun sequencing into a sequence of loosely coupled stages, where each stage consumes the output of the preceding stage and passes its results to the next one. Many of these tasks are essential to current and next-generation sequence assemblers. The paper walks through the entire streamlined process while describing, analyzing, and commenting on the algorithms and techniques that have been designed and implemented for each stage. Where applicable, the paper suggests improvements that may form the basis of new research work.
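The idea of loosely coupled stages, each consuming the output of the one before it, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the stage names (`filter_reads`, `compute_overlaps`, `merge_best_pair`), the minimum read length of 4, and the toy reads are hypothetical, and the single merge step stands in for a real assembly stage only in the loosest sense.

```python
from functools import reduce


def run_pipeline(stages, data):
    """Feed the output of each stage into the next one, in order."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


def suffix_prefix_overlap(a, b):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


# Hypothetical stage 1: discard reads shorter than 4 bases (threshold is an assumption).
def filter_reads(reads):
    return [r for r in reads if len(r) >= 4]


# Hypothetical stage 2: pairwise suffix-prefix overlaps between all ordered read pairs.
def compute_overlaps(reads):
    overlaps = {
        (i, j): suffix_prefix_overlap(a, b)
        for i, a in enumerate(reads)
        for j, b in enumerate(reads)
        if i != j
    }
    return reads, overlaps


# Hypothetical stage 3: merge the single best-overlapping pair into one sequence.
def merge_best_pair(state):
    reads, overlaps = state
    if not overlaps:  # fewer than two reads survived filtering
        return reads[0] if reads else ""
    (i, j), k = max(overlaps.items(), key=lambda kv: kv[1])
    return reads[i] + reads[j][k:]


stages = [filter_reads, compute_overlaps, merge_best_pair]
result = run_pipeline(stages, ["ACGTAC", "TACGGA", "AC"])
print(result)  # ACGTACGGA
```

Because each stage depends only on the value handed to it, a stage such as the pairwise overlap computation could in principle be replaced by a parallel implementation without touching its neighbours, which is the property the staged view is meant to expose.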

Metadata
Title
A survey of genome sequence assembly techniques and algorithms using high-performance computing
Authors
Munib Ahmed
Ishfaq Ahmad
Mohammad Saad Ahmad
Publication date
01.01.2015
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 1/2015
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-014-1297-4
