Skip to main content
Erschienen in: Cluster Computing 3/2017

22.04.2017

Two level parallelism and I/O reduction in genome comparisons

verfasst von: Oscar Torreno, Oswaldo Trelles

Erschienen in: Cluster Computing | Ausgabe 3/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Genome comparison poses important computational challenges, especially in CPU-time, memory allocation and I/O operations. Although there already exist parallel approaches of multiple sequence comparisons algorithms, they face a significant limitation on the input sequence length. GECKO appeared as a computational and memory efficient method to overcome such limitation. However, its performance could be greatly increased by applying parallel strategies and I/O optimisations. We have applied two different strategies to accelerate GECKO while producing the same results. First, a two-level parallel approach parallelising each independent internal pairwise comparison in the first level, and the GECKO modules in the second level. A second approach consists on a complete rewrite of the original code to reduce I/O. Both strategies outperform the original code, which was already faster than equivalent software. Thus, much faster pairwise and multiple genome comparisons can be performed, what is really important with the ever-growing list of available genomes.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Butenhof, D.R.: Programming with POSIX threads. Addison-Wesley Professional, Boston (1997) Butenhof, D.R.: Programming with POSIX threads. Addison-Wesley Professional, Boston (1997)
2.
Zurück zum Zitat Caffarena, G., Pedreira, C., Carreras, C., Bojanic, S., Nieto-Taladriz, O.: FPGA acceleration for DNA sequence alignment. J. Circuits Syst. Comput. 16(02), 245–266 (2007)CrossRef Caffarena, G., Pedreira, C., Carreras, C., Bojanic, S., Nieto-Taladriz, O.: FPGA acceleration for DNA sequence alignment. J. Circuits Syst. Comput. 16(02), 245–266 (2007)CrossRef
3.
Zurück zum Zitat Cui, Y., Liao, X., Zhu, X., Wang, B., Peng, S.: mbwa: a massively parallel sequence reads aligner. In: 8th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2014), pp. 113–120. Springer (2014) Cui, Y., Liao, X., Zhu, X., Wang, B., Peng, S.: mbwa: a massively parallel sequence reads aligner. In: 8th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2014), pp. 113–120. Springer (2014)
4.
Zurück zum Zitat Darling, A.E., Mau, B., Perna, N.T.: Progressivemauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS ONE 5(6), e11147 (2010)CrossRef Darling, A.E., Mau, B., Perna, N.T.: Progressivemauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS ONE 5(6), e11147 (2010)CrossRef
5.
Zurück zum Zitat Duvigneau, R., Kloczko, T., Praveen, C.: A three-level parallelization strategy for robust design in aerodynamics. In: Proceedings of the 20th International Conference on Parallel Computational Fluid Dynamics, pp. 379–384 (2008) Duvigneau, R., Kloczko, T., Praveen, C.: A three-level parallelization strategy for robust design in aerodynamics. In: Proceedings of the 20th International Conference on Parallel Computational Fluid Dynamics, pp. 379–384 (2008)
6.
Zurück zum Zitat Farrar, M.: Striped Smith–Waterman speeds database searches six times over other SIMD implementations. Bioinformatics 23(2), 156–161 (2007)CrossRef Farrar, M.: Striped Smith–Waterman speeds database searches six times over other SIMD implementations. Bioinformatics 23(2), 156–161 (2007)CrossRef
7.
Zurück zum Zitat Harris, R.: Improved pairwise alignment of genomic DNA. 2007. PhD diss., The Pennsylvania State University (2007) Harris, R.: Improved pairwise alignment of genomic DNA. 2007. PhD diss., The Pennsylvania State University (2007)
8.
Zurück zum Zitat Ino, F., Munekawa, Y., Hagihara, K.: Sequence homology search using fine grained cycle sharing of idle GPUs. IEEE Trans. Parallel Distrib. Syst. 23(4), 751–759 (2012)CrossRef Ino, F., Munekawa, Y., Hagihara, K.: Sequence homology search using fine grained cycle sharing of idle GPUs. IEEE Trans. Parallel Distrib. Syst. 23(4), 751–759 (2012)CrossRef
9.
Zurück zum Zitat Kiełbasa, S.M., Wan, R., Sato, K., Horton, P., Frith, M.C.: Adaptive seeds tame genomic sequence comparison. Genome Res. 21(3), 487–493 (2011)CrossRef Kiełbasa, S.M., Wan, R., Sato, K., Horton, P., Frith, M.C.: Adaptive seeds tame genomic sequence comparison. Genome Res. 21(3), 487–493 (2011)CrossRef
10.
Zurück zum Zitat Krishnajith, A.P., Kelly, W., Hayward, R., Tian, Y.C.: Managing memory and reducing i/o cost for correlation matrix calculation in bioinformatics. In: 2013 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), pp. 36–43. IEEE (2013) Krishnajith, A.P., Kelly, W., Hayward, R., Tian, Y.C.: Managing memory and reducing i/o cost for correlation matrix calculation in bioinformatics. In: 2013 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), pp. 36–43. IEEE (2013)
11.
Zurück zum Zitat Krumsiek, J., Arnold, R., Rattei, T.: Gepard: a rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics 23(8), 1026–1028 (2007)CrossRef Krumsiek, J., Arnold, R., Rattei, T.: Gepard: a rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics 23(8), 1026–1028 (2007)CrossRef
12.
Zurück zum Zitat Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C., Salzberg, S.L.: Versatile and open software for comparing large genomes. Genome Biol. 5(2), R12 (2004)CrossRef Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C., Salzberg, S.L.: Versatile and open software for comparing large genomes. Genome Biol. 5(2), R12 (2004)CrossRef
13.
Zurück zum Zitat Lin, H., Balaji, P., Poole, R., Sosa, C., Ma, X., Feng, W.C.: Massively parallel genomic sequence search on the blue gene/p architecture. In: 2008 SC-International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–11. IEEE (2008) Lin, H., Balaji, P., Poole, R., Sosa, C., Ma, X., Feng, W.C.: Massively parallel genomic sequence search on the blue gene/p architecture. In: 2008 SC-International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–11. IEEE (2008)
14.
Zurück zum Zitat Liu, Y., Schmidt, B.: SWAPHI: Smith–Waterman protein database search on Xeon Phi coprocessors. In: 2014 IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors, pp. 184–185. IEEE (2014) Liu, Y., Schmidt, B.: SWAPHI: Smith–Waterman protein database search on Xeon Phi coprocessors. In: 2014 IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors, pp. 184–185. IEEE (2014)
15.
Zurück zum Zitat Liu, Y., Schmidt, B.: GSWABE: faster GPU-accelerated sequence alignment with optimal alignment retrieval for short DNA sequences. Concurr. Comput. Pract. Exp. 27(4), 958–972 (2015)CrossRef Liu, Y., Schmidt, B.: GSWABE: faster GPU-accelerated sequence alignment with optimal alignment retrieval for short DNA sequences. Concurr. Comput. Pract. Exp. 27(4), 958–972 (2015)CrossRef
16.
Zurück zum Zitat Liu, Y., Tran, T.T., Lauenroth, F., Schmidt, B.: SWAPHI-LS: Smith–Waterman algorithm on Xeon Phi coprocessors for long DNA sequences. In: 2014 IEEE International Conference on Cluster Computing (CLUSTER), pp. 257–265. IEEE (2014) Liu, Y., Tran, T.T., Lauenroth, F., Schmidt, B.: SWAPHI-LS: Smith–Waterman algorithm on Xeon Phi coprocessors for long DNA sequences. In: 2014 IEEE International Conference on Cluster Computing (CLUSTER), pp. 257–265. IEEE (2014)
17.
Zurück zum Zitat Liu, Y., Wirawan, A., Schmidt, B.: CUDASW++ 3.0: accelerating Smith–Waterman protein database search by coupling CPU and GPU SIMD instructions. BMC Bioinform. 14(1), 1 (2013)CrossRef Liu, Y., Wirawan, A., Schmidt, B.: CUDASW++ 3.0: accelerating Smith–Waterman protein database search by coupling CPU and GPU SIMD instructions. BMC Bioinform. 14(1), 1 (2013)CrossRef
18.
Zurück zum Zitat Maleki, S., Musuvathi, M., Mytkowicz, T.: Parallelizing dynamic programming through rank convergence. ACM SIGPLAN Not. 49(8), 219–232 (2014)CrossRef Maleki, S., Musuvathi, M., Mytkowicz, T.: Parallelizing dynamic programming through rank convergence. ACM SIGPLAN Not. 49(8), 219–232 (2014)CrossRef
19.
Zurück zum Zitat Meng, X., Chaudhary, V.: A high-performance heterogeneous computing platform for biological sequence analysis. IEEE Trans. Parallel Distrib. Syst. 21(9), 1267–1280 (2010)CrossRef Meng, X., Chaudhary, V.: A high-performance heterogeneous computing platform for biological sequence analysis. IEEE Trans. Parallel Distrib. Syst. 21(9), 1267–1280 (2010)CrossRef
20.
Zurück zum Zitat Momcilovic, S., Roma, N., Sousa, L.: Multi-level parallelization of advanced video coding on hybrid cpu+ gpu platforms. In: Euro-Par 2012: Parallel Processing Workshops, pp. 165–174. Springer (2012) Momcilovic, S., Roma, N., Sousa, L.: Multi-level parallelization of advanced video coding on hybrid cpu+ gpu platforms. In: Euro-Par 2012: Parallel Processing Workshops, pp. 165–174. Springer (2012)
22.
Zurück zum Zitat PD Krishnajith, A., Kelly, W., Tian, Y.C.: Optimizing i/o cost and managing memory for composition vector method based on correlation matrix calculation in bioinformatics. Curr. Bioinform. 9(3), 234–245 (2014)CrossRef PD Krishnajith, A., Kelly, W., Tian, Y.C.: Optimizing i/o cost and managing memory for composition vector method based on correlation matrix calculation in bioinformatics. Curr. Bioinform. 9(3), 234–245 (2014)CrossRef
23.
Zurück zum Zitat Rognes, T.: Faster Smith–Waterman database searches with inter-sequence SIMD parallelisation. BMC Bioinform. 12(1), 1 (2011)CrossRef Rognes, T.: Faster Smith–Waterman database searches with inter-sequence SIMD parallelisation. BMC Bioinform. 12(1), 1 (2011)CrossRef
24.
Zurück zum Zitat Rucci, E., De Giusti, A., Naiouf, M., Botella, G., García, C., Prieto-Matias, M.: Smith–Waterman algorithm on heterogeneous systems: A case study. In: 2014 IEEE International Conference on Cluster Computing (CLUSTER), pp. 323–330. IEEE (2014) Rucci, E., De Giusti, A., Naiouf, M., Botella, G., García, C., Prieto-Matias, M.: Smith–Waterman algorithm on heterogeneous systems: A case study. In: 2014 IEEE International Conference on Cluster Computing (CLUSTER), pp. 323–330. IEEE (2014)
25.
Zurück zum Zitat Sandes, E.F.D.O., Boukerche, A., Melo, A.C.M.A.D.: Parallel optimal pairwise biological sequence comparison: algorithms, platforms, and classification. ACM Comput. Surv. (CSUR) 48(4), 63 (2016)CrossRef Sandes, E.F.D.O., Boukerche, A., Melo, A.C.M.A.D.: Parallel optimal pairwise biological sequence comparison: algorithms, platforms, and classification. ACM Comput. Surv. (CSUR) 48(4), 63 (2016)CrossRef
26.
Zurück zum Zitat Sandes, E.F.D.O., de Melo, A.C.M.: Retrieving Smith–Waterman alignments with optimizations for megabase biological sequences using GPU. IEEE Trans. Parallel Distrib. Syst. 24(5), 1009–1021 (2013)CrossRef Sandes, E.F.D.O., de Melo, A.C.M.: Retrieving Smith–Waterman alignments with optimizations for megabase biological sequences using GPU. IEEE Trans. Parallel Distrib. Syst. 24(5), 1009–1021 (2013)CrossRef
27.
Zurück zum Zitat Sarkar, S., Kulkarni, G.R., Pande, P.P., Kalyanaraman, A.: Network-on-chip hardware accelerators for biological sequence alignment. IEEE Trans. Comput. 59(1), 29–41 (2010)MathSciNetCrossRefMATH Sarkar, S., Kulkarni, G.R., Pande, P.P., Kalyanaraman, A.: Network-on-chip hardware accelerators for biological sequence alignment. IEEE Trans. Comput. 59(1), 29–41 (2010)MathSciNetCrossRefMATH
32.
Zurück zum Zitat Torreno, O., Trelles, O.: Breaking the computational barriers of pairwise genome comparison. BMC Bioinform. 16(1), 1 (2015)CrossRef Torreno, O., Trelles, O.: Breaking the computational barriers of pairwise genome comparison. BMC Bioinform. 16(1), 1 (2015)CrossRef
33.
Zurück zum Zitat Torreno, O., Trelles, O.: Two-level parallelism to accelerate multiple genome comparisons. In: 2016 22nd International Conference on Parallel and Distributed Computing (Euro-Par). Springer (2016) Torreno, O., Trelles, O.: Two-level parallelism to accelerate multiple genome comparisons. In: 2016 22nd International Conference on Parallel and Distributed Computing (Euro-Par). Springer (2016)
34.
Zurück zum Zitat Wang, L., Chan, Y., Duan, X., Lan, H., Meng, X., Liu, W.: XSW: Accelerating biological database search on xeon phi. In: 2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW), pp. 950–957. IEEE (2014) Wang, L., Chan, Y., Duan, X., Lan, H., Meng, X., Liu, W.: XSW: Accelerating biological database search on xeon phi. In: 2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW), pp. 950–957. IEEE (2014)
35.
Zurück zum Zitat Wienbrandt, L.: Bioinformatics applications on the FPGA-based high-performance computer RIVYERA. In: High-Performance Computing Using FPGAs, pp. 81–103. Springer (2013) Wienbrandt, L.: Bioinformatics applications on the FPGA-based high-performance computer RIVYERA. In: High-Performance Computing Using FPGAs, pp. 81–103. Springer (2013)
36.
Zurück zum Zitat Zhou, Y., Xu, W., Donald, B.R., Zeng, J.: An efficient parallel algorithm for accelerating computational protein design. Bioinformatics 30(12), i255–i263 (2014)CrossRef Zhou, Y., Xu, W., Donald, B.R., Zeng, J.: An efficient parallel algorithm for accelerating computational protein design. Bioinformatics 30(12), i255–i263 (2014)CrossRef
Metadaten
Titel
Two level parallelism and I/O reduction in genome comparisons
verfasst von
Oscar Torreno
Oswaldo Trelles
Publikationsdatum
22.04.2017
Verlag
Springer US
Erschienen in
Cluster Computing / Ausgabe 3/2017
Print ISSN: 1386-7857
Elektronische ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-0873-9

Weitere Artikel der Ausgabe 3/2017

Cluster Computing 3/2017 Zur Ausgabe