Skip to main content
Erschienen in: Cluster Computing 4/2020

06.01.2020

A scalable multiple pairwise protein sequence alignment acceleration using hybrid CPU–GPU approach

verfasst von: Luay Alawneh, Mohammed A. Shehab, Mahmoud Al-Ayyoub, Yaser Jararweh, Ziad A. Al-Sharif

Erschienen in: Cluster Computing | Ausgabe 4/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Bioinformatics is an interdisciplinary field that applies trending techniques in information technology, mathematics, and statistics in studying large biological data. Bioinformatics involves several computational techniques such as sequence and structural alignment, data mining, macromolecular geometry, prediction of protein structure and gene finding. Protein structure and sequence analysis are vital to the understanding of cellular processes. Understanding cellular processes contributes to the development of drugs for metabolic pathways. Protein sequence alignment is concerned with identifying the similarities and the relationships among different protein structures. In this paper, we target two well-known protein sequence alignment algorithms, the Needleman–Wunsch and the Smith–Waterman algorithms. These two algorithms are computationally expensive which hinders their applicability for large data sets. Thus, we propose a hybrid parallel approach that combines the capabilities of multi-core CPUs and the power of contemporary GPUs, and significantly speeds up the execution of the target algorithms. The validity of our approach is tested on real protein sequences. Moreover, the scalability of the approach is verified on randomly generated sequences with predefined similarity levels. The results showed that the proposed hybrid approach was up to 242 times faster than the sequential approach.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Fortes, J., Matsunaga, A., Tsugawa, M.: Cloudblast: combining mapreduce and virtualization on distributed resources for bioinformatics applications. In: 2008 IEEE Fourth International Conference on eScience, pp. 222–229 (2008) Fortes, J., Matsunaga, A., Tsugawa, M.: Cloudblast: combining mapreduce and virtualization on distributed resources for bioinformatics applications. In: 2008 IEEE Fourth International Conference on eScience, pp. 222–229 (2008)
2.
Zurück zum Zitat Pinkel, D., Albertson, D.G.: Array comparative genomic hybridization and its applications in cancer. Nat. Genet. 37, S11–S17 (2005)CrossRef Pinkel, D., Albertson, D.G.: Array comparative genomic hybridization and its applications in cancer. Nat. Genet. 37, S11–S17 (2005)CrossRef
3.
Zurück zum Zitat Krasnogor, N., Pelta, D.A.: Measuring the similarity of protein structures by means of the universal similarity metric. Bioinformatics 20(7), 1015–1021 (2004)CrossRef Krasnogor, N., Pelta, D.A.: Measuring the similarity of protein structures by means of the universal similarity metric. Bioinformatics 20(7), 1015–1021 (2004)CrossRef
5.
Zurück zum Zitat Enright, A.J., Ouzounis, C.A.: Generage: a robust algorithm for sequence clustering and domain detection. Bioinformatics 16(5), 451–457 (2000)CrossRef Enright, A.J., Ouzounis, C.A.: Generage: a robust algorithm for sequence clustering and domain detection. Bioinformatics 16(5), 451–457 (2000)CrossRef
6.
Zurück zum Zitat Rognes, T.: Faster Smith–Waterman database searches with inter-sequence simd parallelisation. BMC Bioinform. 12(1), 221 (2011)CrossRef Rognes, T.: Faster Smith–Waterman database searches with inter-sequence simd parallelisation. BMC Bioinform. 12(1), 221 (2011)CrossRef
7.
Zurück zum Zitat Al-Ayyoub, M., Qussai, Y., Shehab, M., Jararweh, Y., Albalas, F.: Accelerating clustering algorithms using GPUs. In: 2016 IEEE High Performance Extreme Computing Conference (HPEC-2016), vol. 1 (2016) Al-Ayyoub, M., Qussai, Y., Shehab, M., Jararweh, Y., Albalas, F.: Accelerating clustering algorithms using GPUs. In: 2016 IEEE High Performance Extreme Computing Conference (HPEC-2016), vol. 1 (2016)
8.
Zurück zum Zitat Shehab, M., Al-Ayyoub, M., Jararweh, Y., Jarrah, M.: Accelerating compute-intensive image segmentation algorithms using GPUs. J. Supercomput. 73, 1929–1951 (2016)CrossRef Shehab, M., Al-Ayyoub, M., Jararweh, Y., Jarrah, M.: Accelerating compute-intensive image segmentation algorithms using GPUs. J. Supercomput. 73, 1929–1951 (2016)CrossRef
9.
Zurück zum Zitat Alandoli, M., Shehab, M., Al-Ayyoub, M., Jararweh, Y., Al-Smadi, M.: Using GPUs to speed-up fcm-based community detection in social networks. In: 2016 7th International Conference on Computer Science and Information Technology (CSIT), pp. 1–6 (2016) Alandoli, M., Shehab, M., Al-Ayyoub, M., Jararweh, Y., Al-Smadi, M.: Using GPUs to speed-up fcm-based community detection in social networks. In: 2016 7th International Conference on Computer Science and Information Technology (CSIT), pp. 1–6 (2016)
10.
Zurück zum Zitat Hains, D., Cashero, Z., Ottenberg, M., Bohm, W., Rajopadhye, S., Improving cudasw++, a parallelization of Smith–Waterman for cuda enabled devices, in Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), : IEEE International Symposium on. IEEE 2011, 490–501 (2011) Hains, D., Cashero, Z., Ottenberg, M., Bohm, W., Rajopadhye, S., Improving cudasw++, a parallelization of Smith–Waterman for cuda enabled devices, in Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), : IEEE International Symposium on. IEEE 2011, 490–501 (2011)
11.
Zurück zum Zitat Khajeh-Saeed, A., Poole, S., Perot, J.B.: Acceleration of the Smith–Waterman algorithm using single and multiple graphics processors. J. Comput. Phys. 229(11), 4247–4258 (2010)MathSciNetMATHCrossRef Khajeh-Saeed, A., Poole, S., Perot, J.B.: Acceleration of the Smith–Waterman algorithm using single and multiple graphics processors. J. Comput. Phys. 229(11), 4247–4258 (2010)MathSciNetMATHCrossRef
12.
Zurück zum Zitat Liu, Y., Schmidt, B., Maskell, D.L.: Msa-cuda: multiple sequence alignment on graphics processing units with cuda. In: 2009 20th IEEE International Conference on Application-Specific Systems, Architectures and Processors, pp. 121–128 (2009) Liu, Y., Schmidt, B., Maskell, D.L.: Msa-cuda: multiple sequence alignment on graphics processing units with cuda. In: 2009 20th IEEE International Conference on Application-Specific Systems, Architectures and Processors, pp. 121–128 (2009)
13.
Zurück zum Zitat Shehab, M.A., Al-Ayyoub, M., Jararweh, Y.: Improving fcm and t2fcm algorithms performance using GPUs for medical images segmentation. In: 2015 6th International Conference on Information and Communication Systems (ICICS). IEEE, pp. 130–135 (2015) Shehab, M.A., Al-Ayyoub, M., Jararweh, Y.: Improving fcm and t2fcm algorithms performance using GPUs for medical images segmentation. In: 2015 6th International Conference on Information and Communication Systems (ICICS). IEEE, pp. 130–135 (2015)
14.
Zurück zum Zitat Cook, S.: CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs. Newnes, Oxford (2012) Cook, S.: CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs. Newnes, Oxford (2012)
15.
Zurück zum Zitat Eklund, A., Dufort, P., Forsberg, D., LaConte, S.M.: Medical image processing on the GPU-past, present and future. Med Image Anal 17(8), 1073–1094 (2013)CrossRef Eklund, A., Dufort, P., Forsberg, D., LaConte, S.M.: Medical image processing on the GPU-past, present and future. Med Image Anal 17(8), 1073–1094 (2013)CrossRef
16.
Zurück zum Zitat Shehab, M.A., Ghadawi, A.A., Alawneh, L., Al-Ayyoub, M., Jararweh, Y.: A hybrid CPU–GPU implementation to accelerate multiple pairwise protein sequence alignment. In: 2017 8th International Conference on Information and Communication Systems (ICICS), pp. 12–17 (2017) Shehab, M.A., Ghadawi, A.A., Alawneh, L., Al-Ayyoub, M., Jararweh, Y.: A hybrid CPU–GPU implementation to accelerate multiple pairwise protein sequence alignment. In: 2017 8th International Conference on Information and Communication Systems (ICICS), pp. 12–17 (2017)
17.
Zurück zum Zitat Altschul, S., Gish, W., Miller, W., Myers, E., Lipman, D.: Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990)CrossRef Altschul, S., Gish, W., Miller, W., Myers, E., Lipman, D.: Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990)CrossRef
18.
Zurück zum Zitat Lipman, D., Pearson, W.: Rapid and sensitive protein similarity searches. Science 227, 1435–1441 (1985)CrossRef Lipman, D., Pearson, W.: Rapid and sensitive protein similarity searches. Science 227, 1435–1441 (1985)CrossRef
19.
Zurück zum Zitat Brudno, M., Do, C.B., Cooper, G.M., Kim, M.F., Davydov, E., Green, E.D., Sidow, A., Batzoglou, S.: Lagan and multi-lagan: efficient tools for large-scale multiple alignment of genomic dna. Genome Res. 13(4), 721–31 (2003)CrossRef Brudno, M., Do, C.B., Cooper, G.M., Kim, M.F., Davydov, E., Green, E.D., Sidow, A., Batzoglou, S.: Lagan and multi-lagan: efficient tools for large-scale multiple alignment of genomic dna. Genome Res. 13(4), 721–31 (2003)CrossRef
20.
Zurück zum Zitat Wilton, R., Budavari, T., Langmead, B., Wheelan, S.J., Salzberg, S., Szalay, A.: Faster sequence alignment through GPU-accelerated restriction of the seed-and-extend search space. bioRxiv (2014) Wilton, R., Budavari, T., Langmead, B., Wheelan, S.J., Salzberg, S., Szalay, A.: Faster sequence alignment through GPU-accelerated restriction of the seed-and-extend search space. bioRxiv (2014)
21.
Zurück zum Zitat Hung, C.-L., Lin, Y.-S., Lin, C.-Y., Chung, Y.-C., Chung, Y.-F.: CUDA ClustalW: an efficient parallel algorithm for progressive multiple sequence alignment on multi-GPUs. Comput. Biol. Chem. 58, 62–68 (2015)CrossRef Hung, C.-L., Lin, Y.-S., Lin, C.-Y., Chung, Y.-C., Chung, Y.-F.: CUDA ClustalW: an efficient parallel algorithm for progressive multiple sequence alignment on multi-GPUs. Comput. Biol. Chem. 58, 62–68 (2015)CrossRef
22.
Zurück zum Zitat Frohmberg, W., Kierzynka, M., Blazewicz, J., Gawron, P., Wojciechowski, P.: G-dna-a highly efficient multi-GPU/mpi tool for aligning nucleotide reads. Bull. Pol. Acad. Sci. 61(4), 989–992 (2013) Frohmberg, W., Kierzynka, M., Blazewicz, J., Gawron, P., Wojciechowski, P.: G-dna-a highly efficient multi-GPU/mpi tool for aligning nucleotide reads. Bull. Pol. Acad. Sci. 61(4), 989–992 (2013)
23.
Zurück zum Zitat Orobitg, M., Cores, F., Guirado, F., Kemena, C., Notredame, C., Ripoll, A.: Enhancing the scalability of consistency-based progressive multiple sequences alignment applications. In: 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp. 71–82 (2012) Orobitg, M., Cores, F., Guirado, F., Kemena, C., Notredame, C., Ripoll, A.: Enhancing the scalability of consistency-based progressive multiple sequences alignment applications. In: 2012 IEEE 26th International Parallel and Distributed Processing Symposium, pp. 71–82 (2012)
24.
Zurück zum Zitat Lin, C.Y., Lin, Y.S.: Efficient parallel algorithm for multiple sequence alignments with regular expression constraints on graphics processing units. Int. J. Comput. Sci. Eng. 9(1–2), 11–20 (2014) Lin, C.Y., Lin, Y.S.: Efficient parallel algorithm for multiple sequence alignments with regular expression constraints on graphics processing units. Int. J. Comput. Sci. Eng. 9(1–2), 11–20 (2014)
25.
Zurück zum Zitat Vouzis, P.D., Sahinidis, N.V.: GPU-BLAST: using graphics processors to accelerate protein sequence alignment. Bioinformatics 27(2), 182–188 (2011)CrossRef Vouzis, P.D., Sahinidis, N.V.: GPU-BLAST: using graphics processors to accelerate protein sequence alignment. Bioinformatics 27(2), 182–188 (2011)CrossRef
27.
Zurück zum Zitat Ye, W., Chen, Y., Zhang, Y., Xu, Y.: H-BLAST: a fast protein sequence alignment toolkit on heterogeneous computers with GPUs. Bioinformatics 33(8), 1130–1138 (2017) Ye, W., Chen, Y., Zhang, Y., Xu, Y.: H-BLAST: a fast protein sequence alignment toolkit on heterogeneous computers with GPUs. Bioinformatics 33(8), 1130–1138 (2017)
28.
Zurück zum Zitat Zhu, X., Li, K., Salah, A., Shi, L., Li, K.: Parallel implementation of MAFFT on cuda-enabled graphics hardware. IEEE/ACM Trans. Comput. Biol. Bioinform. 12(1), 205–218 (2015)CrossRef Zhu, X., Li, K., Salah, A., Shi, L., Li, K.: Parallel implementation of MAFFT on cuda-enabled graphics hardware. IEEE/ACM Trans. Comput. Biol. Bioinform. 12(1), 205–218 (2015)CrossRef
29.
Zurück zum Zitat Katoh, K., Toh, H.: Recent developments in the mafft multiple sequence alignment program. Brief. Bioinform. 92, 86–98 (2008) Katoh, K., Toh, H.: Recent developments in the mafft multiple sequence alignment program. Brief. Bioinform. 92, 86–98 (2008)
30.
Zurück zum Zitat Liu, W., Schmidt, B., Voss, G., Müller-Wittig, W., GPU-clustalw: using graphics hardware to accelerate multiple sequence alignment. In: Proceedings of the 13th International Conference on High Performance Computing, Ser. HiPC’06. Springer, pp. 363–374 (2006) Liu, W., Schmidt, B., Voss, G., Müller-Wittig, W., GPU-clustalw: using graphics hardware to accelerate multiple sequence alignment. In: Proceedings of the 13th International Conference on High Performance Computing, Ser. HiPC’06. Springer, pp. 363–374 (2006)
31.
Zurück zum Zitat Sandes, E.F., Miranda, G., Martorell, X., Ayguade, E., Teodoro, G., Melo, A.C.: Cudalign 4.0: incremental speculative traceback for exact chromosome-wide alignment in GPU clusters. IEEE Trans. Parallel Distrib. Syst. 27(10), 2838–2850 (2016)CrossRef Sandes, E.F., Miranda, G., Martorell, X., Ayguade, E., Teodoro, G., Melo, A.C.: Cudalign 4.0: incremental speculative traceback for exact chromosome-wide alignment in GPU clusters. IEEE Trans. Parallel Distrib. Syst. 27(10), 2838–2850 (2016)CrossRef
32.
Zurück zum Zitat de Oliveira Sandes, E.F., de Melo, A.C.M.A.: Cudalign: using GPU to accelerate the comparison of megabase genomic sequences. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2010, Bangalore, January 9–14, 2010, pp. 137–146 (2010) de Oliveira Sandes, E.F., de Melo, A.C.M.A.: Cudalign: using GPU to accelerate the comparison of megabase genomic sequences. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2010, Bangalore, January 9–14, 2010, pp. 137–146 (2010)
33.
Zurück zum Zitat Zou, H., Huihui, S., Yu, C., Fu, H., Li, Y., Tang, W.: Asw: accelerating Smith–Waterman algorithm on coupled CPU–GPU architecture. Int. J. Parallel Program. 47, 388–402 (2018)CrossRef Zou, H., Huihui, S., Yu, C., Fu, H., Li, Y., Tang, W.: Asw: accelerating Smith–Waterman algorithm on coupled CPU–GPU architecture. Int. J. Parallel Program. 47, 388–402 (2018)CrossRef
34.
Zurück zum Zitat Liu, Y., Schmidt, B.: Gswabe: faster GPU-accelerated sequence alignment with optimal alignment retrieval for short dna sequences. Concurr. Comput. Pract. Exp. 27(4), 958–972 (2015)CrossRef Liu, Y., Schmidt, B.: Gswabe: faster GPU-accelerated sequence alignment with optimal alignment retrieval for short dna sequences. Concurr. Comput. Pract. Exp. 27(4), 958–972 (2015)CrossRef
35.
Zurück zum Zitat Chaudhary, A., Kagathara, D., Patel, V.: A GPU based implementation of Needleman–Wunsch algorithm using skewing transformation. In: Eighth International Conference on Contemporary Computing, IC3 2015, Noida, India, August 20–22, 2015, pp. 498–502 (2015) Chaudhary, A., Kagathara, D., Patel, V.: A GPU based implementation of Needleman–Wunsch algorithm using skewing transformation. In: Eighth International Conference on Contemporary Computing, IC3 2015, Noida, India, August 20–22, 2015, pp. 498–502 (2015)
36.
Zurück zum Zitat Gotoh, O.: An improved algorithm for matching biological sequences. J. Mol. Biol. 162, 705–708 (1982)CrossRef Gotoh, O.: An improved algorithm for matching biological sequences. J. Mol. Biol. 162, 705–708 (1982)CrossRef
37.
Zurück zum Zitat Needleman, S.B., Wunsch, C.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48(3), 443–453 (1970)CrossRef Needleman, S.B., Wunsch, C.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48(3), 443–453 (1970)CrossRef
38.
Zurück zum Zitat Nvidia: Nvidias next generation cuda compute architecture: Kepler gk110. Technical Reports (2012) Nvidia: Nvidias next generation cuda compute architecture: Kepler gk110. Technical Reports (2012)
39.
Zurück zum Zitat Jones, S.: Introduction to dynamic parallelism. In: GPU Technology Conference Presentation S, vol. 338 (2012) Jones, S.: Introduction to dynamic parallelism. In: GPU Technology Conference Presentation S, vol. 338 (2012)
40.
Zurück zum Zitat Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K., Walter, P.: Molecular biology of the cell (garland science, new york, 2002 (1997) Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K., Walter, P.: Molecular biology of the cell (garland science, new york, 2002 (1997)
41.
Zurück zum Zitat Sinden, R.R.: DNA Structure and Function. Elsevier, Amsterdam (2012) Sinden, R.R.: DNA Structure and Function. Elsevier, Amsterdam (2012)
42.
Zurück zum Zitat Intel: Intel® hyper-threading technology on the intel® xeontm processor family for servers. White Paper, vol. 6, no. 1 (2002) Intel: Intel® hyper-threading technology on the intel® xeontm processor family for servers. White Paper, vol. 6, no. 1 (2002)
43.
Zurück zum Zitat Tian, X., Bik, A., Girkar, M., Grey, P., Saito, H., Su, E.: Intel® openmp c++/fortran compiler for hyper-threading technology: implementation and performance. Intel Technol. J. 6, 1 (2002) Tian, X., Bik, A., Girkar, M., Grey, P., Saito, H., Su, E.: Intel® openmp c++/fortran compiler for hyper-threading technology: implementation and performance. Intel Technol. J. 6, 1 (2002)
44.
Zurück zum Zitat NVIDIA: Nvidias next generation cuda compute architecture. White Paper, vol. 6, no. 1 (2017) NVIDIA: Nvidias next generation cuda compute architecture. White Paper, vol. 6, no. 1 (2017)
48.
Zurück zum Zitat Cheng, J., Grossman, M., McKercher, T.: Professional Cuda C Programming. Wiley, Hoboken (2014) Cheng, J., Grossman, M., McKercher, T.: Professional Cuda C Programming. Wiley, Hoboken (2014)
Metadaten
Titel
A scalable multiple pairwise protein sequence alignment acceleration using hybrid CPU–GPU approach
verfasst von
Luay Alawneh
Mohammed A. Shehab
Mahmoud Al-Ayyoub
Yaser Jararweh
Ziad A. Al-Sharif
Publikationsdatum
06.01.2020
Verlag
Springer US
Erschienen in
Cluster Computing / Ausgabe 4/2020
Print ISSN: 1386-7857
Elektronische ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-019-03035-8

Weitere Artikel der Ausgabe 4/2020

Cluster Computing 4/2020 Zur Ausgabe

Premium Partner