Skip to main content
Top
Published in: The Journal of Supercomputing 6/2017

17-11-2016

A hybrid MPI/OpenMP parallel implementation of NSGA-II for finding patterns in protein sequences

Authors: David L. González-Álvarez, Miguel A. Vega-Rodríguez, Álvaro Rubio-Largo

Published in: The Journal of Supercomputing | Issue 6/2017

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Since the late 1970s, when the first DNA-based genome was sequenced, the field of biology is experiencing a significant growth in the amount of data that needs to be processed. Long ago it became impractical to analyze all this information manually, resulting in a great need for new techniques, algorithms and strategies to facilitate this work. Within the vast world of bioinformatics, we will focus on proteomics, more specifically, on the discovery of small repeated common patterns on sets of protein sequences that may represent some biological functionality. When we analyze a large number of sequences, the problem shows non-deterministic polynomial times, it implies that we could benefit from the combination of high-performance computing and computational intelligence techniques. In this paper, we address the discovery of repeated common patterns as a multiobjective optimization problem by means of a hybrid MPI/OpenMP approach which parallelizes a well-known multiobjective metaheuristic, the fast non-dominated sorting genetic algorithm (NSGA-II). Our main objective is to combine the benefits of shared-memory and distributed-memory programming paradigms to discover patterns in an accurate and efficient manner. Experiments conducted on six different datasets, comparisons with other well-known biological tools, and the obtained speed-ups and efficiencies show that our approach is able to achieve a significant performance in terms of parallel and biological results.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Adhianto L, Chapman B (2007) Performance modeling of communication and computation in hybrid MPI and OpenMP applications. Simul Model Pract Theory 15(4):481–491CrossRef Adhianto L, Chapman B (2007) Performance modeling of communication and computation in hybrid MPI and OpenMP applications. Simul Model Pract Theory 15(4):481–491CrossRef
2.
go back to reference Anderson NL, Anderson NG (1998) Proteome and proteomics: new technologies, new concepts, and new words. Electrophoresis 19(11):1853–1861CrossRef Anderson NL, Anderson NG (1998) Proteome and proteomics: new technologies, new concepts, and new words. Electrophoresis 19(11):1853–1861CrossRef
3.
go back to reference Bailey TL, Bodn M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW, Noble WS (2009) MEME SUITE: tools for motif discovery and searching. Nucl Acids Res 37(2):W202–W208CrossRef Bailey TL, Bodn M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW, Noble WS (2009) MEME SUITE: tools for motif discovery and searching. Nucl Acids Res 37(2):W202–W208CrossRef
4.
go back to reference Bork P, Koonin EV (1996) Protein sequence motifs. Curr Opin Struct Biol 6(3):366–376CrossRef Bork P, Koonin EV (1996) Protein sequence motifs. Curr Opin Struct Biol 6(3):366–376CrossRef
5.
go back to reference Chan TK, Leung KS, Lee KH (2008) TFBS identification based on genetic algorithm with combined representations and adaptive post-processing. Bioinformatics 24(3):341–349CrossRef Chan TK, Leung KS, Lee KH (2008) TFBS identification based on genetic algorithm with combined representations and adaptive post-processing. Bioinformatics 24(3):341–349CrossRef
6.
go back to reference Chan TK, Li G, Leung KS, Lee KH (2009) Discovering multiple realistic TFBS motifs based on a generalized model. BMC Bioinform 10:321 Chan TK, Li G, Leung KS, Lee KH (2009) Discovering multiple realistic TFBS motifs based on a generalized model. BMC Bioinform 10:321
7.
go back to reference Chapman B, Jost G, van der Pas R (2007) Using OpenMP: portable shared memory parallel programming. The MIT Press, Cambridge ISBN: 978-0262533027 Chapman B, Jost G, van der Pas R (2007) Using OpenMP: portable shared memory parallel programming. The MIT Press, Cambridge ISBN: 978-0262533027
8.
go back to reference Che D, Song Y, Rashedd K (2005) MDGA: Motif discovery using a genetic algorithm. In: Proceedings of the 2005 Conference on Genetic and Evolutionary Computation (GECCO’05), pp 447–452 Che D, Song Y, Rashedd K (2005) MDGA: Motif discovery using a genetic algorithm. In: Proceedings of the 2005 Conference on Genetic and Evolutionary Computation (GECCO’05), pp 447–452
9.
go back to reference Chen C, Schmidt B, Weiguo L, Mller-Wittig W (2008) GPU-MEME: using graphics hardware to accelerate motif finding in DNA sequences. Pattern Recognit Bioinform LNCS 5265:448–459CrossRef Chen C, Schmidt B, Weiguo L, Mller-Wittig W (2008) GPU-MEME: using graphics hardware to accelerate motif finding in DNA sequences. Pattern Recognit Bioinform LNCS 5265:448–459CrossRef
10.
go back to reference Coello Coello, CA, Lamont GB, Veldhuizen DA (2007) Evolutionary algorithms for solving multi-objective problems., 2nd edn. Springer-Verlag, New York ISBN: 978-0-387-33254-3MATH Coello Coello, CA, Lamont GB, Veldhuizen DA (2007) Evolutionary algorithms for solving multi-objective problems., 2nd edn. Springer-Verlag, New York ISBN: 978-0-387-33254-3MATH
11.
go back to reference Crooks GE, Hon G, Chandonia JM, Brenner SE (2004) WebLogo: a sequence logo generator. Genome Res 14:1188–1190CrossRef Crooks GE, Hon G, Chandonia JM, Brenner SE (2004) WebLogo: a sequence logo generator. Genome Res 14:1188–1190CrossRef
12.
go back to reference Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput 6(2):182–197CrossRef Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput 6(2):182–197CrossRef
13.
go back to reference Dempster AP, Laird NM, Rubin DB (1977) Maximum Likelihood from incomplete data via the EM algorithm (with Discussion). J R Stat Soc Ser B 39(1):1–38MATH Dempster AP, Laird NM, Rubin DB (1977) Maximum Likelihood from incomplete data via the EM algorithm (with Discussion). J R Stat Soc Ser B 39(1):1–38MATH
14.
go back to reference Eskin E, Pevzner PA (2002) Finding composite regulatory patterns in DNA sequences. Bioinformatics 18(Suppl 1):S354–S363CrossRef Eskin E, Pevzner PA (2002) Finding composite regulatory patterns in DNA sequences. Bioinformatics 18(Suppl 1):S354–S363CrossRef
15.
go back to reference Favorov AV, Gelfand MS, Gerasimova AV, Ravcheev DA, Mironov AA, Makeev VJ (2005) A Gibbs sampler for identification of symmetrically structured, spaced DNA motifs with improved estimation of the signal length. Bioinformatics 21(10):2240–2245CrossRef Favorov AV, Gelfand MS, Gerasimova AV, Ravcheev DA, Mironov AA, Makeev VJ (2005) A Gibbs sampler for identification of symmetrically structured, spaced DNA motifs with improved estimation of the signal length. Bioinformatics 21(10):2240–2245CrossRef
16.
go back to reference Fogel GB, Porto VW, Varga G, Dow ER, Crave AM, Powers DM, Harlow HB, Su EW, Onyia JE, Su C (2008) Evolutionary computation for discovery of composite transcription factor binding sites. Nucl Acids Res 36(21):e142, 1–14 Fogel GB, Porto VW, Varga G, Dow ER, Crave AM, Powers DM, Harlow HB, Su EW, Onyia JE, Su C (2008) Evolutionary computation for discovery of composite transcription factor binding sites. Nucl Acids Res 36(21):e142, 1–14
17.
go back to reference Fogel GB, Weekes DG, Varga G, Dow ER, Harlow HB, Onyia JE, Su C (2004) Discovery of sequence motifs related to coexpression of genes using evolutionary computation. Nucl Acids Res 32(13):3826–3835CrossRef Fogel GB, Weekes DG, Varga G, Dow ER, Harlow HB, Onyia JE, Su C (2004) Discovery of sequence motifs related to coexpression of genes using evolutionary computation. Nucl Acids Res 32(13):3826–3835CrossRef
18.
go back to reference Frith MC, Hansen U, Spouge JL, Weng Z (2004) Finding functional sequence elements by multiple local alignment. Nucl Acids Res 32(1):189–200CrossRef Frith MC, Hansen U, Spouge JL, Weng Z (2004) Finding functional sequence elements by multiple local alignment. Nucl Acids Res 32(1):189–200CrossRef
19.
go back to reference Grama A, Gupta A, Karypis G, Kumar V (2003) Introduction to parallel computing, 2nd edn. Pearson Education Limited, EdinburghMATH Grama A, Gupta A, Karypis G, Kumar V (2003) Introduction to parallel computing, 2nd edn. Pearson Education Limited, EdinburghMATH
20.
go back to reference Gropp W, Lusk W, Skjellum A (1999) Using MPI: portable parallel programming with the message passing interface, 2nd edn. The MIT Press, Cambridge ISBN: 0-262-57132-3MATH Gropp W, Lusk W, Skjellum A (1999) Using MPI: portable parallel programming with the message passing interface, 2nd edn. The MIT Press, Cambridge ISBN: 0-262-57132-3MATH
21.
go back to reference Grundy WN, Bailey TL, Elkan CP (1996) ParaMEME: a parallel implementation and a web interface for a dna and protein motif discovery tool. Comput Appl Biosci 12(4):303–310 Grundy WN, Bailey TL, Elkan CP (1996) ParaMEME: a parallel implementation and a web interface for a dna and protein motif discovery tool. Comput Appl Biosci 12(4):303–310
22.
go back to reference Hertz GZ, Stormo GD (1999) Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15(7–8):563–577CrossRef Hertz GZ, Stormo GD (1999) Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15(7–8):563–577CrossRef
23.
go back to reference Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor
24.
go back to reference Hughes JD, Estep PW, Tavazoie S, Church GM (2000) Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol 296(5):1205–1214CrossRef Hughes JD, Estep PW, Tavazoie S, Church GM (2000) Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol 296(5):1205–1214CrossRef
25.
go back to reference James P (1997) Protein identification in the post-genome era: the rapid rise of proteomics. Q Rev Biophys 30(4):279–331CrossRef James P (1997) Protein identification in the post-genome era: the rapid rise of proteomics. Q Rev Biophys 30(4):279–331CrossRef
27.
go back to reference Liu FFM, Tsai JJP, Chen RM, Chen SN, Shih SH (2004) FMGA: finding motifs by genetic algorithm. In: Fourth IEEE Symposium on Bioinformatics and Bioengineering (BIBE’04), pp 459–466 Liu FFM, Tsai JJP, Chen RM, Chen SN, Shih SH (2004) FMGA: finding motifs by genetic algorithm. In: Fourth IEEE Symposium on Bioinformatics and Bioengineering (BIBE’04), pp 459–466
28.
go back to reference Liu Y, Schmidt B, Liu W, Maskell DL (2010) CUDA-MEME: accelerating motif discovery in biological sequences using CUDA-enabled graphics processing units. Pattern Recognit Lett 31(14):2170–2177CrossRef Liu Y, Schmidt B, Liu W, Maskell DL (2010) CUDA-MEME: accelerating motif discovery in biological sequences using CUDA-enabled graphics processing units. Pattern Recognit Lett 31(14):2170–2177CrossRef
29.
go back to reference Liu Y, Schmidt B, Maskell DL (2011) An ultrafast scalable many-core motif discovery algorithm for multiple GPUs. In: IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, pp 428–434 Liu Y, Schmidt B, Maskell DL (2011) An ultrafast scalable many-core motif discovery algorithm for multiple GPUs. In: IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, pp 428–434
30.
go back to reference Lones MA, Tyrrell AM (2007) Regulatory motif discovery using a population clustering evolutionary algorithm. IEEE/ACM Trans Comput Biol Bioinform 4(3):403–414CrossRef Lones MA, Tyrrell AM (2007) Regulatory motif discovery using a population clustering evolutionary algorithm. IEEE/ACM Trans Comput Biol Bioinform 4(3):403–414CrossRef
31.
go back to reference Notredame C, Higgins DG (1996) SAGA: sequence alignment by genetic algorithm. Nucl Acids Res 24(8):1515–1524CrossRef Notredame C, Higgins DG (1996) SAGA: sequence alignment by genetic algorithm. Nucl Acids Res 24(8):1515–1524CrossRef
32.
go back to reference Pavesi G, Mereghetti P, Zambelli F, Stefani M, Mauri G, Pesole G (2006) MoD Tools: regulatory motif discovery in nucleotide sequences from co-regulated or homologous genes. Nucl Acids Res 34:W566–W570CrossRef Pavesi G, Mereghetti P, Zambelli F, Stefani M, Mauri G, Pesole G (2006) MoD Tools: regulatory motif discovery in nucleotide sequences from co-regulated or homologous genes. Nucl Acids Res 34:W566–W570CrossRef
33.
go back to reference Qin J, Pinkenburg S, Rosenstiel W (2005) Parallel motif search using ParSEQ. In: IASTED International Conference on Parallel and Distributed Computing and Networks, pp 601–607 Qin J, Pinkenburg S, Rosenstiel W (2005) Parallel motif search using ParSEQ. In: IASTED International Conference on Parallel and Distributed Computing and Networks, pp 601–607
34.
go back to reference Regnier M, Denise A (2004) Rare events and conditional events on random strings. Discret Math Theoret Comput Sci 6(2):191–214MathSciNetMATH Regnier M, Denise A (2004) Rare events and conditional events on random strings. Discret Math Theoret Comput Sci 6(2):191–214MathSciNetMATH
35.
go back to reference Sandve GK, Nedland M, Syrstad OB, Eidsheim LA, Abul O, Drablos F (2006) Accelerating motif discovery: Motif matching on parallel hardware. Algorithms Bioinform LNCS 4175:197–206CrossRef Sandve GK, Nedland M, Syrstad OB, Eidsheim LA, Abul O, Drablos F (2006) Accelerating motif discovery: Motif matching on parallel hardware. Algorithms Bioinform LNCS 4175:197–206CrossRef
36.
go back to reference Schröder J, Wienbrandt L, Pfeiffer G, Schimmler M (2008) Massively parallelized DNA motif search on the reconfigurable hardware platform COPACOBANA. In: Proceedings of the Third IAPR International Conference on Pattern Recognition in Bioinformatics, pp 436–447 Schröder J, Wienbrandt L, Pfeiffer G, Schimmler M (2008) Massively parallelized DNA motif search on the reconfigurable hardware platform COPACOBANA. In: Proceedings of the Third IAPR International Conference on Pattern Recognition in Bioinformatics, pp 436–447
37.
go back to reference Shao L, Chen Y (2009) Bacterial foraging optimization algorithm integrating tabu search for motif discovery. In: IEEE International Conference on Bioinformatics and Biomedicine (BIBM’09), pp 415–418 Shao L, Chen Y (2009) Bacterial foraging optimization algorithm integrating tabu search for motif discovery. In: IEEE International Conference on Bioinformatics and Biomedicine (BIBM’09), pp 415–418
38.
go back to reference Shao L, Chen Y, Abraham A (2009) Motif discovery using evolutionary algorithms. In: International Conference of Soft Computing and Pattern Recognition (SOCPAR’09), pp 420–425 Shao L, Chen Y, Abraham A (2009) Motif discovery using evolutionary algorithms. In: International Conference of Soft Computing and Pattern Recognition (SOCPAR’09), pp 420–425
39.
go back to reference Sigrist CJA, de Castro E, Cerutti L, Cuche BA, Hulo N, Bridge A, Bougueleret L., Xenarios I (2012) New and continuing developments at PROSITE. Nucl Acids Res 41(Database issue): D344–D347 Sigrist CJA, de Castro E, Cerutti L, Cuche BA, Hulo N, Bridge A, Bougueleret L., Xenarios I (2012) New and continuing developments at PROSITE. Nucl Acids Res 41(Database issue): D344–D347
40.
go back to reference Sinha S, Tompa M (2003) YMF: a program for discovery of novel transcription factor binding sites by statistical overrepresentation. Nucl Acids Res 31(13):3586–3588CrossRef Sinha S, Tompa M (2003) YMF: a program for discovery of novel transcription factor binding sites by statistical overrepresentation. Nucl Acids Res 31(13):3586–3588CrossRef
41.
go back to reference Stine M, Dasgupta D, Mukatira S (2003) Motif discovery in upstream sequences of coordinately expressed genes. 2003 Congress Evol Comput (CEC’03) 3:1596–1603 Stine M, Dasgupta D, Mukatira S (2003) Motif discovery in upstream sequences of coordinately expressed genes. 2003 Congress Evol Comput (CEC’03) 3:1596–1603
42.
go back to reference Sutou T, Tamura K, Mori Y, Kitakami H (2003) Design and implementation of parallel modified prefixspan method. Int Sympos High Perform Comput 2858:412–422CrossRef Sutou T, Tamura K, Mori Y, Kitakami H (2003) Design and implementation of parallel modified prefixspan method. Int Sympos High Perform Comput 2858:412–422CrossRef
43.
go back to reference Thijs G, Lescot M, Marchal K, Rombauts S, De Moor B, Rouzé P, Moreau Y (2001) A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling. Bioinformatics 17(12):1113–1122CrossRef Thijs G, Lescot M, Marchal K, Rombauts S, De Moor B, Rouzé P, Moreau Y (2001) A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling. Bioinformatics 17(12):1113–1122CrossRef
44.
go back to reference Thompson WA, Newberg LA, Conlan S, McCue LA, Lawrence CE (2007) The gibbs centroid sampler. Nucl Acids Res 35(Web Server issue):W232–W237 Thompson WA, Newberg LA, Conlan S, McCue LA, Lawrence CE (2007) The gibbs centroid sampler. Nucl Acids Res 35(Web Server issue):W232–W237
45.
go back to reference van Helden J, Andre B, Collado-Vides J (1998) Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J Mole Biol 281(5):827–842CrossRef van Helden J, Andre B, Collado-Vides J (1998) Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J Mole Biol 281(5):827–842CrossRef
46.
go back to reference Wei Z, Jensen S (2006) GAME: detecting cis-regulatory elements using a genetic algorithm. Bioinformatics 22(13):1577–1584CrossRef Wei Z, Jensen S (2006) GAME: detecting cis-regulatory elements using a genetic algorithm. Bioinformatics 22(13):1577–1584CrossRef
47.
go back to reference Workman CT, Stormo GD (2000) ANN-Spec: a method for discovering transcription factor binding sites with improved specificity. In: Pacifc symposium on biocomputing, pp 467–478 Workman CT, Stormo GD (2000) ANN-Spec: a method for discovering transcription factor binding sites with improved specificity. In: Pacifc symposium on biocomputing, pp 467–478
48.
go back to reference Yang JY, Yang MQ, Zhu M, Arabnia HR, Deng Y (2008) Promoting synergistic research and education in genomics and bioinformatics. BMC Genom 9(Suppl 1):I1CrossRef Yang JY, Yang MQ, Zhu M, Arabnia HR, Deng Y (2008) Promoting synergistic research and education in genomics and bioinformatics. BMC Genom 9(Suppl 1):I1CrossRef
49.
go back to reference Yang JY, Yang MQ, Arabnia HR, Deng Y (2008) Review: genomics, molecular imaging, bioinformatics, and bio-nano-info integration are synergistic components of translational medicine and personalized healthcare research. BMC Genom 9(Suppl 2):I1CrossRef Yang JY, Yang MQ, Arabnia HR, Deng Y (2008) Review: genomics, molecular imaging, bioinformatics, and bio-nano-info integration are synergistic components of translational medicine and personalized healthcare research. BMC Genom 9(Suppl 2):I1CrossRef
50.
go back to reference Yang MQ, Athey BD, Arabnia HR, Sung AH, Liu Q, Yang JY, Mao J, Deng Y (2009) High-throughput next-generation sequencing technologies foster new cutting-edge computing techniques in bioinformatics. BMC Genom 10(1) Yang MQ, Athey BD, Arabnia HR, Sung AH, Liu Q, Yang JY, Mao J, Deng Y (2009) High-throughput next-generation sequencing technologies foster new cutting-edge computing techniques in bioinformatics. BMC Genom 10(1)
51.
go back to reference Yu L, Xu Y (2009) A parallel gibbs sampling algorithm for motif finding on gpu. In: IEEE International Symposium on Parallel and Distributed Processing with Applications, pp 555–558 Yu L, Xu Y (2009) A parallel gibbs sampling algorithm for motif finding on gpu. In: IEEE International Symposium on Parallel and Distributed Processing with Applications, pp 555–558
52.
go back to reference Zitzler E, Thiele L (1999) Multiobjective evolutionary algorithms: a comparative case study and the strength pareto approach. IEEE Trans Evol Comput 3(4):257–271CrossRef Zitzler E, Thiele L (1999) Multiobjective evolutionary algorithms: a comparative case study and the strength pareto approach. IEEE Trans Evol Comput 3(4):257–271CrossRef
Metadata
Title
A hybrid MPI/OpenMP parallel implementation of NSGA-II for finding patterns in protein sequences
Authors
David L. González-Álvarez
Miguel A. Vega-Rodríguez
Álvaro Rubio-Largo
Publication date
17-11-2016
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 6/2017
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-016-1916-3

Other articles of this Issue 6/2017

The Journal of Supercomputing 6/2017 Go to the issue

Premium Partner