Skip to main content
Erschienen in: The Journal of Supercomputing 2/2014

01.08.2014

Parallelizing exact motif finding algorithms on multi-core

verfasst von: Mostafa M. Abbas, Hazem M. Bahig, Mohamed Abouelhoda, M. M. Mohie-Eldin

Erschienen in: The Journal of Supercomputing | Ausgabe 2/2014

Einloggen

Aktivieren Sie unsere intelligente Suche um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The motif finding problem is one of the important and challenging problems in bioinformatics. A variety of sequential algorithms have been proposed to find exact motifs, but the running time is still not suitable due to high computational complexity of finding motifs. In this paper we parallelize three efficient sequential algorithms which are HEPPMSprune, PMS5 and PMS6. We implement the algorithms on a Dual Quad-Core machine using openMP to measure the performance of each algorithm. Our experiment on simulated data show that: (1) the parallel PMS6 is faster than the other algorithms in case of challenging instances, while the parallel HEPPMSprune is faster than the other algorithms in most of solvable instances; (2) the scalability of parallel HEPPMSprune is linear for all instances, while the scalability of parallel PMS5 and PMS6 is linear in case of challenging instances only; (3) the memory used by HEPPMSprune is less than that of the other algorithms.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Pevzner P, Sze S (2000) Combinatorial approaches to finding subtle signals in DNA sequences. In: Proceedings of eighth international conference on intelligent systems for molecular biology, pp 269–278 Pevzner P, Sze S (2000) Combinatorial approaches to finding subtle signals in DNA sequences. In: Proceedings of eighth international conference on intelligent systems for molecular biology, pp 269–278
2.
Zurück zum Zitat Buhler J, Tompa M (2002) Finding motifs using random projections. J Comput Biol 9(2):225–242CrossRef Buhler J, Tompa M (2002) Finding motifs using random projections. J Comput Biol 9(2):225–242CrossRef
3.
Zurück zum Zitat Leung H, Chin F (2005) Generalized planted \((l, d)\)-motif problem with negative set. In: Proceedings of workshop on algorithms in bioinformatics (LNCS), vol 3692, pp 264–275 Leung H, Chin F (2005) Generalized planted \((l, d)\)-motif problem with negative set. In: Proceedings of workshop on algorithms in bioinformatics (LNCS), vol 3692, pp 264–275
4.
Zurück zum Zitat Lawrence C, Reilly A (1990) An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins Struct Funct Genet 7(1):41–51CrossRef Lawrence C, Reilly A (1990) An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins Struct Funct Genet 7(1):41–51CrossRef
5.
Zurück zum Zitat Lawrence C, Altschul S, Boguski M, Liu J, Neuwald A, Wootton J (1993) Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262:208–214CrossRef Lawrence C, Altschul S, Boguski M, Liu J, Neuwald A, Wootton J (1993) Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262:208–214CrossRef
6.
Zurück zum Zitat Bailey T, Elkan C (1995) Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Mach Learn 21:51–80 Bailey T, Elkan C (1995) Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Mach Learn 21:51–80
7.
Zurück zum Zitat Fraenkel Y, Mandel Y, Friedberg D, Margalit H (1995) Identification of common motifs in unaligned DNA sequences: application to \(Escherichia\ coli\) Lrpregulon. Bioinformatics 11:379–387 Fraenkel Y, Mandel Y, Friedberg D, Margalit H (1995) Identification of common motifs in unaligned DNA sequences: application to \(Escherichia\ coli\) Lrpregulon. Bioinformatics 11:379–387
8.
Zurück zum Zitat Rigoutsos I, Floratos A (1998) Combinatorial pattern discovery in biological sequences: the TEIRESIAS algorithm. Bioinformatics 14:55–67CrossRef Rigoutsos I, Floratos A (1998) Combinatorial pattern discovery in biological sequences: the TEIRESIAS algorithm. Bioinformatics 14:55–67CrossRef
9.
Zurück zum Zitat Hertz G, Stormo G (1999) Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15:563–577CrossRef Hertz G, Stormo G (1999) Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15:563–577CrossRef
10.
Zurück zum Zitat Gelfand M, Koonin E, Mironov A (2000) Prediction of transcription regulatory sites in archaea by a comparative genomic approach. Nucl Acid Res 28:695–705CrossRef Gelfand M, Koonin E, Mironov A (2000) Prediction of transcription regulatory sites in archaea by a comparative genomic approach. Nucl Acid Res 28:695–705CrossRef
11.
Zurück zum Zitat Price A, Ramabhadran S, Pevzner P (2003) Finding subtle motifs by branching from sample strings. Bioinformatics 19(2):ii149–ii155 Price A, Ramabhadran S, Pevzner P (2003) Finding subtle motifs by branching from sample strings. Bioinformatics 19(2):ii149–ii155
12.
Zurück zum Zitat Huang C, Lee W, Hsieh S (2011) An improved heuristic algorithm for finding motif signals in DNA sequences. IEEE/ACM Trans Comput Biol Bioinf 8(4):959–975CrossRef Huang C, Lee W, Hsieh S (2011) An improved heuristic algorithm for finding motif signals in DNA sequences. IEEE/ACM Trans Comput Biol Bioinf 8(4):959–975CrossRef
13.
Zurück zum Zitat Galas D, Eggert M, Waterman M (1985) Rigorous pattern-recognition methods for DNA sequences: analysis of promoter sequences from \(Escherichia coli\). J Mol Biol 186(1):117–128CrossRef Galas D, Eggert M, Waterman M (1985) Rigorous pattern-recognition methods for DNA sequences: analysis of promoter sequences from \(Escherichia coli\). J Mol Biol 186(1):117–128CrossRef
14.
Zurück zum Zitat Staden R (1989) Methods for discovering novel motifs in nucleic acid sequences. Comput Appl Biosci 5(4):293–298 Staden R (1989) Methods for discovering novel motifs in nucleic acid sequences. Comput Appl Biosci 5(4):293–298
15.
Zurück zum Zitat Brazma A, Jonassen I, Vilo J, Ukkonen E (1998) Predicting gene regulatory elements in silico on a genomic scale. Genome Res 15:1202–1215 Brazma A, Jonassen I, Vilo J, Ukkonen E (1998) Predicting gene regulatory elements in silico on a genomic scale. Genome Res 15:1202–1215
16.
Zurück zum Zitat Sagot M (1998) Spelling approximate repeated or common motifs using a suffix tree. In: Lucchesi CL, Moura AV (eds) Latin’98: theoretical informatics, LNCS, vol 1380, pp 111–127 Sagot M (1998) Spelling approximate repeated or common motifs using a suffix tree. In: Lucchesi CL, Moura AV (eds) Latin’98: theoretical informatics, LNCS, vol 1380, pp 111–127
17.
Zurück zum Zitat Van-Helden J, Andre B, Collado-Vides J (1998) Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J Mol Biol 281(5):827–842CrossRef Van-Helden J, Andre B, Collado-Vides J (1998) Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J Mol Biol 281(5):827–842CrossRef
18.
Zurück zum Zitat Tompa M (1999) An exact method for finding short motifs in sequences with application to the ribosome binding site problem. In: Proceedings of seventh international conference on intelligent systems for molecular biology, pp 262–271 Tompa M (1999) An exact method for finding short motifs in sequences with application to the ribosome binding site problem. In: Proceedings of seventh international conference on intelligent systems for molecular biology, pp 262–271
19.
Zurück zum Zitat Marsan L, Sagot M (2000) Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification. J Comput Biol 7(3–4):345–362CrossRef Marsan L, Sagot M (2000) Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification. J Comput Biol 7(3–4):345–362CrossRef
20.
Zurück zum Zitat Sinha S, Tompa M (2000) A statistical method for finding transcription factor binding sites. In: Proceedings of eighth international conference on intelligent systems for molecular biology, pp 344–354 Sinha S, Tompa M (2000) A statistical method for finding transcription factor binding sites. In: Proceedings of eighth international conference on intelligent systems for molecular biology, pp 344–354
21.
Zurück zum Zitat Blanchette M, Schwikowski B, Tompa M (2002) Algorithms for phylogenetic footprinting. J Comput Biol 9(2):211–223CrossRef Blanchette M, Schwikowski B, Tompa M (2002) Algorithms for phylogenetic footprinting. J Comput Biol 9(2):211–223CrossRef
22.
Zurück zum Zitat Eskin E, Pevzner P (2002) Finding composite regulatory patterns in DNA sequences. Bioinformatics 18(1):354–363CrossRef Eskin E, Pevzner P (2002) Finding composite regulatory patterns in DNA sequences. Bioinformatics 18(1):354–363CrossRef
23.
Zurück zum Zitat Evans P, Smith A (2003) Toward optimal motif enumeration. In: Proceedings of eighth international workshop algorithms and data structures ( WADS03), pp 47–58 Evans P, Smith A (2003) Toward optimal motif enumeration. In: Proceedings of eighth international workshop algorithms and data structures ( WADS03), pp 47–58
24.
Zurück zum Zitat Carvalho A, Freitas A, Oliveira A, Sagot M (2005) A highly scalable algorithm for the extraction of CIS-Regulatory regions. In: Proceedings of third Asia Pacific bioinformatics conference, pp 273–282 Carvalho A, Freitas A, Oliveira A, Sagot M (2005) A highly scalable algorithm for the extraction of CIS-Regulatory regions. In: Proceedings of third Asia Pacific bioinformatics conference, pp 273–282
25.
Zurück zum Zitat Chin F, Leung H (2005) Voting algorithms for discovering long motifs. In: Proceedings of third Asia Pacific bioinformatics conference, pp 261–271 Chin F, Leung H (2005) Voting algorithms for discovering long motifs. In: Proceedings of third Asia Pacific bioinformatics conference, pp 261–271
26.
Zurück zum Zitat Rajasekaran S, Balla S, Huang C (2005) Exact algorithms for planted motif problems. J Comput Biol 12(8):1117–1128CrossRef Rajasekaran S, Balla S, Huang C (2005) Exact algorithms for planted motif problems. J Comput Biol 12(8):1117–1128CrossRef
27.
Zurück zum Zitat Davila J, Balla S, Rajasekaran S (2006) Space and time efficient algorithms for planted motif search. In: Proceedings of second international workshop on bioinformatics research and applications ( LNCS 3992), pp 822–829 Davila J, Balla S, Rajasekaran S (2006) Space and time efficient algorithms for planted motif search. In: Proceedings of second international workshop on bioinformatics research and applications ( LNCS 3992), pp 822–829
28.
Zurück zum Zitat Pisanti N, Carvalho A, Marsan L, Sagot M (2006) RISOTTO: fast extraction of motifs with mismatches. In: Proceedings of seventh Latin American theoretical informatics symposium, pp 757–768 Pisanti N, Carvalho A, Marsan L, Sagot M (2006) RISOTTO: fast extraction of motifs with mismatches. In: Proceedings of seventh Latin American theoretical informatics symposium, pp 757–768
29.
Zurück zum Zitat Davila J, Balla S, Rajasekaran S (2007) Fast and practical algorithms for planted \((l, d)\) motif search. IEEE/ACM Trans Comput Biol Bioinf 4(4):544–552CrossRef Davila J, Balla S, Rajasekaran S (2007) Fast and practical algorithms for planted \((l, d)\) motif search. IEEE/ACM Trans Comput Biol Bioinf 4(4):544–552CrossRef
30.
Zurück zum Zitat Dinh H, Rajasekaran S, Kundeti V (2011) PMS5: an efficient exact algorithm for the \((l, d)\)-motif finding problem. BMC Bioinf 12:410–420CrossRef Dinh H, Rajasekaran S, Kundeti V (2011) PMS5: an efficient exact algorithm for the \((l, d)\)-motif finding problem. BMC Bioinf 12:410–420CrossRef
31.
Zurück zum Zitat Abbas M, Abouelhoda M, Bahig H (2012) A hybrid method for the exact planted (l, d) motif finding problem and its parallelization. BMC Bioinformatics, vol 13, supplement 17, Article S10 Abbas M, Abouelhoda M, Bahig H (2012) A hybrid method for the exact planted (l, d) motif finding problem and its parallelization. BMC Bioinformatics, vol 13, supplement 17, Article S10
32.
Zurück zum Zitat Bandyopadhyay S, Sahni S, Rajasekaran S (2012) PMS6: a faster algorithm for motif discovery. In: Proceedings of the second IEEE international conference on computational advances in bio and medical sciences (ICCABS 2012), pp 1–6 Bandyopadhyay S, Sahni S, Rajasekaran S (2012) PMS6: a faster algorithm for motif discovery. In: Proceedings of the second IEEE international conference on computational advances in bio and medical sciences (ICCABS 2012), pp 1–6
33.
Zurück zum Zitat Grundy W, Bailey T, Elkan C (1996) ParaMEME: a parallel implementation and a web interface for a DNA and protein motif discovery tool. Comput Appl Biosci 12(4):303–310 Grundy W, Bailey T, Elkan C (1996) ParaMEME: a parallel implementation and a web interface for a DNA and protein motif discovery tool. Comput Appl Biosci 12(4):303–310
34.
Zurück zum Zitat Carvalho A, Freitas A, Oliveira A, Sagot M (2004) A parallel algorithm for the extraction of structured motifs. In: Proceedings of the 19th ACM symposium on applied computing (SAC’04), pp 147–153 Carvalho A, Freitas A, Oliveira A, Sagot M (2004) A parallel algorithm for the extraction of structured motifs. In: Proceedings of the 19th ACM symposium on applied computing (SAC’04), pp 147–153
35.
Zurück zum Zitat Hamdani H, Rashid N, Abdulrazzaq A, Ghadban R, Wajidi M (2009) Fast phylocon algorithm using OpenMP. In: Proceedings of the IEEE international conference on computer technology and development, pp 550–553 Hamdani H, Rashid N, Abdulrazzaq A, Ghadban R, Wajidi M (2009) Fast phylocon algorithm using OpenMP. In: Proceedings of the IEEE international conference on computer technology and development, pp 550–553
36.
Zurück zum Zitat Yu L, Xu Y (2009) A parallel Gibbs sampling algorithm for motif finding on GPU. In: Proceedings of the IEEE international symposium on parallel and distributed processing with applications, pp 555–558 Yu L, Xu Y (2009) A parallel Gibbs sampling algorithm for motif finding on GPU. In: Proceedings of the IEEE international symposium on parallel and distributed processing with applications, pp 555–558
37.
Zurück zum Zitat Faheem H (2010) Accelerating motif finding problem using grid computing with enhanced brute force. In: Proceedings of the 12th international conference on advanced communication technology (ICACT), pp 197–202 Faheem H (2010) Accelerating motif finding problem using grid computing with enhanced brute force. In: Proceedings of the 12th international conference on advanced communication technology (ICACT), pp 197–202
38.
Zurück zum Zitat Dasari N, Desh R, Zubair M (2010a) An efficient multicore implementation of planted motif problem. In: Proceedings of the international conference on high performance computing and simulation, pp 9–15 Dasari N, Desh R, Zubair M (2010a) An efficient multicore implementation of planted motif problem. In: Proceedings of the international conference on high performance computing and simulation, pp 9–15
39.
Zurück zum Zitat Dasari N, Desh R, Zubair M (2010b) Solving planted motif problem on GPU. In: International workshop on GPUs and scientific applications Dasari N, Desh R, Zubair M (2010b) Solving planted motif problem on GPU. In: International workshop on GPUs and scientific applications
40.
Zurück zum Zitat Dasari N, Desh R, Zubair M (2011) High performance implementation of planted motif problem using suffix trees. In: Proceedings of the international conference on high performance computing and simulation, pp 200–206 Dasari N, Desh R, Zubair M (2011) High performance implementation of planted motif problem using suffix trees. In: Proceedings of the international conference on high performance computing and simulation, pp 200–206
41.
Zurück zum Zitat Sahoo B, Sourav R, Ranjan R, Padhy S (2011) Parallel implementation of exact algorithm for planted motif search problem using SMP cluster. Eur J Sci Res 64(4):484–496 Sahoo B, Sourav R, Ranjan R, Padhy S (2011) Parallel implementation of exact algorithm for planted motif search problem using SMP cluster. Eur J Sci Res 64(4):484–496
42.
Zurück zum Zitat Liu Y, Schmidt B, Maskell D (2011) An ultrafast scalable many-core motif discovery algorithm for multiple GPUs. In: Proceedings of the IEEE international parallel and distributed processing symposium, pp 423–429 Liu Y, Schmidt B, Maskell D (2011) An ultrafast scalable many-core motif discovery algorithm for multiple GPUs. In: Proceedings of the IEEE international parallel and distributed processing symposium, pp 423–429
Metadaten
Titel
Parallelizing exact motif finding algorithms on multi-core
verfasst von
Mostafa M. Abbas
Hazem M. Bahig
Mohamed Abouelhoda
M. M. Mohie-Eldin
Publikationsdatum
01.08.2014
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 2/2014
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-014-1180-3

Weitere Artikel der Ausgabe 2/2014

The Journal of Supercomputing 2/2014 Zur Ausgabe