Skip to main content
Top

2009 | OriginalPaper | Chapter

Discovering Almost Any Hidden Motif from Multiple Sequences in Polynomial Time with Low Sample Complexity and High Success Probability

Authors : Bin Fu, Ming-Yang Kao, Lusheng Wang

Published in: Theory and Applications of Models of Computation

Publisher: Springer Berlin Heidelberg

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

We study a natural probabilistic model for motif discovery that has been used to experimentally test the effectiveness of motif discovery programs. In this model, there are

k

background sequences, and each character in a background sequence is a random character from an alphabet

Σ

. A motif

G

 = 

g

1

g

2

...

g

m

is a string of

m

characters. Each background sequence is implanted a probabilistically generated approximate copy of

G

. For a probabilistically generated approximate copy

b

1

b

2

...

b

m

of

G

, every character is probabilistically generated such that the probability for

b

i

 ≠ 

g

i

is at most

α

. It has been conjectured that multiple background sequences can help with finding faint motifs

G

.

In this paper, we develop an efficient algorithm that can discover a hidden motif from a set of sequences for any alphabet

Σ

with |

Σ

| ≥ 2 and is applicable to DNA motif discovery. We prove that for

$\alpha<{1\over 4}(1-{1\over |\Sigma|})$

and any constant

x

 ≥ 8, there exist positive constants

c

0

,

ε

,

δ

1

and

δ

2

such that if the length

ρ

of motif

G

is at least

δ

1

log

n

, and there are

k

 ≥ 

c

0

log

n

input sequences, then in

O

(

n

2

 + 

kn

) time this algorithm finds the motif with probability at least

$1-{1\over 2^x}$

for every

$G\in \Sigma^{\rho}-\Psi_{\rho, h,\epsilon}(\Sigma)$

, where

ρ

is the length of the motif,

h

is a parameter with

ρ

 ≥ 4

h

 ≥ 

δ

2

log

n

, and

Ψ

ρ

,

h

,

ε

(

Σ

) is a small subset of at most

$2^{-\Theta(\epsilon^2 h)}$

fraction of the sequences in

Σ

ρ

. The constants

c

0

,

ε

,

δ

1

and

δ

2

do not depend on

x

when

x

is a parameter of order

O

(log

n

). Our algorithm can take any number

k

sequences as input.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Metadata
Title
Discovering Almost Any Hidden Motif from Multiple Sequences in Polynomial Time with Low Sample Complexity and High Success Probability
Authors
Bin Fu
Ming-Yang Kao
Lusheng Wang
Copyright Year
2009
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-642-02017-9_26

Premium Partner