2009 | OriginalPaper | Chapter

Discovering Almost Any Hidden Motif from Multiple Sequences in Polynomial Time with Low Sample Complexity and High Success Probability

Authors : Bin Fu, Ming-Yang Kao, Lusheng Wang

Published in: Theory and Applications of Models of Computation

Publisher: Springer Berlin Heidelberg

We study a natural probabilistic model for motif discovery that has been used to experimentally test the effectiveness of motif discovery programs. In this model, there are multiple background sequences, and each character in a background sequence is a random character from an alphabet $\Sigma$. A motif $G$ is a string of $\rho$ characters. Each background sequence is implanted with a probabilistically generated approximate copy of $G$. In such a copy, every character is probabilistically generated such that the probability that it differs from the corresponding character of $G$ is at most $\alpha$. It has been conjectured that multiple background sequences can help with finding faint motifs.
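The implanted-motif model described above can be simulated in a few lines. The following is only a minimal sketch under stated assumptions, not the authors' experimental setup: the function names, the parameter name `alpha` for the per-character mismatch bound, the uniform choice of implant position, and the choice to mutate each character independently with probability exactly `alpha` (the model only requires "at most") are all illustrative.

```python
import random


def implant_motif(motif, alphabet, seq_len, alpha, rng):
    """Build one background sequence and implant a noisy copy of `motif`.

    Assumptions (not from the source): background characters are uniform
    over `alphabet`, each motif character is replaced independently with
    probability `alpha` by a uniformly chosen different character, and the
    implant position is uniform over all valid offsets.
    """
    # Random background: every character drawn independently from the alphabet.
    background = [rng.choice(alphabet) for _ in range(seq_len)]

    # Approximate copy of the motif: each character mismatches with prob. alpha.
    noisy_copy = []
    for g in motif:
        if rng.random() < alpha:
            noisy_copy.append(rng.choice([c for c in alphabet if c != g]))
        else:
            noisy_copy.append(g)

    # Overwrite a window of the background with the noisy copy.
    pos = rng.randrange(seq_len - len(motif) + 1)
    background[pos:pos + len(motif)] = noisy_copy
    return "".join(background), pos


def generate_instance(motif, alphabet, num_seqs, seq_len, alpha, seed=0):
    """Return a list of (sequence, implant_position) pairs."""
    rng = random.Random(seed)
    return [implant_motif(motif, alphabet, seq_len, alpha, rng)
            for _ in range(num_seqs)]
```

A motif discovery program would receive only the sequences; the implant positions are returned here so that a recovered motif can be checked against the ground truth.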

In this paper, we develop an efficient algorithm that can discover a hidden motif from a set of sequences for any alphabet $\Sigma$ with $|\Sigma| \geq 2$ and is applicable to DNA motif discovery. We prove that for $\alpha<{1\over 4}(1-{1\over |\Sigma|})$ and any constant $x \geq 8$, there exist positive constants […] and […] such that if the length $\rho$ of motif $G$ is at least […] log […], and there are […] ≥ […] log […] input sequences, then in […] time this algorithm finds the motif with probability at least $1-{1\over 2^x}$ for every $G\in \Sigma^{\rho}-\Psi_{\rho, h,\epsilon}(\Sigma)$, where $\rho$ is the length of the motif, $h$ is a parameter with $\rho \geq 4h \geq$ […] log […], and $\Psi_{\rho, h,\epsilon}(\Sigma)$ is a small subset of at most $2^{-\Theta(\epsilon^2 h)}$ fraction of the sequences in $\Sigma^{\rho}$. The constants […] and […] do not depend on $x$ when $x$ is a parameter of order (log […]). Our algorithm can take any number […] of sequences as input.
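To make the stated bounds concrete, the sketch below evaluates the mismatch threshold ${1\over 4}(1-{1\over |\Sigma|})$ and the success-probability lower bound $1-{1\over 2^x}$ exactly. The function names are hypothetical and chosen for illustration only; the formulas are the two that appear in the abstract.

```python
from fractions import Fraction


def alpha_threshold(alphabet_size):
    """Exact value of (1/4) * (1 - 1/|Sigma|); alpha must be strictly below it."""
    if alphabet_size < 2:
        raise ValueError("the result is stated for alphabets with at least 2 symbols")
    return Fraction(1, 4) * (1 - Fraction(1, alphabet_size))


def success_lower_bound(x):
    """Exact value of 1 - 1/2**x, the stated lower bound on success probability."""
    return 1 - Fraction(1, 2 ** x)


if __name__ == "__main__":
    for size in (2, 4, 20):
        print(size, alpha_threshold(size))
    print(success_lower_bound(8))
```

For the DNA alphabet ($|\Sigma| = 4$) the threshold is $3/16 = 0.1875$, and the smallest admissible constant $x = 8$ gives a success probability of at least $255/256$.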
