2010 | OriginalPaper | Chapter
An Efficient Algorithm for Chinese Postman Walk on Bi-directed de Bruijn Graphs
Authors : Vamsi Kundeti, Sanguthevar Rajasekaran, Heiu Dinh
Published in: Combinatorial Optimization and Applications
Publisher: Springer Berlin Heidelberg
Activate our intelligent search to find suitable subject content or patents.
Select sections of text to find matching patents with Artificial Intelligence. powered by
Select sections of text to find additional relevant content using AI-assisted search. powered by
Sequence assembly from short reads is an important problem in biology. It is known that solving the sequence assembly problem exactly on a bi-directed de Bruijn graph or a string graph is intractable. However finding a Shortest Double stranded DNA string (SDDNA) containing all the
k
-long words in the reads seems to be a good heuristic to get close to the original genome. This problem is equivalent to finding a cyclic Chinese Postman (CP) walk on the underlying un-weighted bi-directed de Bruijn graph built from the reads. The Chinese Postman walk Problem (CPP) is solved by reducing it to a general bi-directed flow on this graph which runs in
O
(|
E
|
2
log
2
(|
V
|)) time.
In this paper we show that the cyclic CPP on bi-directed graphs can be solved without reducing it to bi-directed flow. We present a
$\Theta(p(|V|+|E|)\log(|V|) + (d_{max}p)^3 )$
time algorithm to solve the cyclic CPP on a weighted bi-directed de Bruijn graph, where
p
= max {|{
v
|
d
in
(
v
) −
d
out
(
v
) > 0}|, |{
v
|
d
in
(
v
) −
d
out
(
v
) < 0}|} and
d
max
= max { |
d
in
(
v
) −
d
out
(
v
)}. Our algorithm performs asymptotically better than the bi-directed flow algorithm when the number of
imbalanced
nodes
p
is much less than the nodes in the bi-directed graph. From our experimental results on various datasets, we have noticed that the value of
p
/|
V
| lies between 0.08% and 0.13% with 95% probability.
Many practical bi-directed de Bruijn graphs do not have cyclic CP walks. In such cases it is not clear how the bi-directed flow can be useful in identifying contigs. Our algorithm can handle such situations and identify maximal bi-directed sub-graphs that have CP walks. We also present a Θ((|
V
| + |
E
|)log(
V
)) time algorithm for the single source shortest path problem on bi-directed de Bruijn graphs, which may be of independent interest.