2013 | OriginalPaper | Chapter
Suffix Trees for Partial Words and the Longest Common Compatible Prefix Problem
Authors : Francine Blanchet-Sadri, Justin Lazarow
Published in: Language and Automata Theory and Applications
Publisher: Springer Berlin Heidelberg
Activate our intelligent search to find suitable subject content or patents.
Select sections of text to find matching patents with Artificial Intelligence. powered by
Select sections of text to find additional relevant content using AI-assisted search. powered by
Suffix trees, introduced by Weiner in 1973, are powerful data structures used to solve many problems on words such as searching for patterns in biological sequences, data compression, text processing, etc. Although they have fallen out of favor in the past years to the more space-efficient suffix arrays, suffix trees are useful for modelling partial words by allowing paths to meet whilst keeping them acyclic through directedness (a partial word may have don’t care symbols or holes, which are compatible with any letter of the alphabet over which it is defined). We extend suffix trees to partial words by introducing a suffix directed acyclic graph, with compatibility links, that exhibits all the suffixes while preserving the longest common compatible prefix, lccp, between suffixes. We give an optimal
O
(
n
2
) time and space algorithm for constructing the suffix dag of a given partial word
w
with an arbitrary number of holes of length
n
over a fixed alphabet by modifying Weiner’s algorithm. Our algorithm also computes the lccp array between suffixes of
w
starting with holes and all other suffixes of
w
. It possesses the invariant that after the suffix at position
i
has been processed, the lccp between any suffix starting at or after position
i
with any suffix starting with a hole can be computed in constant time. As a result, with
O
(
n
2
) preprocessing time, finding the lccp of two given suffixes of
w
requires constant time.