2007 | OriginalPaper | Chapter
Cache-Oblivious Index for Approximate String Matching
Authors : Wing-Kai Hon, Tak-Wah Lam, Rahul Shah, Siu-Lung Tam, Jeffrey Scott Vitter
Published in: Combinatorial Pattern Matching
Publisher: Springer Berlin Heidelberg
Activate our intelligent search to find suitable subject content or patents.
Select sections of text to find matching patents with Artificial Intelligence. powered by
Select sections of text to find additional relevant content using AI-assisted search. powered by
This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text
T
of length
n
and a positive integer
k
, we want to construct an index of
T
such that for any input pattern
P
, we can find all its
k
-error matches in
T
efficiently. This problem is well-studied in the internal-memory setting. Here, we extend some of these recent results to external-memory solutions, which are also cache-oblivious. Our first index occupies
O
((
n
log
k
n
)/
B
) disk pages and finds all
k
-error matches with
$O((|P|+occ)/B + \log^k n \log \log_{\scriptscriptstyle B} n)$
I/Os, where
B
denotes the number of words in a disk page. To the best of our knowledge, this index is the first external-memory data structure that does not require
$\Omega(|P| + occ + \mbox{poly}(\log n))$
I/Os. The second index reduces the space to
O
((
n
log
n
)/
B
) disk pages, and the I/O complexity is
O
((|
P
| +
occ
)/
B
+ log
k
(
k
+ 1)
n
loglog
n
).