2013 | OriginalPaper | Chapter
Less Space: Indexing for Queries with Wildcards
Authors : Moshe Lewenstein, J. Ian Munro, Venkatesh Raman, Sharma V. Thankachan
Published in: Algorithms and Computation
Publisher: Springer Berlin Heidelberg
Activate our intelligent search to find suitable subject content or patents.
Select sections of text to find matching patents with Artificial Intelligence. powered by
Select sections of text to find additional relevant content using AI-assisted search. powered by
Text indexing is a fundamental problem in computer science, where the task is to index a given text (string)
T
[1..
n
], such that whenever a pattern
P
[1..
p
] comes as a query, we can efficiently report all those locations where
P
occurs as a substring of
T
. In this paper, we consider the case when
P
contains wildcard characters (which can match with any other character). The first non-trivial solution for the problem is given by Cole et al. [STOC 2004], where the index space is
O
(
n
log
k
n
) words or
O
(
n
log
k
+ 1
n
) bits and the query time is
O
(
p
+ 2
h
loglog
n
+
occ
), where
k
is the maximum number of wildcard characters allowed in
P
,
h
≤
k
is the number of wildcard characters in
P
and
occ
represents the number of occurrences of
P
in
T
. Even though many indexes offering different space-time trade-offs were later proposed, a clear improvement on this result is still not known. In this paper, we first propose an
O
(
n
log
k
+
ε
n
) bits index achieving the same query time as that of Cole et al.’s index, where 0 <
ε
< 1 is an arbitrary small constant. Then we propose another index of size
O
(
n
log
k
n
log
σ
) bits, but with a slightly higher query time of
O
(
p
+ 2
h
log
n
+
occ
), where
σ
denotes the alphabet set size.