2006 | OriginalPaper | Buchkapitel
Position-Restricted Substring Searching
verfasst von : Veli Mäkinen, Gonzalo Navarro
Erschienen in: LATIN 2006: Theoretical Informatics
Verlag: Springer Berlin Heidelberg
Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.
Wählen Sie Textabschnitte aus um mit Künstlicher Intelligenz passenden Patente zu finden. powered by
Markieren Sie Textabschnitte, um KI-gestützt weitere passende Inhalte zu finden. powered by
A full-text index is a data structure built over a text string
T
[1,
n
]. The most basic functionality provided is (
a
) counting how many times a pattern string
P
[1,
m
] appears in
T
and (
b
) locating all those
occ
positions. There exist several indexes that solve (
a
) in
O
(
m
) time and (
b
) in
O
(
occ
) time. In this paper we propose two new queries, (
c
) counting how many times
P
[1,
m
] appears in
T
[
l
,
r
] and (
d
) locating all those
occ
l
,
r
positions. These can be solved using (
a
) and (
b
) but this requires
O
(
occ
) time. We present two solutions to (
c
) and (
d
) in this paper. The first is an index that requires
O
(
n
log
n
) bits of space and answers (
c
) in
O
(
m
+log
n
) time and (
d
) in
O
(log
n
) time per occurrence (that is,
O
(
occ
l
,
r
log
n
) time overall). A variant of the first solution answers (
c
) in
O
(
m
+loglog
n
) time and (
d
) in constant time per occurrence, but requires
O
(
n
log
$^{\rm 1+{\it \epsilon}}$
n
) bits of space for any constant
ε
> 0. The second solution requires
O
(
nm
log
σ
) bits of space, solving (
c
) in
O
(
m
⌈log
σ
/ loglog
n
⌉) time and (
d
) in
O
(
m
⌈log
σ
/ loglog
n
⌉) time per occurrence, where
σ
is the alphabet size. This second structure takes less space when the text is compressible.
Our solutions can be seen as a generalization of
rank
and
select
dictionaries, which allow computing how many times a given character
c
appears in a prefix
T
[1,
i
] and also locate the
i
-th occurrence of
c
in
T
. Our solution to (
c
) extends character
rank
queries to
substring rank
queries, and our solution to (
d
) extends character
select
to
substring select
queries.
As a byproduct, we show how
rank
queries can be used to implement fractional cascading in little space, so as to obtain an alternative implementation of a well-known two-dimensional range search data structure by Chazelle. We also show how Grossi et al.’s
wavelet trees
are suitable for two-dimensional range searching, and their connection with Chazelle’s data structure.