2010 | OriginalPaper | Book chapter
Compressed Self-indices Supporting Conjunctive Queries on Document Collections
Authors: Diego Arroyuelo, Senén González, Mauricio Oyarzún
Published in: String Processing and Information Retrieval
Publisher: Springer Berlin Heidelberg
We prove that a document collection, represented as a single sequence $T$ of $n$ terms over a vocabulary $\Sigma$, can be represented in $nH_0(T) + o(n)(H_0(T) + 1)$ bits of space, such that a conjunctive query $t_1 \wedge \cdots \wedge t_k$ can be answered in $O(k\delta\log\log|\Sigma|)$ adaptive time, where $\delta$ is the instance difficulty of the query, as defined by Barbay and Kenyon in their SODA'02 paper, and $H_0(T)$ is the empirical entropy of order 0 of $T$. As a comparison, using an inverted index plus the adaptive intersection algorithm by Barbay and Kenyon takes $O(k\delta\log{\frac{n_M}{\delta}})$ time, where $n_M$ is the length of the longest occurrence list among those of the query terms. Thus, we can replace an inverted index by a more space-efficient in-memory encoding, outperforming the query performance of inverted indices when the ratio $\frac{n_M}{\delta}$ is $\omega(\log|\Sigma|)$.
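To make the quantities in the abstract concrete, the following minimal Python sketch computes the zero-order empirical entropy $H_0(T)$ of a term sequence and evaluates the two asymptotic query costs with all hidden constants dropped. The function names and the example values are illustrative assumptions, not part of the paper, and the cost functions only compare growth terms; they say nothing about real running times.

```python
import math
from collections import Counter


def empirical_entropy_h0(terms):
    """Zero-order empirical entropy H_0(T), in bits per term.

    H_0(T) = sum over distinct terms c of (n_c / n) * log2(n / n_c),
    where n_c is the number of occurrences of c and n = len(terms).
    """
    n = len(terms)
    if n == 0:
        return 0.0
    counts = Counter(terms)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def self_index_cost(k, delta, sigma):
    """Growth term k * delta * loglog(sigma), constants dropped (assumption).

    Requires sigma > 2 so that the double logarithm is positive.
    """
    return k * delta * math.log2(math.log2(sigma))


def inverted_index_cost(k, delta, n_max):
    """Growth term k * delta * log(n_max / delta), constants dropped (assumption)."""
    return k * delta * math.log2(n_max / delta)


if __name__ == "__main__":
    # Hypothetical toy collection flattened into one term sequence.
    T = "a b a c a b a d".split()
    h0 = empirical_entropy_h0(T)
    print(h0)            # bits per term
    print(len(T) * h0)   # leading space term n * H_0(T), in bits

    # Hypothetical query parameters: k terms, difficulty delta,
    # vocabulary size sigma, longest occurrence list n_max.
    print(self_index_cost(k=2, delta=4, sigma=2**16))
    print(inverted_index_cost(k=2, delta=4, n_max=4 * 2**10))
```

With these hypothetical parameters the ratio $\frac{n_M}{\delta}$ is 1024 while $\log|\Sigma|$ is 16, so the self-index growth term is the smaller of the two, in line with the condition stated in the abstract.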