2013 | Original Paper | Book Chapter
Top-k Document Retrieval in Compact Space and Near-Optimal Time
Authors: Gonzalo Navarro, Sharma V. Thankachan
Published in: Algorithms and Computation
Publisher: Springer Berlin Heidelberg
Let $\mathcal{D} = \{d_1, d_2, \ldots, d_D\}$ be a given set of $D$ string documents of total length $n$. Our task is to index $\mathcal{D}$ such that the $k$ most relevant documents for an online query pattern $P$ of length $p$ can be retrieved efficiently. There exist linear-space data structures of $O(n)$ words for answering such queries in optimal $O(p+k)$ time. In this paper, we describe a compact index of size $|\mathrm{CSA}| + n\log D + o(n\log D)$ bits with near-optimal time, $O(p + k\log^* n)$, for the basic relevance metric term-frequency, where $|\mathrm{CSA}|$ is the size (in bits) of a compressed full-text index of $\mathcal{D}$, and $\log^* n$ is the iterated logarithm of $n$.
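To make the query itself concrete, here is a brute-force baseline for top-k document retrieval under term-frequency. It is a sketch for illustration, not the compact index described in the paper.

The sketch makes these assumptions:

- The documents are concatenated with a separator `SEP` that never occurs in any document or pattern.
- A suffix array is built by plain sorting.
- A document array maps every suffix to the document it starts in.

The names `build_index`, `sa_interval` and `top_k` are hypothetical. A query finds the suffix-array interval of $P$, counts how often each document appears in that interval, and keeps the $k$ largest counts. This costs far more than $O(p + k\log^* n)$, since it touches every occurrence and scans the suffix array. As a side note, a plain document array with one $\lceil\log_2 D\rceil$-bit entry per text position is of the same order as the $n\log D$ term in the space bound.

```python
from bisect import bisect_left, bisect_right
from collections import Counter
from heapq import nlargest

# Assumption: this separator never occurs inside a document or a pattern.
SEP = "\x00"


def build_index(docs):
    """Concatenate the documents and build a naive suffix array (SA)
    together with the document array (DA)."""
    text = SEP.join(docs) + SEP
    # owner[i] = id of the document that text position i belongs to
    owner = []
    for doc_id, doc in enumerate(docs):
        owner.extend([doc_id] * (len(doc) + 1))  # +1 for the trailing separator
    # Naive SA construction by sorting suffixes: fine for a toy, far from linear time.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # DA[j] = document containing the suffix that starts at SA[j].
    da = [owner[i] for i in sa]
    return text, sa, da


def sa_interval(text, sa, pattern):
    """Return [lo, hi) such that SA[lo:hi] are the suffixes prefixed by pattern."""
    m = len(pattern)
    # Truncating sorted suffixes to m characters keeps them sorted,
    # so binary search on the truncated strings finds the interval.
    heads = [text[i:i + m] for i in sa]
    return bisect_left(heads, pattern), bisect_right(heads, pattern)


def top_k(index, pattern, k):
    """Return up to k (doc_id, tf) pairs with the highest term frequency of pattern."""
    text, sa, da = index
    lo, hi = sa_interval(text, sa, pattern)
    tf = Counter(da[lo:hi])  # occurrences of pattern per document
    # Highest tf first; ties go to the smaller document id.
    return nlargest(k, tf.items(), key=lambda item: (item[1], -item[0]))


if __name__ == "__main__":
    idx = build_index(["banana", "bandana", "ana"])
    print(top_k(idx, "ana", 2))
```

Running it prints `[(0, 2), (1, 1)]`. The pattern "ana" occurs twice in "banana" and once each in "bandana" and "ana", and the tie is broken toward the smaller document id.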
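The $k\log^* n$ term in the query time grows extremely slowly. The sketch below counts how many times $\log_2$ must be applied to $n$ before the value drops to at most 1. Base 2 is an assumption here, since the abstract does not fix the base, and the function name `log_star` is hypothetical. Under this definition, $\log^* n \le 5$ for every $n \le 2^{65536}$, so in practice the time bound behaves like $O(p + k)$ up to a small constant factor.

```python
import math


def log_star(n: float) -> int:
    """Iterated logarithm, base 2: the number of times log2 must be applied
    to n before the result is at most 1 (log* n = 0 for n <= 1)."""
    count = 0
    x = n
    while x > 1:
        x = math.log2(x)
        count += 1
    return count


if __name__ == "__main__":
    # A few sample sizes: powers of two and a billion-character collection.
    for n in (2, 16, 65536, 10**9):
        print(n, log_star(n))
```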