2013 | OriginalPaper | Chapter
Top-k Document Retrieval in Compact Space and Near-Optimal Time
Authors : Gonzalo Navarro, Sharma V. Thankachan
Published in: Algorithms and Computation
Publisher: Springer Berlin Heidelberg
Activate our intelligent search to find suitable subject content or patents.
Select sections of text to find matching patents with Artificial Intelligence. powered by
Select sections of text to find additional relevant content using AI-assisted search. powered by
Let
$\cal{D}$
= {
d
1
,
d
2
,...
d
D
} be a given set of
D
string documents of total length
n
. Our task is to index
$\cal{D}$
such that the
k
most relevant documents for an online query pattern
P
of length
p
can be retrieved efficiently. There exist linear space data structures of
O
(
n
) words for answering such queries in optimal
O
(
p
+
k
) time. In this paper, we describe a compact index of size |
CSA
|+
n
log
D
+
o
(
n
log
D
) bits with near optimal time,
O
(
p
+
k
log
*
n
), for the basic relevance metric
term-frequency
, where |
CSA
| is the size (in bits) of a compressed full-text index of
$\cal{D}$
, and log
*
n
is the iterated logarithm of
n
.