Bottom-k document retrieval

doi:10.1016/j.jda.2014.12.009

Abstract

We consider the problem of retrieving the k documents from a collection of strings where a given pattern P appears least often. This has potential applications in data mining, bioinformatics, security, and big data. We show that adapting the classical linear-space solutions for this problem is trivial, but the compressed-space solutions are not easy to extend. We design a new solution for this problem that matches the best-known result when using $2 | CSA | + o (n)$ bits, where $CSA$ is a Compressed Suffix Array. Our structure answers queries in the time needed by the $CSA$ to find the suffix array interval of the pattern plus $O (k \lg k \lg^{ϵ} n)$ accesses to suffix array cells, for any constant $ϵ > 0$ .

Journal of Discrete Algorithms

Abstract

Keywords

Cited by (0)