Bottom-k document retrieval

https://doi.org/10.1016/j.jda.2014.12.009Get rights and content
Under an Elsevier user license
open archive

Abstract

We consider the problem of retrieving the k documents from a collection of strings where a given pattern P appears least often. This has potential applications in data mining, bioinformatics, security, and big data. We show that adapting the classical linear-space solutions for this problem is trivial, but the compressed-space solutions are not easy to extend. We design a new solution for this problem that matches the best-known result when using 2|CSA|+o(n) bits, where CSA is a Compressed Suffix Array. Our structure answers queries in the time needed by the CSA to find the suffix array interval of the pattern plus O(klgklgϵn) accesses to suffix array cells, for any constant ϵ>0.

Keywords

Compact data structures
Document retrieval
String collections

Cited by (0)