skip to main content
research-article

Query-aware locality-sensitive hashing for approximate nearest neighbor search

Authors Info & Claims
Published:01 September 2015Publication History
Skip Abstract Section

Abstract

Locality-Sensitive Hashing (LSH) and its variants are the well-known indexing schemes for the c-Approximate Nearest Neighbor (c-ANN) search problem in high-dimensional Euclidean space. Traditionally, LSH functions are constructed in a query-oblivious manner in the sense that buckets are partitioned before any query arrives. However, objects closer to a query may be partitioned into different buckets, which is undesirable. Due to the use of query-oblivious bucket partition, the state-of-the-art LSH schemes for external memory, namely C2LSH and LSB-Forest, only work with approximation ratio of integer c ≥ 2.

In this paper, we introduce a novel concept of query-aware bucket partition which uses a given query as the "anchor" for bucket partition. Accordingly, a query-aware LSH function is a random projection coupled with query-aware bucket partition, which removes random shift required by traditional query-oblivious LSH functions. Notably, query-aware bucket partition can be easily implemented so that query performance is guaranteed. We propose a novel query-aware LSH scheme named QALSH for c-ANN search over external memory. Our theoretical studies show that QALSH enjoys a guarantee on query quality. The use of query-aware LSH function enables QALSH to work with any approximation ratio c > 1. Extensive experiments show that QALSH outperforms C2LSH and LSB-Forest, especially in high-dimensional space. Specifically, by using a ratio c < 2, QALSH can achieve much better query quality.

References

  1. A. Andoni, P. Indyk, H. L. Nguyen, and I. Razenshteyn. Beyond locality-sensitive hashing. In SODA, pages 1018--1028, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In SoCG, pages 253--262, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. R. Fagin, R. Kumar, and D. Sivakumar. Efficient similarity search and classification via rank aggregation. In ACM SIGMOD, pages 301--312, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. J. Gan, J. Feng, Q. Fang, and W. Ng. Locality-sensitive hashing scheme based on dynamic collision counting. In SIGMOD, pages 541--552, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. A. Gionis, P. Indyk, R. Motwani, et al. Similarity search in high dimensions via hashing. In VLDB, volume 99, pages 518--529. VLDB Endowment, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13--30, 1963.Google ScholarGoogle ScholarCross RefCross Ref
  7. P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In ACM STOC, pages 604--613, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. H. Jagadish, B. C. Ooi, K.-L. Tan, C. Yu, and R. Zhang. idistance: an adaptive b+-tree based indexing method for nearest neighbor search. ACM TODS, 30(2):364--397, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. W. Johnson and J. Lindenstrauss. Extensions of lipshitz mapping into hilbert space. Contemporary Mathematics, 26:189--206, 1984.Google ScholarGoogle ScholarCross RefCross Ref
  10. J. M. Kleinberg. Two algorithms for nearest-neighbor search in high dimensions. In ACM STOC, pages 599--608, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Y. Liu, J. Cui, Z. Huang, H. Li, and H. T. Shen. Sk-lsh: An efficient index structure for approximate nearest neighbor search. VLDB, 7(9), 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. R. Panigrahy. Entropy based nearest neighbor search in high dimensions. In ACM-SIAM SODA, pages 1186--1195, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. H. Samet. Foundations of multidimensional and metric data structures. Morgan Kaufmann, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Y. Sun, W. Wang, J. Qin, Y. Zhang, and X. Lin. Srs: Solving c-approximate nearest neighbor queries in high dimensional euclidean space with a tiny index. VLDB, 8(1), 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Y. Tao, K. Yi, C. Sheng, and P. Kalnis. Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. ACM TODS, 35(3):20, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 9, Issue 1
    September 2015
    35 pages

    Publisher

    VLDB Endowment

    Publication History

    • Published: 1 September 2015
    Published in pvldb Volume 9, Issue 1

    Qualifiers

    • research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader