research-article

Query-aware locality-sensitive hashing for approximate nearest neighbor search

Authors:
Qiang Huang (Sun Yat-sen University, Guangzhou, China)
Jianlin Feng (Sun Yat-sen University, Guangzhou, China)
Yikai Zhang (Sun Yat-sen University, Guangzhou, China)
Qiong Fang (South China University of Technology, Guangzhou, China)
Wilfred Ng (Hong Kong University of Science and Technology, Hong Kong, China)

Proceedings of the VLDB Endowment, Volume 9, Issue 1, pp. 1–12. https://doi.org/10.14778/2850469.2850470

Published: 01 September 2015

Abstract

Locality-Sensitive Hashing (LSH) and its variants are well-known indexing schemes for the c-Approximate Nearest Neighbor (c-ANN) search problem in high-dimensional Euclidean space. Traditionally, LSH functions are constructed in a query-oblivious manner, in the sense that buckets are partitioned before any query arrives. However, objects close to a query may then end up in different buckets from the query, which is undesirable. Due to the use of query-oblivious bucket partition, the state-of-the-art LSH schemes for external memory, namely C2LSH and LSB-Forest, only work with an integer approximation ratio c ≥ 2.
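
To make the drawback of query-oblivious partitioning concrete, the following is a minimal sketch of a bucket function of the standard p-stable form h(o) = floor((a · o + b) / w). The function name, the bucket width, and the hand-picked projection vector and points are assumptions chosen for illustration; they are not taken from the paper.

```python
import math

def oblivious_hash(point, a, b, w):
    """Query-oblivious bucket id: floor((a . point + b) / w)."""
    projection = sum(ai * xi for ai, xi in zip(a, point))
    return math.floor((projection + b) / w)

# Hypothetical parameters, fixed by hand so the effect is easy to see.
a = [1.0, 0.0]   # projection vector (normally drawn from N(0, 1))
b = 0.0          # random shift (normally drawn uniformly from [0, w))
w = 1.0          # bucket width

query = [0.99, 0.0]
near = [1.01, 0.0]   # Euclidean distance 0.02 from the query
far = [0.05, 0.0]    # Euclidean distance 0.94 from the query

query_bucket = oblivious_hash(query, a, b, w)
near_bucket = oblivious_hash(near, a, b, w)
far_bucket = oblivious_hash(far, a, b, w)

print(query_bucket, near_bucket, far_bucket)
```

Because the bucket boundaries are fixed before the query is known, the nearby point falls just across a boundary from the query, while the more distant point shares the query's bucket.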

In this paper, we introduce a novel concept of query-aware bucket partition, which uses a given query as the "anchor" for bucket partition. Accordingly, a query-aware LSH function is a random projection coupled with query-aware bucket partition, which removes the random shift required by traditional query-oblivious LSH functions. Notably, query-aware bucket partition can be easily implemented so that query performance is guaranteed. We propose a novel query-aware LSH scheme named QALSH for c-ANN search over external memory. Our theoretical studies show that QALSH enjoys a guarantee on query quality. The use of query-aware LSH functions enables QALSH to work with any approximation ratio c > 1. Extensive experiments show that QALSH outperforms C2LSH and LSB-Forest, especially in high-dimensional space. Specifically, by using a ratio c < 2, QALSH can achieve much better query quality.
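
The idea of anchoring the bucket at the query can be sketched as follows. This is an illustrative sketch under stated assumptions, not the QALSH implementation: it assumes the hash value is the plain projection a · o with no random shift, and that an object collides with the query when its projection lies within a width-w interval centered on the query's projection. The helper names (`project`, `collides`, `count_collisions`), the bucket width, the number of functions, and the sample points are all hypothetical.

```python
import random

def project(point, a):
    """Random projection a . point, with no random shift."""
    return sum(ai * xi for ai, xi in zip(a, point))

def collides(obj, query, a, w):
    """True if obj lies in the width-w bucket centered at the query's projection."""
    return abs(project(obj, a) - project(query, a)) <= w / 2.0

def count_collisions(obj, query, hash_vectors, w):
    """Number of projections under which obj shares the query-centered bucket."""
    return sum(collides(obj, query, a, w) for a in hash_vectors)

# Hypothetical setup: 50 Gaussian projection vectors in 2 dimensions.
rng = random.Random(0)
dim, m, w = 2, 50, 1.0
hash_vectors = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(m)]

query = [0.99, 0.0]
near = [1.01, 0.0]   # Euclidean distance 0.02 from the query
far = [5.0, 5.0]     # much farther from the query

# Hand-picked projection along the first axis: the near point now collides.
axis_collision = collides(near, query, [1.0, 0.0], w)

near_count = count_collisions(near, query, hash_vectors, w)
far_count = count_collisions(far, query, hash_vectors, w)
print(axis_collision, near_count, far_count)
```

Since the bucket is centered on the query, a point at distance 0.02 cannot be separated from it by a boundary unless the projection stretches that distance beyond w/2. Counting collisions over several projections is used here only to make the contrast with the distant point visible.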


Published in

Proceedings of the VLDB Endowment Volume 9, Issue 1
September 2015
35 pages
ISSN: 2150-8097
Publisher
VLDB Endowment