ABSTRACT
Given a metric space (X,dX), c≥1, r>0, and p,q ≡ [0,1], a distribution over mappings H : X → N is called a (r,cr,p,q)-sensitive hash family if any two points in X at distance at most r are mapped by H to the same value with probability at least p, and any two points at distance greater than cr are mapped by H to the same value with probability at most q. This notion was introduced by Indyk and Motwani in 1998 as the basis for an efficient approximate nearest neighbor search algorithm, and has since been used extensively for this purpose. The performance of these algorithms is governed by the parameter ⊇=log(1/p)/log(1/q), and constructing hash families with small ⊇ automatically yields improved nearest neighbor algorithms. Here we show that for X=l1 it is impossible to achieve ⊇ ≤ 1/2c. This almost matches the construction of Indyk and Motwani which achieves ⊇ ≤ 1/c.
- A. Andoni and P. Indyk. Faster algorithms for high dimensional nearest neighbor problems. Manuscript, 2005.Google Scholar
- W. Beckner. Inequalities in Fourier analysis. Ann. of Math. (2), 102(1):159--182, 1975.Google Scholar
- A. Bonami. Étude des coefficients de Fourier des fonctions de Lp(G). Ann. Inst. Fourier (Grenoble), 20(fasc. 2):335--402 (1971), 1970.Google Scholar
- M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In SCG '04: Proceedings of the Twentieth Annual Symposium on Computational Geometry (Brooklyn, NY, 2004), pages 253--262. ACM Press, New York, NY, 2004. Google ScholarDigital Library
- S. Har-Peled. A replacement for Voronoi diagrams of near linear size. In 42nd IEEE Symposium on Foundations of Computer Science (Las Vegas, NV, 2001), pages 94--103. IEEE Computer Soc., Los Alamitos, CA, 2001. Google ScholarDigital Library
- P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In STOC '98: Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing (Dallas, TX, 1998), pages 604--613. ACM Press, New York, NY, 1998. Google ScholarDigital Library
- R. Panigrahy. Entropy based nearest neighbor search in high dimensions. In SODA '06: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm (Miami, FL, 2006), pages 1186--1195. ACM Press, New York, NY, 2006. Google ScholarDigital Library
Index Terms
- Lower bounds on locality sensitive hashing
Recommendations
Fast locality-sensitive hashing
KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data miningLocality-sensitive hashing (LSH) is a basic primitive in several large-scale data processing applications, including nearest-neighbor search, de-duplication, clustering, etc. In this paper we propose a new and simple method to speed up the widely-used ...
Lower Bounds on Locality Sensitive Hashing
Given a metric space $(X,d_X)$, $c \ge 1$, $r > 0$, and $p,q \in [0,1]$, a distribution over mappings $\mathscr{H} : X \to \mathbb{N}$ is called a $(r,cr,p,q)$-sensitive hash family if any two points in $X$ at distance at most $r$ are mapped by $\...
A posteriori multi-probe locality sensitive hashing
MM '08: Proceedings of the 16th ACM international conference on MultimediaEfficient high-dimensional similarity search structures are essential for building scalable content-based search systems on feature-rich multimedia data. In the last decade, Locality Sensitive Hashing (LSH) has been proposed as indexing technique for ...
Comments