research-article

Free Access

Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions

Authors:
Alexandr Andoni

View Profile

,
Piotr Indyk

Massachusetts Institute of Technology, Cambridge, MA

Massachusetts Institute of Technology, Cambridge, MA
View Profile

Authors Info & Claims

Communications of the ACM Volume 51 Issue 1January 2008pp 117–122https://doi.org/10.1145/1327452.1327494

Published:01 January 2008Publication History

Communications of the ACM

Abstract

In this article, we give an overview of efficient algorithms for the approximate and exact nearest neighbor problem. The goal is to preprocess a dataset of objects (e.g., images) so that later, given a new query object, one can quickly return the dataset object that is most similar to the query. The problem is of significant interest in a wide variety of areas.

Supplemental Material

Available for Download

pdf

p117-andoni.jp.pdf (432 KB)

Requires Asian Language Support in Adobe Reader and Japanese Language Support in Your Browser.

References

1. Ailon, N. and Chazelle, B. 2006. Approximate nearest neighbors and the Fast Johnson-Lindenstrauss Transform. In Proceedings of the Symposium on Theory of Computing. Google ScholarDigital Library
Andoni, A. and Indyk, P. 2004. E2lsh: Exact Euclidean localitysensitive hashing. http://web.mit.edu/andoni/www/LSH/.Google Scholar
Andoni, A. and Indyk, P. 2006. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In Proceedings of the Symposium on Foundations of Computer Science. Google ScholarDigital Library
Andoni, A. and Indyk, P. 2006. Efficient algorithms for substring near neighbor problem. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 1203--1212. Google ScholarDigital Library
Arya, S., Mount, D. M., Netanyahu, N. S., Silverman, R., and Wu, A. 1994. An optimal algorithm for approximate nearest neighbor searching. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 573--582. Google ScholarDigital Library
Bentley, J. L. 1975. Multidimensional binary search trees used for associative searching. Comm. ACM 18, 509--517. Google ScholarDigital Library
Broder, A., Charikar, M., Frieze, A., and Mitzenmacher, M. 1998. Min-wise independent permutations. J. Comput. Sys. Sci. Google ScholarDigital Library
Broder, A., Glassman, S., Manasse, M., and Zweig, G. 1997. Syntactic clustering of the web. In Proceedings of the 6th International World Wide Web Conference. 391--404. Google ScholarDigital Library
Broder, A. 1997. On the resemblance and containment of documents. In Proceedings of Compression and Complexity of Sequences. 21--29. Google ScholarDigital Library
Buhler, J. 2001. Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinform. 17, 419--428.Google ScholarCross Ref
Buhler, J. and Tompa, M. 2001. Finding motifs using random projections. In Proceedings of the Annual International Conference on Computational Molecular Biology (RECOMB1). Google ScholarDigital Library
Califano, A. and Rigoutsos, I. 1993. Flash: A fast look-up algorithm for string homology. In Proceedings of the IEE Conference on Computer Vision and Pattern Recognition (CVPR).Google Scholar
Chakrabarti, A. and Regev, O. 2004. An optimal randomised cell probe lower bounds for approximate nearest neighbor searching. In Proceedings of the Symposium on Foundations of Computer Science. Google ScholarDigital Library
Charikar, M. 2002. Similarity estimation techniques from rounding. In Proceedings of the Symposium on Theory of Computing. Google ScholarDigital Library
Charikar, M., Chekuri, C., Goel, A., Guha, S., and Plotkin, S. 1998. Approximating a finite metric by a small number of tree metrics. In Proceedings of the Symposium on Foundations of Computer Science. Google ScholarDigital Library
Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. 2001. Introduct. Algorithms. 2nd Ed. MIT Press.Google Scholar
Datar, M., Immorlica, N., Indyk, P., and Mirrokni, V. 2004. Localitysensitive hashing scheme based on p-stable distributions. In Proceedings of the ACM Symposium on Computational Geometry. Google ScholarDigital Library
Dutta, D., Guha, R., Jurs, C., and Chen, T. 2006. Scalable partitioning and exploration of chemical spaces using geometric hashing. J. Chem. Inf. Model. 46.Google ScholarCross Ref
Gionis, A., Indyk, P., and Motwani, R. 1999. Similarity search in high dimensions via hashing. In Proceedings of the International Conference on Very Large Databases. Google ScholarDigital Library
Goemans, M. and Williamson, D. 1995. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 42. 1115--1145. Google ScholarDigital Library
Greene, D., Parnas, M., and Yao, F. 1994. Multi-index hashing for information retrieval. In Proceedings of the Symposium on Foundations of Computer Science. 722--731.Google Scholar
Har-Peled, S. 2001. A replacement for voronoi diagrams of near linear size. In Proceedings of the Symposium on Foundations of Computer Science. Google ScholarDigital Library
Haveliwala, T., Gionis, A., and Indyk, P. 2000. Scalable techniques for clustering the web. WebDB Workshop.Google Scholar
Indyk, P. 2003. Nearest neighbors in high-dimensional spaces. In Handbook of Discrete and Computational Geometry. CRC Press.Google Scholar
Indyk, P. and Motwani, R. 1998. Approximate nearest neighbor: Towards removing the curse of dimensionality. In Proceedings of the Symposium on Theory of Computing. Google ScholarDigital Library
Karp, R. M., Waarts, O., and Zweig, G. 1995. The bit vector intersection problem. In Proceedings of the Symposium on Foundations of Computer Science. pages 621--630. Google ScholarDigital Library
Kleinberg, J. 1997. Two algorithms for nearest-neighbor search in high dimensions. In Proceedings of the Symposium on Theory of Computing. Google ScholarDigital Library
Krauthgamer, R. and Lee, J. R. 2004. Navigating nets: Simple algorithms for proximity search. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. Google ScholarDigital Library
Kushilevitz, E., Ostrovsky, R., and Rabani, Y. 1998. Efficient search for approximate nearest neighbor in high dimensional spaces. In Proceedings of the Symposium on Theory of Computing. 614--623. Google ScholarDigital Library
Linial, N., London, E., and Rabinovich, Y. 1994. The geometry of graphs and some of its algorithmic applications. In Proceedings of the Symposium on Foundations of Computer Science. 577--591.Google Scholar
Motwani, R., Naor, A., and Panigrahy, R. 2006. Lower bounds on locality sensitive hashing. In Proceedings of the ACM Symposium on Computational Geometry. Google ScholarDigital Library
Panigrahy, R. 2006. Entropy-based nearest neighbor algorithm in high dimensions. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. Google ScholarDigital Library
Paturi, R., Rajasekaran, S., and Reif, J.The light bulb problem. Inform. Comput. 117, 2, 187--192. Google ScholarDigital Library
Ravichandran, D., Pantel, P., and Hovy, E. 2005. Randomized algorithms and nlp: Using locality sensitive hash functions for high speed noun clustering. In Proceedings of the Annual Meeting of the Association of Computational Linguistics. Google ScholarDigital Library
Samet, H. 2006. Foundations of Multidimensional and Metric Data Structures. Elsevier, 2006. Google ScholarDigital Library
Shakhnarovich, G., Darrell, T., and Indyk, P. Eds. Nearest Neighbor Methods in Learning and Vision. Neural Processing Information Series, MIT Press. Google ScholarDigital Library
Terasawa, T. and Tanaka, Y. 2007. Spherical lsh for approximate nearest neighbor search on unit hypersphere. In Proceedings of the Workshop on Algorithms and Data Structures. Google ScholarDigital Library

Index Terms

Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions

Recommendations

Complementary hashing for approximate nearest neighbor search
ICCV '11: Proceedings of the 2011 International Conference on Computer Vision

Recently, hashing based Approximate Nearest Neighbor (ANN) techniques have been attracting lots of attention in computer vision. The data-dependent hashing methods, e.g., Spectral Hashing, expects better performance than the data-blind counterparts, e.g.,...
Read More
Entropy based nearest neighbor search in high dimensions
SODA '06: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm

In this paper we study the problem of finding the approximate nearest neighbor of a query point in the high dimensional space, focusing on the Euclidean space. The earlier approaches use locality-preserving hash functions (that tend to map nearby points ...
Read More
Order preserving hashing for approximate nearest neighbor search
MM '13: Proceedings of the 21st ACM international conference on Multimedia

In this paper, we propose a novel method to learn similarity-preserving hash functions for approximate nearest neighbor (NN) search. The key idea is to learn hash functions by maximizing the alignment between the similarity orders computed from the ...
Read More

Reviews

Reviewer: Amos O Olagunju

Computerized document, image, and video retrieval systems require efficient storage and algorithms for fast access to objects. Designing efficient algorithms for constructing data structures that support fast retrieval of objects in a high-dimensional Euclidean space raises interesting questions. How should an effective data structure be constructed for a set of object points to support the retrieval of the object point closest to a query__?__ How should features of objects be represented as points in high-dimensional spaces to generate distance metrics that measure the similarities of objects__?__ How should cataloging or similarity probing for query objects be implemented__?__ Diverse applications of assorted data prearranged as shades of points in high dimensions are available in the literature [1]. Andoni and Indyk present near-optimal locality-sensitive hashing (LSH) algorithms for solving approximate and exact nearest neighbor problems in high dimensions. The LSH scheme hashes object points using several hash functions to minimize the probabilities of collisions for objects far apart. The object query points are hashed to locate near neighbors, and the corresponding elements in stored buckets are retrieved. A data structure is created for a set of points in a high-dimensional space, in such a way that the algorithm reports?with a certainty probability?either some or all near neighbors of a queried point. The authors ingeniously design a family of LSH functions that produce a remarkable exponential query runtime. They recognize that it is clearly impossible to provide an assumed query runtime bound for a data structure capable of reporting all or many data points in close proximity of a query point. They insightfully critique the most viable data structures and LSH algorithms for solving the well-known approximate and exact nearest neighbor problems. Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

Communications of the ACM Volume 51, Issue 1
50th anniversary issue: 1958 - 2008
January 2008
106 pages
ISSN:0001-0782
EISSN:1557-7317
DOI:10.1145/1327452
Issue’s Table of Contents

Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 January 2008
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- research-article
- Popular
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 934
  Total Citations
  View Citations
- 4,975
  Total Downloads
- Downloads (Last 12 months)377
- Downloads (Last 6 weeks)84
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions

Communications of the ACM

Abstract

Supplemental Material

Available for Download

References

Cited By

Index Terms

Recommendations

Complementary hashing for approximate nearest neighbor search

Entropy based nearest neighbor search in high dimensions

Order preserving hashing for approximate nearest neighbor search

Reviews

Access critical reviews of Computing literature here

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions

Communications of the ACM

Abstract

Supplemental Material

Available for Download

References

Cited By

Index Terms

Recommendations

Complementary hashing for approximate nearest neighbor search

Entropy based nearest neighbor search in high dimensions

Order preserving hashing for approximate nearest neighbor search

Reviews

Access critical reviews of Computing literature here

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media