Abstract
In this article, we give an overview of efficient algorithms for the approximate and exact nearest neighbor problem. The goal is to preprocess a dataset of objects (e.g., images) so that later, given a new query object, one can quickly return the dataset object that is most similar to the query. The problem is of significant interest in a wide variety of areas.
Supplemental Material
Available for Download
Requires Asian Language Support in Adobe Reader and Japanese Language Support in Your Browser.
- 1. Ailon, N. and Chazelle, B. 2006. Approximate nearest neighbors and the Fast Johnson-Lindenstrauss Transform. In Proceedings of the Symposium on Theory of Computing. Google ScholarDigital Library
- Andoni, A. and Indyk, P. 2004. E2lsh: Exact Euclidean localitysensitive hashing. http://web.mit.edu/andoni/www/LSH/.Google Scholar
- Andoni, A. and Indyk, P. 2006. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In Proceedings of the Symposium on Foundations of Computer Science. Google ScholarDigital Library
- Andoni, A. and Indyk, P. 2006. Efficient algorithms for substring near neighbor problem. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 1203--1212. Google ScholarDigital Library
- Arya, S., Mount, D. M., Netanyahu, N. S., Silverman, R., and Wu, A. 1994. An optimal algorithm for approximate nearest neighbor searching. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. 573--582. Google ScholarDigital Library
- Bentley, J. L. 1975. Multidimensional binary search trees used for associative searching. Comm. ACM 18, 509--517. Google ScholarDigital Library
- Broder, A., Charikar, M., Frieze, A., and Mitzenmacher, M. 1998. Min-wise independent permutations. J. Comput. Sys. Sci. Google ScholarDigital Library
- Broder, A., Glassman, S., Manasse, M., and Zweig, G. 1997. Syntactic clustering of the web. In Proceedings of the 6th International World Wide Web Conference. 391--404. Google ScholarDigital Library
- Broder, A. 1997. On the resemblance and containment of documents. In Proceedings of Compression and Complexity of Sequences. 21--29. Google ScholarDigital Library
- Buhler, J. 2001. Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinform. 17, 419--428.Google ScholarCross Ref
- Buhler, J. and Tompa, M. 2001. Finding motifs using random projections. In Proceedings of the Annual International Conference on Computational Molecular Biology (RECOMB1). Google ScholarDigital Library
- Califano, A. and Rigoutsos, I. 1993. Flash: A fast look-up algorithm for string homology. In Proceedings of the IEE Conference on Computer Vision and Pattern Recognition (CVPR).Google Scholar
- Chakrabarti, A. and Regev, O. 2004. An optimal randomised cell probe lower bounds for approximate nearest neighbor searching. In Proceedings of the Symposium on Foundations of Computer Science. Google ScholarDigital Library
- Charikar, M. 2002. Similarity estimation techniques from rounding. In Proceedings of the Symposium on Theory of Computing. Google ScholarDigital Library
- Charikar, M., Chekuri, C., Goel, A., Guha, S., and Plotkin, S. 1998. Approximating a finite metric by a small number of tree metrics. In Proceedings of the Symposium on Foundations of Computer Science. Google ScholarDigital Library
- Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. 2001. Introduct. Algorithms. 2nd Ed. MIT Press.Google Scholar
- Datar, M., Immorlica, N., Indyk, P., and Mirrokni, V. 2004. Localitysensitive hashing scheme based on p-stable distributions. In Proceedings of the ACM Symposium on Computational Geometry. Google ScholarDigital Library
- Dutta, D., Guha, R., Jurs, C., and Chen, T. 2006. Scalable partitioning and exploration of chemical spaces using geometric hashing. J. Chem. Inf. Model. 46.Google ScholarCross Ref
- Gionis, A., Indyk, P., and Motwani, R. 1999. Similarity search in high dimensions via hashing. In Proceedings of the International Conference on Very Large Databases. Google ScholarDigital Library
- Goemans, M. and Williamson, D. 1995. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 42. 1115--1145. Google ScholarDigital Library
- Greene, D., Parnas, M., and Yao, F. 1994. Multi-index hashing for information retrieval. In Proceedings of the Symposium on Foundations of Computer Science. 722--731.Google Scholar
- Har-Peled, S. 2001. A replacement for voronoi diagrams of near linear size. In Proceedings of the Symposium on Foundations of Computer Science. Google ScholarDigital Library
- Haveliwala, T., Gionis, A., and Indyk, P. 2000. Scalable techniques for clustering the web. WebDB Workshop.Google Scholar
- Indyk, P. 2003. Nearest neighbors in high-dimensional spaces. In Handbook of Discrete and Computational Geometry. CRC Press.Google Scholar
- Indyk, P. and Motwani, R. 1998. Approximate nearest neighbor: Towards removing the curse of dimensionality. In Proceedings of the Symposium on Theory of Computing. Google ScholarDigital Library
- Karp, R. M., Waarts, O., and Zweig, G. 1995. The bit vector intersection problem. In Proceedings of the Symposium on Foundations of Computer Science. pages 621--630. Google ScholarDigital Library
- Kleinberg, J. 1997. Two algorithms for nearest-neighbor search in high dimensions. In Proceedings of the Symposium on Theory of Computing. Google ScholarDigital Library
- Krauthgamer, R. and Lee, J. R. 2004. Navigating nets: Simple algorithms for proximity search. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. Google ScholarDigital Library
- Kushilevitz, E., Ostrovsky, R., and Rabani, Y. 1998. Efficient search for approximate nearest neighbor in high dimensional spaces. In Proceedings of the Symposium on Theory of Computing. 614--623. Google ScholarDigital Library
- Linial, N., London, E., and Rabinovich, Y. 1994. The geometry of graphs and some of its algorithmic applications. In Proceedings of the Symposium on Foundations of Computer Science. 577--591.Google Scholar
- Motwani, R., Naor, A., and Panigrahy, R. 2006. Lower bounds on locality sensitive hashing. In Proceedings of the ACM Symposium on Computational Geometry. Google ScholarDigital Library
- Panigrahy, R. 2006. Entropy-based nearest neighbor algorithm in high dimensions. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms. Google ScholarDigital Library
- Paturi, R., Rajasekaran, S., and Reif, J.The light bulb problem. Inform. Comput. 117, 2, 187--192. Google ScholarDigital Library
- Ravichandran, D., Pantel, P., and Hovy, E. 2005. Randomized algorithms and nlp: Using locality sensitive hash functions for high speed noun clustering. In Proceedings of the Annual Meeting of the Association of Computational Linguistics. Google ScholarDigital Library
- Samet, H. 2006. Foundations of Multidimensional and Metric Data Structures. Elsevier, 2006. Google ScholarDigital Library
- Shakhnarovich, G., Darrell, T., and Indyk, P. Eds. Nearest Neighbor Methods in Learning and Vision. Neural Processing Information Series, MIT Press. Google ScholarDigital Library
- Terasawa, T. and Tanaka, Y. 2007. Spherical lsh for approximate nearest neighbor search on unit hypersphere. In Proceedings of the Workshop on Algorithms and Data Structures. Google ScholarDigital Library
Index Terms
- Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
Recommendations
Complementary hashing for approximate nearest neighbor search
ICCV '11: Proceedings of the 2011 International Conference on Computer VisionRecently, hashing based Approximate Nearest Neighbor (ANN) techniques have been attracting lots of attention in computer vision. The data-dependent hashing methods, e.g., Spectral Hashing, expects better performance than the data-blind counterparts, e.g.,...
Entropy based nearest neighbor search in high dimensions
SODA '06: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithmIn this paper we study the problem of finding the approximate nearest neighbor of a query point in the high dimensional space, focusing on the Euclidean space. The earlier approaches use locality-preserving hash functions (that tend to map nearby points ...
Order preserving hashing for approximate nearest neighbor search
MM '13: Proceedings of the 21st ACM international conference on MultimediaIn this paper, we propose a novel method to learn similarity-preserving hash functions for approximate nearest neighbor (NN) search. The key idea is to learn hash functions by maximizing the alignment between the similarity orders computed from the ...
Comments