ABSTRACT
This paper presents the NV-Tree (Nearest Vector Tree). It addresses the specific, yet important, problem of efficiently and effectively finding the approximate k-nearest neighbors within a collection of a few billion high-dimensional data points. The NV-Tree is a very compact index, as only six bytes are kept in the index for each high-dimensional descriptor. It thus scales extremely well when indexing large collections of high-dimensional descriptors. The NV-Tree efficiently produces results of good quality, even at such a large scale that the indices cannot be kept entirely in main memory any more. We demonstrate this with extensive experiments using a collection of 2.5 billion SIFT (Scale Invariant Feature Transform) descriptors.
- M. Batko, F. Falchi, C. Lucchese, D. Novak, R. Perego, F. Rabitti, J. Sedmidubsky, and P. Zezula. Building a web-scale image similarity search system. Multimedia Tools and Applications, 47, 2010. Google ScholarDigital Library
- M. Datar, P. Indyk, N. Immorlica, and V. Mirrokni. Locality-sensitive hashing using stable distributions. MIT Press, 2006.Google Scholar
- M. Douze, H. Jégou, H. Sandhawalia, L. Amsaleg, and C. Schmid. Evaluation of gist descriptors for web-scale image search. In Proc. ACM CIVR, Santorini, Greece, 2009. Google ScholarDigital Library
- R. Fagin, R. Kumar, and D. Sivakumar. Efficient similarity search and classification via rank aggregation. In Proc. ACM SIGMOD, San Diego, CA, USA, 2003. Google ScholarDigital Library
- H. Jégou, M. Douze, C. Schmid, and P. Pérez. Aggregating local descriptors into a compact image representation. In Proc. IEEE CVPR, San Francisco, CA, USA, 2010.Google ScholarCross Ref
- A. Joly and O. Buisson. A posteriori multi-probe locality sensitive hashing. In Proc ACM Multimedia, Vancouver, BC, Canada, 2008. Google ScholarDigital Library
- H. Lejsek, F. H. Ásmundsson, B. T. Jónsson, and L. Amsaleg. NV-tree: An efficient disk-based index for approximate search in very large high-dimensional collections. IEEE TPAMI, 31(5), 2009. Google ScholarDigital Library
- H. Lejsek, H. Thormódsdóttir, F. H. Ásmundsson, K. Dadason, Á. T. Jóhannsson, B. T. Jónsson, and L. Amsaleg. Videntifier#8482; Forensic: Large-scale video identification in practise. In Proc. MiFor, Florence, Italy, 2010. Google ScholarDigital Library
- T. Liu, A. Moore, A. Gray, and K. Yang. An investigation of practical approximate nearest neighbor algorithms. In Proc. NIPS, Vancouver, BC, Canada, 2004.Google Scholar
- D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 2004. Google ScholarDigital Library
- A. Ólafsson, B. T. Jónsson, L. Amsaleg, and H. Lejsek. Dynamic behavior of balanced NV-trees. Multimedia Systems, 17(2), 2011.Google Scholar
- F. A. P. Petitcolas et al. A public automated web-based evaluation service for watermarking schemes: StirMark benchmark. In Proc. of Electronic Imaging, Security and Watermarking of Multimedia Contents III, San Jose, CA, USA, 2001.Google Scholar
- J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Proc. CVPR, Minneapolis, MN, USA, 2007.Google ScholarCross Ref
- J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In Proc. CVPR, Anchorage, AK, USA, 2008.Google ScholarCross Ref
- U. Shaft and R. Ramakrishnan. Theory of nearest neighbors indexability. ACM Transactions on Database Systems, 31(3):814--838, 2006. Google ScholarDigital Library
- J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In Proc. ICCV, Nice, France, 2003. Google ScholarDigital Library
- Z. Wu, Q. Ke, M. Isard, and J. Sun. Bundling features for large scale partial-duplicate web image search. In Proc. CVPR, Miami, FL, USA, 2009.Google Scholar
Index Terms
- NV-Tree: nearest neighbors at the billion scale
Recommendations
NV-Tree: An Efficient Disk-Based Index for Approximate Search in Very Large High-Dimensional Collections
Over the last two decades, much research effort has been spent on nearest neighbor search in high-dimensional data sets. Most of the approaches published thus far have, however, only been tested on rather small collections. When large collections have ...
NV-Tree: reducing consistency cost for NVM-based single level systems
FAST'15: Proceedings of the 13th USENIX Conference on File and Storage TechnologiesThe non-volatile memory (NVM) has DRAM-like performance and disk-like persistency which make it possible to replace both disk and DRAM to build single level systems. To keep data consistency in such systems is non-trivial because memory writes may be ...
Scalability of the NV-tree: Three Experiments
Similarity Search and ApplicationsAbstractThe NV-tree is a scalable approximate high-dimensional indexing method specifically designed for large-scale visual instance search. In this paper, we report on three experiments designed to evaluate the performance of the NV-tree. Two of these ...
Comments