research-article

NV-Tree: nearest neighbors at the billion scale

Authors:
Herwig Lejsek

Videntifier Technologies Reykjavík, Iceland

Videntifier Technologies Reykjavík, Iceland
View Profile

,
Björn Þór Jónsson

Reykjavik University, Iceland

Reykjavik University, Iceland
View Profile

,
Laurent Amsaleg

IRISA--CNRS Rennes, France

IRISA--CNRS Rennes, France
View Profile

ICMR '11: Proceedings of the 1st ACM International Conference on Multimedia RetrievalApril 2011Article No.: 54Pages 1–8https://doi.org/10.1145/1991996.1992050

Published:18 April 2011Publication History

ICMR '11: Proceedings of the 1st ACM International Conference on Multimedia Retrieval

Pages 1–8

ABSTRACT

This paper presents the NV-Tree (Nearest Vector Tree). It addresses the specific, yet important, problem of efficiently and effectively finding the approximate k-nearest neighbors within a collection of a few billion high-dimensional data points. The NV-Tree is a very compact index, as only six bytes are kept in the index for each high-dimensional descriptor. It thus scales extremely well when indexing large collections of high-dimensional descriptors. The NV-Tree efficiently produces results of good quality, even at such a large scale that the indices cannot be kept entirely in main memory any more. We demonstrate this with extensive experiments using a collection of 2.5 billion SIFT (Scale Invariant Feature Transform) descriptors.

References

M. Batko, F. Falchi, C. Lucchese, D. Novak, R. Perego, F. Rabitti, J. Sedmidubsky, and P. Zezula. Building a web-scale image similarity search system. Multimedia Tools and Applications, 47, 2010. Google ScholarDigital Library
M. Datar, P. Indyk, N. Immorlica, and V. Mirrokni. Locality-sensitive hashing using stable distributions. MIT Press, 2006.Google Scholar
M. Douze, H. Jégou, H. Sandhawalia, L. Amsaleg, and C. Schmid. Evaluation of gist descriptors for web-scale image search. In Proc. ACM CIVR, Santorini, Greece, 2009. Google ScholarDigital Library
R. Fagin, R. Kumar, and D. Sivakumar. Efficient similarity search and classification via rank aggregation. In Proc. ACM SIGMOD, San Diego, CA, USA, 2003. Google ScholarDigital Library
H. Jégou, M. Douze, C. Schmid, and P. Pérez. Aggregating local descriptors into a compact image representation. In Proc. IEEE CVPR, San Francisco, CA, USA, 2010.Google ScholarCross Ref
A. Joly and O. Buisson. A posteriori multi-probe locality sensitive hashing. In Proc ACM Multimedia, Vancouver, BC, Canada, 2008. Google ScholarDigital Library
H. Lejsek, F. H. Ásmundsson, B. T. Jónsson, and L. Amsaleg. NV-tree: An efficient disk-based index for approximate search in very large high-dimensional collections. IEEE TPAMI, 31(5), 2009. Google ScholarDigital Library
H. Lejsek, H. Thormódsdóttir, F. H. Ásmundsson, K. Dadason, Á. T. Jóhannsson, B. T. Jónsson, and L. Amsaleg. Videntifier#8482; Forensic: Large-scale video identification in practise. In Proc. MiFor, Florence, Italy, 2010. Google ScholarDigital Library
T. Liu, A. Moore, A. Gray, and K. Yang. An investigation of practical approximate nearest neighbor algorithms. In Proc. NIPS, Vancouver, BC, Canada, 2004.Google Scholar
D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 2004. Google ScholarDigital Library
A. Ólafsson, B. T. Jónsson, L. Amsaleg, and H. Lejsek. Dynamic behavior of balanced NV-trees. Multimedia Systems, 17(2), 2011.Google Scholar
F. A. P. Petitcolas et al. A public automated web-based evaluation service for watermarking schemes: StirMark benchmark. In Proc. of Electronic Imaging, Security and Watermarking of Multimedia Contents III, San Jose, CA, USA, 2001.Google Scholar
J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Proc. CVPR, Minneapolis, MN, USA, 2007.Google ScholarCross Ref
J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In Proc. CVPR, Anchorage, AK, USA, 2008.Google ScholarCross Ref
U. Shaft and R. Ramakrishnan. Theory of nearest neighbors indexability. ACM Transactions on Database Systems, 31(3):814--838, 2006. Google ScholarDigital Library
J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In Proc. ICCV, Nice, France, 2003. Google ScholarDigital Library
Z. Wu, Q. Ke, M. Isard, and J. Sun. Bundling features for large scale partial-duplicate web image search. In Proc. CVPR, Miami, FL, USA, 2009.Google Scholar

Index Terms

NV-Tree: nearest neighbors at the billion scale
1. Information systems
2. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory
      1. Trees

Recommendations

NV-Tree: An Efficient Disk-Based Index for Approximate Search in Very Large High-Dimensional Collections

Over the last two decades, much research effort has been spent on nearest neighbor search in high-dimensional data sets. Most of the approaches published thus far have, however, only been tested on rather small collections. When large collections have ...
Read More
NV-Tree: reducing consistency cost for NVM-based single level systems
FAST'15: Proceedings of the 13th USENIX Conference on File and Storage Technologies

The non-volatile memory (NVM) has DRAM-like performance and disk-like persistency which make it possible to replace both disk and DRAM to build single level systems. To keep data consistency in such systems is non-trivial because memory writes may be ...
Read More
Scalability of the NV-tree: Three Experiments
Similarity Search and Applications
Abstract
The NV-tree is a scalable approximate high-dimensional indexing method specifically designed for large-scale visual instance search. In this paper, we report on three experiments designed to evaluate the performance of the NV-tree. Two of these ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ICMR '11: Proceedings of the 1st ACM International Conference on Multimedia Retrieval
April 2011
512 pages
ISBN:9781450303361
DOI:10.1145/1991996
General Chairs:
Francesco G. B. De Natale
University of Trento, Italy
,
Alberto Del Bimbo
University of Florence, Italy
,
Program Chairs:
Alan Hanjalic
University of Amsterdam, Netherlands
,
B. S. Manjunath
University of California, Santa Barbara
,
Shin'ichi Satoh
NII, Japan
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 18 April 2011
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate254of830submissions,31%
Upcoming Conference
ICMR '24

Sponsor:

sigmm

International Conference on Multimedia Retrieval

June 10 - 14, 2024

Phuket , Thailand
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 24
  Total Citations
  View Citations
- 222
  Total Downloads
- Downloads (Last 12 months)2
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

NV-Tree: nearest neighbors at the billion scale

ICMR '11: Proceedings of the 1st ACM International Conference on Multimedia Retrieval

ABSTRACT

References

Cited By

Index Terms

Recommendations

NV-Tree: An Efficient Disk-Based Index for Approximate Search in Very Large High-Dimensional Collections

NV-Tree: reducing consistency cost for NVM-based single level systems

Scalability of the NV-tree: Three Experiments