Abstract
Approximate nearest neighbor search (ANNS) is a fundamental problem in databases and data mining. A scalable ANNS algorithm should be both memory-efficient and fast. Some early graph-based approaches have shown attractive theoretical guarantees on search time complexity, but they all suffer from the problem of high indexing time complexity. Recently, some graph-based methods have been proposed to reduce indexing complexity by approximating the traditional graphs; these methods have achieved revolutionary performance on million-scale datasets. Yet, they still can not scale to billion-node databases. In this paper, to further improve the search-efficiency and scalability of graph-based methods, we start by introducing four aspects: (1) ensuring the connectivity of the graph; (2) lowering the average out-degree of the graph for fast traversal; (3) shortening the search path; and (4) reducing the index size. Then, we propose a novel graph structure called Monotonic Relative Neighborhood Graph (MRNG) which guarantees very low search complexity (close to logarithmic time). To further lower the indexing complexity and make it practical for billion-node ANNS problems, we propose a novel graph structure named Navigating Spreading-out Graph (NSG) by approximating the MRNG. The NSG takes the four aspects into account simultaneously. Extensive experiments show that NSG outperforms all the existing algorithms significantly. In addition, NSG shows superior performance in the E-commercial scenario of Taobao (Alibaba Group) and has been integrated into their billion-scale search engine.
- F. André, A.-M. Kermarrec, and N. Le Scouarnec. Cache locality is not enough: high-performance nearest neighbor search with product quantization fast scan. PVLDB, 9(4):288--299, 2015. Google ScholarDigital Library
- A. Arora, S. Sinha, P. Kumar, and A. Bhattacharya. Hd-index: Pushing the scalability-accuracy boundary for approximate knn search in high-dimensional spaces. PVLDB, 11(8):906--919, 2018. Google ScholarDigital Library
- S. Arya and D. M. Mount. Approximate nearest neighbor queries in fixed dimensions. In Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms, pages 271--280, 1993. Google ScholarDigital Library
- F. Aurenhammer. Voronoi diagramsa survey of a fundamental geometric data structure. ACM Computing Surveys (CSUR), 23(3):345--405, 1991. Google ScholarDigital Library
- A. Babenko and V. Lempitsky. Efficient indexing of billion-scale datasets of deep descriptors. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2055--2063, 2016.Google Scholar
- N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The r*-tree: an efficient and robust access method for points and rectangles. Acm, 19(2):322--331, 1990. Google ScholarDigital Library
- H. Ben and D. Tom. FANNG: Fast approximate nearest neighbour graphs. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, pages 5713--5722, 2016.Google Scholar
- J. L. Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509--517, 1975. Google ScholarDigital Library
- M. Boguna, D. Krioukov, and K. C. Claffy. Navigability of complex networks. Nature Physics, 5(1):74--80, 2009.Google ScholarCross Ref
- L. Chen, M. T. Özsu, and V. Oria. Robust and fast similarity search for moving object trajectories. In Proceedings of the 2005 ACM SIGMOD international conference on Management of data, pages 491--502. ACM, 2005. Google ScholarDigital Library
- J. A. Costa, A. Girotra, and A. Hero. Estimating local intrinsic dimension with k-nearest neighbor graphs. Statistical Signal Processing, 2005 IEEE/SP 13th Workshop on, pages 417--422, 2005.Google ScholarCross Ref
- A. P. de Vries, N. Mamoulis, N. Nes, and M. Kersten. Efficient k-nn search on vertically decomposed data. Proceedings of the 2002 ACM SIGMOD international conference on Management of data, pages 322--333, 2002. Google ScholarDigital Library
- D. Dearholt, N. Gonzales, and G. Kurup. Monotonic search networks for computer vision databases. Signals, Systems and Computers, 1988. Twenty-Second Asilomar Conference on, 2:548--553, 1988.Google ScholarCross Ref
- W. Dong, C. Moses, and K. Li. Efficient k-nearest neighbor graph construction for generic similarity measures. Proceedings of the 20th international Conference on World Wide Web, pages 577--586, 2011. Google ScholarDigital Library
- C. Fu and D. Cai. Efanna : An extremely fast approximate nearest neighbor search algorithm based on knn graph. arXiv:1609.07228, 2016.Google Scholar
- C. Fu, C. Xiang, C. Wang, and D. Cai. Fast approximate nearest neighbor search with the navigating spreading-out graph. arXiv preprint arXiv: 1707.00143, 2018.Google Scholar
- K. Fukunaga and P. M. Narendra. A branch and bound algorithm for computing k-nearest neighbors. IEEE Transactions on Computers, 100(7):750--753, 1975. Google ScholarDigital Library
- J. Gao, H. V. Jagadish, W. Lu, and B. C. Ooi. Dsh: data sensitive hashing for high-dimensional k-nnsearch. Proceedings of the 2014 ACM SIGMOD international conference on Management of data, pages 1127--1138, 2014. Google ScholarDigital Library
- T. Ge, K. He, Q. Ke, and J. Sun. Optimized product quantization for approximate nearest neighbor search. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2946--2953, 2013. Google ScholarDigital Library
- A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. PVLDB, pages 518--529, 1999. Google ScholarDigital Library
- K. Hajebi, Y. Abbasi-Yadkori, H. Shahbazi, and H. Zhang. Fast approximate nearest-neighbor search with k-nearest neighbor graph. Proceedings of the International Joint Conference on Artificial Intelligence, 22:1312--1317, 2011. Google ScholarDigital Library
- S. Har-Peled, P. Indyk, and R. Motwani. Approximate nearest neighbor: Towards removing the curse of dimensionality. Theory of computing, 8(1):321--350, 2012.Google ScholarCross Ref
- Q. Huang, J. Feng, Y. Zhang, Q. Fang, and W. Ng. Query-aware locality-sensitive hashing for approximate nearest neighbor search. PVLDB, 9(1):1--12, 2015. Google ScholarDigital Library
- H. V. Jagadish, B. C. Ooi, K.-L. Tan, C. Yu, and R. Zhang. iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. ACM Transactions on Database Systems (TODS), 30(2):364--397, 2005. Google ScholarDigital Library
- J. W. Jaromczyk and G. T. Toussaint. Relative neighborhood graphs and their relatives. Proceedings of the IEEE, 80(9):1502--1517, 1992.Google ScholarCross Ref
- H. Jegou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE transactions on pattern analysis and machine intelligence, 33(1):117--128, 2011. Google ScholarDigital Library
- Z. Jin, D. Zhang, Y. Hu, S. Lin, D. Cai, and X. He. Fast and accurate hashing via iterative nearest neighbors expansion. IEEE transactions on cybernetics, 44(11):2167--2177, 2014.Google Scholar
- J. Johnson, M. Douze, and H. Jégou. Billion-scale similarity search with gpus. arXiv:1702.08734, 2017.Google Scholar
- J. M. Kleinberg. Navigation in a small world. Nature, 406(6798):845--845, 2000.Google ScholarCross Ref
- G. D. Kurup. Database organized on the basis of similarities with applications in computer vision. Ph.D. thesis, New Mexico State University, 1992. Google ScholarDigital Library
- W. Li, Y. Zhang, Y. Sun, W. Wang, W. Zhang, and X. Lin. Approximate nearest neighbor search on high dimensional data---experiments, analyses, and improvement (v1. 0). arXiv:1610.02455, 2016.Google Scholar
- Y. Liu, J. Cui, Z. Huang, H. Li, and H. T. Shen. Sk-lsh: an efficient index structure for approximate nearest neighbor search. PVLDB, 7(9):745--756, 2014. Google ScholarDigital Library
- Y. Malkov, A. Ponomarenko, A. Logvinov, and V. Krylov. Approximate nearest neighbor algorithm based on navigable small world graphs. Information Systems, 45:61--68, 2014.Google ScholarCross Ref
- Y. A. Malkov and D. A. Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. arXiv:1603.09320, 2016.Google Scholar
- C. D. Manning, P. Raghavan, H. Schütze, et al. Introduction to information retrieval. Cambridge university press Cambridge 1(1), 2008. Google ScholarDigital Library
- C. Silpa-Anan and R. Hartley. Optimised kd-trees for fast image descriptor matching. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1--8, 2008.Google ScholarCross Ref
- G. Teodoro, E. Valle, N. Mariano, R. Torres, W. Meira, and J. H. Saltz. Approximate similarity search for online multimedia services on distributed cpu-gpu platforms. The VLDB Journal, 23(3):427--448, 2014. Google ScholarDigital Library
- G. T. Toussaint. The relative neighbourhood graph of a finite planar set. Pattern recognition, 12(4):261--268, 1980.Google ScholarCross Ref
- R. Weber, H.-J. Schek, and S. Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. PVLDB, 98:194--205, 1998. Google ScholarDigital Library
- Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. Advances in neural information processing systems, pages 1753--1760, 2009. Google ScholarDigital Library
- Y. Wu, R. Jin, and X. Zhang. Fast and unified local search for random walk based k-nearest-neighbor query in large graphs. Proceedings of the 2014 ACM SIGMOD international conference on Management of data, pages 1139--1150, 2014. Google ScholarDigital Library
- Y. Zheng, Q. Guo, A. K. Tung, and S. Wu. Lazylsh: Approximate nearest neighbor search for multiple distance functions with a single index. Proceedings of the 2016 International Conference on Management of Data, pages 2023--2037, 2016. Google ScholarDigital Library
Recommendations
Fast approximate nearest neighbor search via k-diverse nearest neighbor graph
AAAI'18/IAAI'18/EAAI'18: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial IntelligenceApproximate nearest neighbor search is a fundamental problem and has been studied for a few decades. Recently graph-based indexing methods have demonstrated their great efficiency, whose main idea is to construct neighborhood graph offline and perform a ...
Relative NN-Descent: A Fast Index Construction for Graph-Based Approximate Nearest Neighbor Search
MM '23: Proceedings of the 31st ACM International Conference on MultimediaApproximate Nearest Neighbor Search (ANNS) is the task of finding the database vector that is closest to a given query vector. Graph-based ANNS is the family of methods with the best balance of accuracy and speed for million-scale datasets. However, ...
FINGER: Fast Inference for Graph-based Approximate Nearest Neighbor Search
WWW '23: Proceedings of the ACM Web Conference 2023Approximate K-Nearest Neighbor Search (AKNNS) has now become ubiquitous in modern applications, such as a fast search procedure with two-tower deep learning models. Graph-based methods for AKNNS in particular have received great attention due to their ...
Comments