skip to main content
research-article

Fast approximate nearest neighbor search with the navigating spreading-out graph

Authors Info & Claims
Published:01 January 2019Publication History
Skip Abstract Section

Abstract

Approximate nearest neighbor search (ANNS) is a fundamental problem in databases and data mining. A scalable ANNS algorithm should be both memory-efficient and fast. Some early graph-based approaches have shown attractive theoretical guarantees on search time complexity, but they all suffer from the problem of high indexing time complexity. Recently, some graph-based methods have been proposed to reduce indexing complexity by approximating the traditional graphs; these methods have achieved revolutionary performance on million-scale datasets. Yet, they still can not scale to billion-node databases. In this paper, to further improve the search-efficiency and scalability of graph-based methods, we start by introducing four aspects: (1) ensuring the connectivity of the graph; (2) lowering the average out-degree of the graph for fast traversal; (3) shortening the search path; and (4) reducing the index size. Then, we propose a novel graph structure called Monotonic Relative Neighborhood Graph (MRNG) which guarantees very low search complexity (close to logarithmic time). To further lower the indexing complexity and make it practical for billion-node ANNS problems, we propose a novel graph structure named Navigating Spreading-out Graph (NSG) by approximating the MRNG. The NSG takes the four aspects into account simultaneously. Extensive experiments show that NSG outperforms all the existing algorithms significantly. In addition, NSG shows superior performance in the E-commercial scenario of Taobao (Alibaba Group) and has been integrated into their billion-scale search engine.

References

  1. F. André, A.-M. Kermarrec, and N. Le Scouarnec. Cache locality is not enough: high-performance nearest neighbor search with product quantization fast scan. PVLDB, 9(4):288--299, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. A. Arora, S. Sinha, P. Kumar, and A. Bhattacharya. Hd-index: Pushing the scalability-accuracy boundary for approximate knn search in high-dimensional spaces. PVLDB, 11(8):906--919, 2018. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. S. Arya and D. M. Mount. Approximate nearest neighbor queries in fixed dimensions. In Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms, pages 271--280, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. F. Aurenhammer. Voronoi diagramsa survey of a fundamental geometric data structure. ACM Computing Surveys (CSUR), 23(3):345--405, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. A. Babenko and V. Lempitsky. Efficient indexing of billion-scale datasets of deep descriptors. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2055--2063, 2016.Google ScholarGoogle Scholar
  6. N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The r*-tree: an efficient and robust access method for points and rectangles. Acm, 19(2):322--331, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. H. Ben and D. Tom. FANNG: Fast approximate nearest neighbour graphs. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, pages 5713--5722, 2016.Google ScholarGoogle Scholar
  8. J. L. Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM, 18(9):509--517, 1975. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. Boguna, D. Krioukov, and K. C. Claffy. Navigability of complex networks. Nature Physics, 5(1):74--80, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  10. L. Chen, M. T. Özsu, and V. Oria. Robust and fast similarity search for moving object trajectories. In Proceedings of the 2005 ACM SIGMOD international conference on Management of data, pages 491--502. ACM, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. J. A. Costa, A. Girotra, and A. Hero. Estimating local intrinsic dimension with k-nearest neighbor graphs. Statistical Signal Processing, 2005 IEEE/SP 13th Workshop on, pages 417--422, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  12. A. P. de Vries, N. Mamoulis, N. Nes, and M. Kersten. Efficient k-nn search on vertically decomposed data. Proceedings of the 2002 ACM SIGMOD international conference on Management of data, pages 322--333, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. D. Dearholt, N. Gonzales, and G. Kurup. Monotonic search networks for computer vision databases. Signals, Systems and Computers, 1988. Twenty-Second Asilomar Conference on, 2:548--553, 1988.Google ScholarGoogle ScholarCross RefCross Ref
  14. W. Dong, C. Moses, and K. Li. Efficient k-nearest neighbor graph construction for generic similarity measures. Proceedings of the 20th international Conference on World Wide Web, pages 577--586, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. C. Fu and D. Cai. Efanna : An extremely fast approximate nearest neighbor search algorithm based on knn graph. arXiv:1609.07228, 2016.Google ScholarGoogle Scholar
  16. C. Fu, C. Xiang, C. Wang, and D. Cai. Fast approximate nearest neighbor search with the navigating spreading-out graph. arXiv preprint arXiv: 1707.00143, 2018.Google ScholarGoogle Scholar
  17. K. Fukunaga and P. M. Narendra. A branch and bound algorithm for computing k-nearest neighbors. IEEE Transactions on Computers, 100(7):750--753, 1975. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. J. Gao, H. V. Jagadish, W. Lu, and B. C. Ooi. Dsh: data sensitive hashing for high-dimensional k-nnsearch. Proceedings of the 2014 ACM SIGMOD international conference on Management of data, pages 1127--1138, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. T. Ge, K. He, Q. Ke, and J. Sun. Optimized product quantization for approximate nearest neighbor search. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2946--2953, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. PVLDB, pages 518--529, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. K. Hajebi, Y. Abbasi-Yadkori, H. Shahbazi, and H. Zhang. Fast approximate nearest-neighbor search with k-nearest neighbor graph. Proceedings of the International Joint Conference on Artificial Intelligence, 22:1312--1317, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. S. Har-Peled, P. Indyk, and R. Motwani. Approximate nearest neighbor: Towards removing the curse of dimensionality. Theory of computing, 8(1):321--350, 2012.Google ScholarGoogle ScholarCross RefCross Ref
  23. Q. Huang, J. Feng, Y. Zhang, Q. Fang, and W. Ng. Query-aware locality-sensitive hashing for approximate nearest neighbor search. PVLDB, 9(1):1--12, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. H. V. Jagadish, B. C. Ooi, K.-L. Tan, C. Yu, and R. Zhang. iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. ACM Transactions on Database Systems (TODS), 30(2):364--397, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. J. W. Jaromczyk and G. T. Toussaint. Relative neighborhood graphs and their relatives. Proceedings of the IEEE, 80(9):1502--1517, 1992.Google ScholarGoogle ScholarCross RefCross Ref
  26. H. Jegou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE transactions on pattern analysis and machine intelligence, 33(1):117--128, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Z. Jin, D. Zhang, Y. Hu, S. Lin, D. Cai, and X. He. Fast and accurate hashing via iterative nearest neighbors expansion. IEEE transactions on cybernetics, 44(11):2167--2177, 2014.Google ScholarGoogle Scholar
  28. J. Johnson, M. Douze, and H. Jégou. Billion-scale similarity search with gpus. arXiv:1702.08734, 2017.Google ScholarGoogle Scholar
  29. J. M. Kleinberg. Navigation in a small world. Nature, 406(6798):845--845, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  30. G. D. Kurup. Database organized on the basis of similarities with applications in computer vision. Ph.D. thesis, New Mexico State University, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. W. Li, Y. Zhang, Y. Sun, W. Wang, W. Zhang, and X. Lin. Approximate nearest neighbor search on high dimensional data---experiments, analyses, and improvement (v1. 0). arXiv:1610.02455, 2016.Google ScholarGoogle Scholar
  32. Y. Liu, J. Cui, Z. Huang, H. Li, and H. T. Shen. Sk-lsh: an efficient index structure for approximate nearest neighbor search. PVLDB, 7(9):745--756, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Y. Malkov, A. Ponomarenko, A. Logvinov, and V. Krylov. Approximate nearest neighbor algorithm based on navigable small world graphs. Information Systems, 45:61--68, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  34. Y. A. Malkov and D. A. Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. arXiv:1603.09320, 2016.Google ScholarGoogle Scholar
  35. C. D. Manning, P. Raghavan, H. Schütze, et al. Introduction to information retrieval. Cambridge university press Cambridge 1(1), 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. C. Silpa-Anan and R. Hartley. Optimised kd-trees for fast image descriptor matching. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1--8, 2008.Google ScholarGoogle ScholarCross RefCross Ref
  37. G. Teodoro, E. Valle, N. Mariano, R. Torres, W. Meira, and J. H. Saltz. Approximate similarity search for online multimedia services on distributed cpu-gpu platforms. The VLDB Journal, 23(3):427--448, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. G. T. Toussaint. The relative neighbourhood graph of a finite planar set. Pattern recognition, 12(4):261--268, 1980.Google ScholarGoogle ScholarCross RefCross Ref
  39. R. Weber, H.-J. Schek, and S. Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. PVLDB, 98:194--205, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. Advances in neural information processing systems, pages 1753--1760, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Y. Wu, R. Jin, and X. Zhang. Fast and unified local search for random walk based k-nearest-neighbor query in large graphs. Proceedings of the 2014 ACM SIGMOD international conference on Management of data, pages 1139--1150, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Y. Zheng, Q. Guo, A. K. Tung, and S. Wu. Lazylsh: Approximate nearest neighbor search for multiple distance functions with a single index. Proceedings of the 2016 International Conference on Management of Data, pages 2023--2037, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 12, Issue 5
    January 2019
    163 pages
    ISSN:2150-8097
    Issue’s Table of Contents

    Publisher

    VLDB Endowment

    Publication History

    • Published: 1 January 2019
    Published in pvldb Volume 12, Issue 5

    Qualifiers

    • research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader