ABSTRACT
The problem of Approximate Maximum Inner Product (AMIP) search has received increasing attention due to its wide applications. Interestingly, based on asymmetric transformation, the problem can be reduced to the Approximate Nearest Neighbor (ANN) search, and hence leverage Locality-Sensitive Hashing (LSH) to find solution. However, existing asymmetric transformations such as L2-ALSH and XBOX, suffer from large distortion error in reducing AMIP search to ANN search, such that the results of AMIP search can be arbitrarily bad. In this paper, we propose a novel Asymmetric LSH scheme based on Homocentric Hypersphere partition (H2-ALSH) for high-dimensional AMIP search. On the one hand, we propose a novel Query Normalized First (QNF) transformation to significantly reduce the distortion error. On the other hand, by adopting the homocentric hypersphere partition strategy, we can not only improve the search efficiency with early stop pruning, but also get higher search accuracy by further reducing the distortion error with limited data range. Our theoretical studies show that H2-ALSH enjoys a guarantee on search accuracy. Experimental results over four real datasets demonstrate that H2-ALSH significantly outperforms the state-of-the-art schemes.
- Alexandr Andoni and Piotr Indyk . 2006. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions FOCS. 459--468. Google ScholarDigital Library
- Alex Auvolat, Sarath Chandar, Pascal Vincent, Hugo Larochelle, and Yoshua Bengio . 2015. Clustering is efficient for approximate maximum inner product search. arXiv preprint arXiv:1507.05910 (2015).Google Scholar
- Yoram Bachrach, Yehuda Finkelstein, Ran Gilad-Bachrach, Liran Katzir, Noam Koenigstein, Nir Nice, and Ulrich Paquet . 2014. Speeding up the xbox recommender system using a euclidean transformation for inner-product spaces. In RecSys. 257--264. Google ScholarDigital Library
- James Bennett, Stan Lanning, et almbox. . 2007. The netflix prize. In Proceedings of KDD cup and workshop. 35.Google Scholar
- Moses S Charikar . 2002. Similarity estimation techniques from rounding algorithms STOC. 380--388. Google ScholarDigital Library
- Paolo Cremonesi, Yehuda Koren, and Roberto Turrin . 2010. Performance of recommender algorithms on top-n recommendation tasks RecSys. 39--46. Google ScholarDigital Library
- Ryan R Curtin and Parikshit Ram . 2014. Dual-tree fast exact max-kernel search. Statistical Analysis and Data Mining Vol. 7, 4 (2014), 229--253. Google ScholarDigital Library
- Ryan R Curtin, Parikshit Ram, and Alexander G Gray . 2013. Fast exact max-kernel search. In ICDM. 1--9.Google Scholar
- Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S Mirrokni . 2004. Locality-sensitive hashing scheme based on p-stable distributions SoCG. 253--262. Google ScholarDigital Library
- Thomas Dean, Mark A Ruzon, Mark Segal, Jonathon Shlens, Sudheendra Vijayanarasimhan, and Jay Yagnik . 2013. Fast, accurate detection of 100,000 object classes on a single machine CVPR. 1814--1821. Google ScholarDigital Library
- Gideon Dror, Noam Koenigstein, Yehuda Koren, and Markus Weimer . 2012. The Yahoo! Music Dataset and KDD-Cup'11.. In KDD Cup. 8--18. Google ScholarDigital Library
- Junhao Gan, Jianlin Feng, Qiong Fang, and Wilfred Ng . 2012. Locality-sensitive hashing scheme based on dynamic collision counting SIGMOD. 541--552. Google ScholarDigital Library
- Ruiqi Guo, Sanjiv Kumar, Krzysztof Choromanski, and David Simcha . 2016. Quantization based fast inner product search. In Artificial Intelligence and Statistics. 482--490.Google Scholar
- Qiang Huang, Jianlin Feng, Qiong Fang, et almbox. . 2017. Query-aware locality-sensitive hashing scheme for $l_p$ norm. The VLDB Journal Vol. 26, 5 (2017), 683--708. Google ScholarDigital Library
- Qiang Huang, Jianlin Feng, Yikai Zhang, Qiong Fang, and Wilfred Ng . 2015. Query-aware locality-sensitive hashing for approximate nearest neighbor search. Proceedings of the VLDB Endowment Vol. 9, 1 (2015), 1--12. Google ScholarDigital Library
- Prateek Jain and Ashish Kapoor . 2009. Active learning for large multi-class problems. In CVPR. 762--769.Google Scholar
- Thorsten Joachims, Thomas Finley, and Chun-Nam John Yu . 2009. Cutting-plane training of structural SVMs. Machine Learning Vol. 77, 1 (2009), 27--59. Google ScholarDigital Library
- Noam Koenigstein, Parikshit Ram, and Yuval Shavitt . 2012. Efficient retrieval of recommendations in a matrix factorization framework CIKM. 535--544. Google ScholarDigital Library
- Yehuda Koren, Robert Bell, and Chris Volinsky . 2009. Matrix factorization techniques for recommender systems. Computer Vol. 42, 8 (2009). Google ScholarDigital Library
- Hui Li, Tsz Nam Chan, Man Lung Yiu, and Nikos Mamoulis . 2017. FEXIPRO: Fast and Exact Inner Product Retrieval in Recommender Systems SIGMOD. 835--850. Google ScholarDigital Library
- Behnam Neyshabur and Nathan Srebro . 2015. On Symmetric and Asymmetric LSHs for Inner Product Search ICML. 1926--1934. Google ScholarDigital Library
- Parikshit Ram and Alexander G Gray . 2012. Maximum inner-product search using cone trees. In SIGKDD. 931--939. Google ScholarDigital Library
- Anshumali Shrivastava and Ping Li . 2014. Asymmetric LSH (ALSH) for sublinear time Maximum Inner Product Search (MIPS) NIPS. 2321--2329. Google ScholarDigital Library
- Anshumali Shrivastava and Ping Li . 2015. Improved asymmetric locality sensitive hashing (ALSH) for Maximum Inner Product Search (MIPS). In UAI. 812--821. Google ScholarDigital Library
- Ryan Spring and Anshumali Shrivastava . 2017. Scalable and sustainable deep learning via randomized hashing SIGKDD. 445--454. Google ScholarDigital Library
- Nathan Srebro, Jason Rennie, and Tommi S Jaakkola . 2005. Maximum-margin matrix factorization. In NIPS. 1329--1336. Google ScholarDigital Library
- Yifang Sun, Wei Wang, Jianbin Qin, Ying Zhang, and Xuemin Lin . 2014. SRS: solving c-approximate nearest neighbor queries in high dimensional euclidean space with a tiny index. Proceedings of the VLDB Endowment Vol. 8, 1 (2014), 1--12. Google ScholarDigital Library
- Yufei Tao, Ke Yi, Cheng Sheng, and Panos Kalnis . 2009. Quality and efficiency in high dimensional nearest neighbor search SIGMOD. 563--576. Google ScholarDigital Library
- Christina Teflioudi and Rainer Gemulla . 2016. Exact and approximate maximum inner product search with lemp. ACM TODS Vol. 42, 1 (2016), 5. Google ScholarDigital Library
- Christina Teflioudi, Rainer Gemulla, and Olga Mykytiuk . 2015. LEMP: Fast retrieval of large entries in a matrix product SIGMOD. 107--122. Google ScholarDigital Library
- Sudheendra Vijayanarasimhan et almbox. . 2014. Deep networks with large output spaces. arXiv preprint arXiv:1412.7479 (2014).Google Scholar
- Roger Weber, Hans-Jörg Schek, and Stephen Blott . 1998. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In VLDB, Vol. Vol. 98. 194--205. Google ScholarDigital Library
- Yuxin Zheng, Qi Guo, Anthony KH Tung, and Sai Wu . 2016. Lazylsh: Approximate nearest neighbor search for multiple distance functions with a single index. In SIGMOD. 2023--2037. Google ScholarDigital Library
Index Terms
- Accurate and Fast Asymmetric Locality-Sensitive Hashing Scheme for Maximum Inner Product Search
Recommendations
Locality-Sensitive Hashing Scheme based on Longest Circular Co-Substring
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of DataLocality-Sensitive Hashing (LSH) is one of the most popular methods for c-Approximate Nearest Neighbor Search (c-ANNS) in high-dimensional spaces. In this paper, we propose a novel LSH scheme based on the Longest Circular Co-Substring (LCCS) search ...
Query-aware locality-sensitive hashing scheme for $$l_p$$lp norm
The problem of c-Approximate Nearest Neighbor (c-ANN) search in high-dimensional space is fundamentally important in many applications, such as image database and data mining. Locality-Sensitive Hashing (LSH) and its variants are the well-known indexing ...
Secure Approximate Nearest Neighbor Search with Locality-Sensitive Hashing
Computer Security – ESORICS 2023AbstractEnsuring both security and efficiency in Nearest Neighbor Search (NNS) on large datasets remains a formidable challenge, as it often leads to substantial computation and communication costs due to the resource-intensive nature of ciphertext ...
Comments