research-article

Accurate and Fast Asymmetric Locality-Sensitive Hashing Scheme for Maximum Inner Product Search

Authors:
Qiang Huang

Sun Yat-Sen University, Guangzhou, China

Sun Yat-Sen University, Guangzhou, China
View Profile

,
Guihong Ma

Sun Yat-Sen University, Guangzhou, China

Sun Yat-Sen University, Guangzhou, China
View Profile

,
Jianlin Feng

Sun Yat-Sen University, Guangzhou, China

Sun Yat-Sen University, Guangzhou, China
View Profile

,
Qiong Fang

South China University of Technology, Guangzhou, China

South China University of Technology, Guangzhou, China
View Profile

,
Anthony K. H. Tung

National University of Singapore, Singapore, Singapore

National University of Singapore, Singapore, Singapore
View Profile

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data MiningJuly 2018Pages 1561–1570https://doi.org/10.1145/3219819.3219971

Published:19 July 2018Publication History

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Pages 1561–1570

ABSTRACT

The problem of Approximate Maximum Inner Product (AMIP) search has received increasing attention due to its wide applications. Interestingly, based on asymmetric transformation, the problem can be reduced to the Approximate Nearest Neighbor (ANN) search, and hence leverage Locality-Sensitive Hashing (LSH) to find solution. However, existing asymmetric transformations such as L2-ALSH and XBOX, suffer from large distortion error in reducing AMIP search to ANN search, such that the results of AMIP search can be arbitrarily bad. In this paper, we propose a novel Asymmetric LSH scheme based on Homocentric Hypersphere partition (H2-ALSH) for high-dimensional AMIP search. On the one hand, we propose a novel Query Normalized First (QNF) transformation to significantly reduce the distortion error. On the other hand, by adopting the homocentric hypersphere partition strategy, we can not only improve the search efficiency with early stop pruning, but also get higher search accuracy by further reducing the distortion error with limited data range. Our theoretical studies show that H2-ALSH enjoys a guarantee on search accuracy. Experimental results over four real datasets demonstrate that H2-ALSH significantly outperforms the state-of-the-art schemes.

References

Alexandr Andoni and Piotr Indyk . 2006. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions FOCS. 459--468. Google ScholarDigital Library
Alex Auvolat, Sarath Chandar, Pascal Vincent, Hugo Larochelle, and Yoshua Bengio . 2015. Clustering is efficient for approximate maximum inner product search. arXiv preprint arXiv:1507.05910 (2015).Google Scholar
Yoram Bachrach, Yehuda Finkelstein, Ran Gilad-Bachrach, Liran Katzir, Noam Koenigstein, Nir Nice, and Ulrich Paquet . 2014. Speeding up the xbox recommender system using a euclidean transformation for inner-product spaces. In RecSys. 257--264. Google ScholarDigital Library
James Bennett, Stan Lanning, et almbox. . 2007. The netflix prize. In Proceedings of KDD cup and workshop. 35.Google Scholar
Moses S Charikar . 2002. Similarity estimation techniques from rounding algorithms STOC. 380--388. Google ScholarDigital Library
Paolo Cremonesi, Yehuda Koren, and Roberto Turrin . 2010. Performance of recommender algorithms on top-n recommendation tasks RecSys. 39--46. Google ScholarDigital Library
Ryan R Curtin and Parikshit Ram . 2014. Dual-tree fast exact max-kernel search. Statistical Analysis and Data Mining Vol. 7, 4 (2014), 229--253. Google ScholarDigital Library
Ryan R Curtin, Parikshit Ram, and Alexander G Gray . 2013. Fast exact max-kernel search. In ICDM. 1--9.Google Scholar
Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S Mirrokni . 2004. Locality-sensitive hashing scheme based on p-stable distributions SoCG. 253--262. Google ScholarDigital Library
Thomas Dean, Mark A Ruzon, Mark Segal, Jonathon Shlens, Sudheendra Vijayanarasimhan, and Jay Yagnik . 2013. Fast, accurate detection of 100,000 object classes on a single machine CVPR. 1814--1821. Google ScholarDigital Library
Gideon Dror, Noam Koenigstein, Yehuda Koren, and Markus Weimer . 2012. The Yahoo! Music Dataset and KDD-Cup'11.. In KDD Cup. 8--18. Google ScholarDigital Library
Junhao Gan, Jianlin Feng, Qiong Fang, and Wilfred Ng . 2012. Locality-sensitive hashing scheme based on dynamic collision counting SIGMOD. 541--552. Google ScholarDigital Library
Ruiqi Guo, Sanjiv Kumar, Krzysztof Choromanski, and David Simcha . 2016. Quantization based fast inner product search. In Artificial Intelligence and Statistics. 482--490.Google Scholar
Qiang Huang, Jianlin Feng, Qiong Fang, et almbox. . 2017. Query-aware locality-sensitive hashing scheme for $l_p$ norm. The VLDB Journal Vol. 26, 5 (2017), 683--708. Google ScholarDigital Library
Qiang Huang, Jianlin Feng, Yikai Zhang, Qiong Fang, and Wilfred Ng . 2015. Query-aware locality-sensitive hashing for approximate nearest neighbor search. Proceedings of the VLDB Endowment Vol. 9, 1 (2015), 1--12. Google ScholarDigital Library
Prateek Jain and Ashish Kapoor . 2009. Active learning for large multi-class problems. In CVPR. 762--769.Google Scholar
Thorsten Joachims, Thomas Finley, and Chun-Nam John Yu . 2009. Cutting-plane training of structural SVMs. Machine Learning Vol. 77, 1 (2009), 27--59. Google ScholarDigital Library
Noam Koenigstein, Parikshit Ram, and Yuval Shavitt . 2012. Efficient retrieval of recommendations in a matrix factorization framework CIKM. 535--544. Google ScholarDigital Library
Yehuda Koren, Robert Bell, and Chris Volinsky . 2009. Matrix factorization techniques for recommender systems. Computer Vol. 42, 8 (2009). Google ScholarDigital Library
Hui Li, Tsz Nam Chan, Man Lung Yiu, and Nikos Mamoulis . 2017. FEXIPRO: Fast and Exact Inner Product Retrieval in Recommender Systems SIGMOD. 835--850. Google ScholarDigital Library
Behnam Neyshabur and Nathan Srebro . 2015. On Symmetric and Asymmetric LSHs for Inner Product Search ICML. 1926--1934. Google ScholarDigital Library
Parikshit Ram and Alexander G Gray . 2012. Maximum inner-product search using cone trees. In SIGKDD. 931--939. Google ScholarDigital Library
Anshumali Shrivastava and Ping Li . 2014. Asymmetric LSH (ALSH) for sublinear time Maximum Inner Product Search (MIPS) NIPS. 2321--2329. Google ScholarDigital Library
Anshumali Shrivastava and Ping Li . 2015. Improved asymmetric locality sensitive hashing (ALSH) for Maximum Inner Product Search (MIPS). In UAI. 812--821. Google ScholarDigital Library
Ryan Spring and Anshumali Shrivastava . 2017. Scalable and sustainable deep learning via randomized hashing SIGKDD. 445--454. Google ScholarDigital Library
Nathan Srebro, Jason Rennie, and Tommi S Jaakkola . 2005. Maximum-margin matrix factorization. In NIPS. 1329--1336. Google ScholarDigital Library
Yifang Sun, Wei Wang, Jianbin Qin, Ying Zhang, and Xuemin Lin . 2014. SRS: solving c-approximate nearest neighbor queries in high dimensional euclidean space with a tiny index. Proceedings of the VLDB Endowment Vol. 8, 1 (2014), 1--12. Google ScholarDigital Library
Yufei Tao, Ke Yi, Cheng Sheng, and Panos Kalnis . 2009. Quality and efficiency in high dimensional nearest neighbor search SIGMOD. 563--576. Google ScholarDigital Library
Christina Teflioudi and Rainer Gemulla . 2016. Exact and approximate maximum inner product search with lemp. ACM TODS Vol. 42, 1 (2016), 5. Google ScholarDigital Library
Christina Teflioudi, Rainer Gemulla, and Olga Mykytiuk . 2015. LEMP: Fast retrieval of large entries in a matrix product SIGMOD. 107--122. Google ScholarDigital Library
Sudheendra Vijayanarasimhan et almbox. . 2014. Deep networks with large output spaces. arXiv preprint arXiv:1412.7479 (2014).Google Scholar
Roger Weber, Hans-Jörg Schek, and Stephen Blott . 1998. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In VLDB, Vol. Vol. 98. 194--205. Google ScholarDigital Library
Yuxin Zheng, Qi Guo, Anthony KH Tung, and Sai Wu . 2016. Lazylsh: Approximate nearest neighbor search for multiple distance functions with a single index. In SIGMOD. 2023--2037. Google ScholarDigital Library

Index Terms

Accurate and Fast Asymmetric Locality-Sensitive Hashing Scheme for Maximum Inner Product Search
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals

Recommendations

Locality-Sensitive Hashing Scheme based on Longest Circular Co-Substring
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data

Locality-Sensitive Hashing (LSH) is one of the most popular methods for c-Approximate Nearest Neighbor Search (c-ANNS) in high-dimensional spaces. In this paper, we propose a novel LSH scheme based on the Longest Circular Co-Substring (LCCS) search ...
Read More
Query-aware locality-sensitive hashing scheme for $$l_p$$lp norm

The problem of c-Approximate Nearest Neighbor (c-ANN) search in high-dimensional space is fundamentally important in many applications, such as image database and data mining. Locality-Sensitive Hashing (LSH) and its variants are the well-known indexing ...
Read More
Secure Approximate Nearest Neighbor Search with Locality-Sensitive Hashing
Computer Security – ESORICS 2023
Abstract
Ensuring both security and efficiency in Nearest Neighbor Search (NNS) on large datasets remains a formidable challenge, as it often leads to substantial computation and communication costs due to the resource-intensive nature of ciphertext ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
2925 pages
ISBN:9781450355520
DOI:10.1145/3219819
General Chairs:
Yike Guo
Imperial College London
,
Faisal Farooq
IBM
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 19 July 2018
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
locality-sensitive hashing
maximum inner product search
nearest neighbor search
Qualifiers
- research-article
Conference

Acceptance Rates
KDD '18 Paper Acceptance Rate107of983submissions,11%Overall Acceptance Rate1,133of8,635submissions,13%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 24
  Total Citations
  View Citations
- 1,066
  Total Downloads
- Downloads (Last 12 months)73
- Downloads (Last 6 weeks)7
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Accurate and Fast Asymmetric Locality-Sensitive Hashing Scheme for Maximum Inner Product Search

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

Locality-Sensitive Hashing Scheme based on Longest Circular Co-Substring

Query-aware locality-sensitive hashing scheme for $$l_p$$lp norm

Secure Approximate Nearest Neighbor Search with Locality-Sensitive Hashing

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Accurate and Fast Asymmetric Locality-Sensitive Hashing Scheme for Maximum Inner Product Search

KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

Locality-Sensitive Hashing Scheme based on Longest Circular Co-Substring

Query-aware locality-sensitive hashing scheme for $$l_p$$lp norm

Secure Approximate Nearest Neighbor Search with Locality-Sensitive Hashing

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media