skip to main content
10.1145/3219819.3219971acmotherconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Accurate and Fast Asymmetric Locality-Sensitive Hashing Scheme for Maximum Inner Product Search

Published:19 July 2018Publication History

ABSTRACT

The problem of Approximate Maximum Inner Product (AMIP) search has received increasing attention due to its wide applications. Interestingly, based on asymmetric transformation, the problem can be reduced to the Approximate Nearest Neighbor (ANN) search, and hence leverage Locality-Sensitive Hashing (LSH) to find solution. However, existing asymmetric transformations such as L2-ALSH and XBOX, suffer from large distortion error in reducing AMIP search to ANN search, such that the results of AMIP search can be arbitrarily bad. In this paper, we propose a novel Asymmetric LSH scheme based on Homocentric Hypersphere partition (H2-ALSH) for high-dimensional AMIP search. On the one hand, we propose a novel Query Normalized First (QNF) transformation to significantly reduce the distortion error. On the other hand, by adopting the homocentric hypersphere partition strategy, we can not only improve the search efficiency with early stop pruning, but also get higher search accuracy by further reducing the distortion error with limited data range. Our theoretical studies show that H2-ALSH enjoys a guarantee on search accuracy. Experimental results over four real datasets demonstrate that H2-ALSH significantly outperforms the state-of-the-art schemes.

References

  1. Alexandr Andoni and Piotr Indyk . 2006. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions FOCS. 459--468. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Alex Auvolat, Sarath Chandar, Pascal Vincent, Hugo Larochelle, and Yoshua Bengio . 2015. Clustering is efficient for approximate maximum inner product search. arXiv preprint arXiv:1507.05910 (2015).Google ScholarGoogle Scholar
  3. Yoram Bachrach, Yehuda Finkelstein, Ran Gilad-Bachrach, Liran Katzir, Noam Koenigstein, Nir Nice, and Ulrich Paquet . 2014. Speeding up the xbox recommender system using a euclidean transformation for inner-product spaces. In RecSys. 257--264. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. James Bennett, Stan Lanning, et almbox. . 2007. The netflix prize. In Proceedings of KDD cup and workshop. 35.Google ScholarGoogle Scholar
  5. Moses S Charikar . 2002. Similarity estimation techniques from rounding algorithms STOC. 380--388. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Paolo Cremonesi, Yehuda Koren, and Roberto Turrin . 2010. Performance of recommender algorithms on top-n recommendation tasks RecSys. 39--46. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Ryan R Curtin and Parikshit Ram . 2014. Dual-tree fast exact max-kernel search. Statistical Analysis and Data Mining Vol. 7, 4 (2014), 229--253. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Ryan R Curtin, Parikshit Ram, and Alexander G Gray . 2013. Fast exact max-kernel search. In ICDM. 1--9.Google ScholarGoogle Scholar
  9. Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S Mirrokni . 2004. Locality-sensitive hashing scheme based on p-stable distributions SoCG. 253--262. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Thomas Dean, Mark A Ruzon, Mark Segal, Jonathon Shlens, Sudheendra Vijayanarasimhan, and Jay Yagnik . 2013. Fast, accurate detection of 100,000 object classes on a single machine CVPR. 1814--1821. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Gideon Dror, Noam Koenigstein, Yehuda Koren, and Markus Weimer . 2012. The Yahoo! Music Dataset and KDD-Cup'11.. In KDD Cup. 8--18. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Junhao Gan, Jianlin Feng, Qiong Fang, and Wilfred Ng . 2012. Locality-sensitive hashing scheme based on dynamic collision counting SIGMOD. 541--552. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Ruiqi Guo, Sanjiv Kumar, Krzysztof Choromanski, and David Simcha . 2016. Quantization based fast inner product search. In Artificial Intelligence and Statistics. 482--490.Google ScholarGoogle Scholar
  14. Qiang Huang, Jianlin Feng, Qiong Fang, et almbox. . 2017. Query-aware locality-sensitive hashing scheme for $l_p$ norm. The VLDB Journal Vol. 26, 5 (2017), 683--708. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Qiang Huang, Jianlin Feng, Yikai Zhang, Qiong Fang, and Wilfred Ng . 2015. Query-aware locality-sensitive hashing for approximate nearest neighbor search. Proceedings of the VLDB Endowment Vol. 9, 1 (2015), 1--12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Prateek Jain and Ashish Kapoor . 2009. Active learning for large multi-class problems. In CVPR. 762--769.Google ScholarGoogle Scholar
  17. Thorsten Joachims, Thomas Finley, and Chun-Nam John Yu . 2009. Cutting-plane training of structural SVMs. Machine Learning Vol. 77, 1 (2009), 27--59. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Noam Koenigstein, Parikshit Ram, and Yuval Shavitt . 2012. Efficient retrieval of recommendations in a matrix factorization framework CIKM. 535--544. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Yehuda Koren, Robert Bell, and Chris Volinsky . 2009. Matrix factorization techniques for recommender systems. Computer Vol. 42, 8 (2009). Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Hui Li, Tsz Nam Chan, Man Lung Yiu, and Nikos Mamoulis . 2017. FEXIPRO: Fast and Exact Inner Product Retrieval in Recommender Systems SIGMOD. 835--850. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Behnam Neyshabur and Nathan Srebro . 2015. On Symmetric and Asymmetric LSHs for Inner Product Search ICML. 1926--1934. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Parikshit Ram and Alexander G Gray . 2012. Maximum inner-product search using cone trees. In SIGKDD. 931--939. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Anshumali Shrivastava and Ping Li . 2014. Asymmetric LSH (ALSH) for sublinear time Maximum Inner Product Search (MIPS) NIPS. 2321--2329. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Anshumali Shrivastava and Ping Li . 2015. Improved asymmetric locality sensitive hashing (ALSH) for Maximum Inner Product Search (MIPS). In UAI. 812--821. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Ryan Spring and Anshumali Shrivastava . 2017. Scalable and sustainable deep learning via randomized hashing SIGKDD. 445--454. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Nathan Srebro, Jason Rennie, and Tommi S Jaakkola . 2005. Maximum-margin matrix factorization. In NIPS. 1329--1336. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Yifang Sun, Wei Wang, Jianbin Qin, Ying Zhang, and Xuemin Lin . 2014. SRS: solving c-approximate nearest neighbor queries in high dimensional euclidean space with a tiny index. Proceedings of the VLDB Endowment Vol. 8, 1 (2014), 1--12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Yufei Tao, Ke Yi, Cheng Sheng, and Panos Kalnis . 2009. Quality and efficiency in high dimensional nearest neighbor search SIGMOD. 563--576. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Christina Teflioudi and Rainer Gemulla . 2016. Exact and approximate maximum inner product search with lemp. ACM TODS Vol. 42, 1 (2016), 5. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Christina Teflioudi, Rainer Gemulla, and Olga Mykytiuk . 2015. LEMP: Fast retrieval of large entries in a matrix product SIGMOD. 107--122. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Sudheendra Vijayanarasimhan et almbox. . 2014. Deep networks with large output spaces. arXiv preprint arXiv:1412.7479 (2014).Google ScholarGoogle Scholar
  32. Roger Weber, Hans-Jörg Schek, and Stephen Blott . 1998. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In VLDB, Vol. Vol. 98. 194--205. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Yuxin Zheng, Qi Guo, Anthony KH Tung, and Sai Wu . 2016. Lazylsh: Approximate nearest neighbor search for multiple distance functions with a single index. In SIGMOD. 2023--2037. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Accurate and Fast Asymmetric Locality-Sensitive Hashing Scheme for Maximum Inner Product Search

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Other conferences
      KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
      July 2018
      2925 pages
      ISBN:9781450355520
      DOI:10.1145/3219819

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 19 July 2018

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      KDD '18 Paper Acceptance Rate107of983submissions,11%Overall Acceptance Rate1,133of8,635submissions,13%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader