DOI: 10.1145/2365952.2365984
research-article

Scalable similarity-based neighborhood methods with MapReduce

Published: 09 September 2012

ABSTRACT

Similarity-based neighborhood methods, a simple and popular approach to collaborative filtering, infer their predictions by finding users with similar taste or items that have been similarly rated. If the number of users grows to millions, the standard approach of sequentially examining each item and looking at all interacting users does not scale. To solve this problem, we develop a MapReduce algorithm for the pairwise item comparison and top-N recommendation problem that scales linearly with respect to a growing number of users. This parallel algorithm is able to work on partitioned data and is general in that it supports a wide range of similarity measures. We evaluate our algorithm on a large dataset consisting of 700 million song ratings from Yahoo! Music.
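The idea of computing pairwise item similarities from partitioned user data can be sketched in a few lines. The following is a minimal in-memory illustration, not the algorithm from the paper: it assumes cosine similarity as the measure, emulates the map and reduce phases with plain Python functions, and uses hypothetical names (`map_user`, `reduce_sum`, `cosine_similarities`). Each user's ratings are processed independently, which is what would allow the map phase to run on separate partitions.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def map_user(ratings):
    """Map phase for one user: emit partial squared norms and partial dot products.

    `ratings` is a dict {item: rating} for a single user (assumed input format).
    """
    for item, r in ratings.items():
        yield ("norm", item), r * r
    # Every pair of items rated by the same user contributes to their dot product.
    for (i, ri), (j, rj) in combinations(sorted(ratings.items()), 2):
        yield ("dot", i, j), ri * rj


def reduce_sum(pairs):
    """Reduce phase: sum all values emitted under the same key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return totals


def cosine_similarities(users):
    """Combine the summed dot products and norms into cosine similarities."""
    emitted = (kv for ratings in users.values() for kv in map_user(ratings))
    totals = reduce_sum(emitted)
    sims = {}
    for key, dot in totals.items():
        if key[0] == "dot":
            _, i, j = key
            norm_i = sqrt(totals[("norm", i)])
            norm_j = sqrt(totals[("norm", j)])
            sims[(i, j)] = dot / (norm_i * norm_j)
    return sims


if __name__ == "__main__":
    users = {
        "u1": {"a": 1.0, "b": 1.0},
        "u2": {"a": 1.0, "b": 1.0, "c": 1.0},
        "u3": {"c": 1.0},
    }
    for pair, sim in sorted(cosine_similarities(users).items()):
        print(pair, round(sim, 3))
```

In this sketch the work per user grows with the square of the number of items that user has rated, while the number of users only affects how many independent map calls are made. Swapping in another similarity measure would mean changing what the map phase emits and how the final step combines the sums.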

Published in

RecSys '12: Proceedings of the sixth ACM conference on Recommender systems
September 2012
376 pages
ISBN: 9781450312707
DOI: 10.1145/2365952

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      RecSys '12 paper acceptance rate: 24 of 119 submissions, 20%. Overall acceptance rate: 254 of 1,295 submissions, 20%.