research-article

Scalable similarity-based neighborhood methods with MapReduce

Authors:
Sebastian Schelter, Technische Universität Berlin, Berlin, Germany
Christoph Boden, Technische Universität Berlin, Berlin, Germany
Volker Markl, Technische Universität Berlin, Berlin, Germany
RecSys '12: Proceedings of the sixth ACM conference on Recommender systems, September 2012, pages 163–170. https://doi.org/10.1145/2365952.2365984

Published: 9 September 2012

ABSTRACT

Similarity-based neighborhood methods, a simple and popular approach to collaborative filtering, infer their predictions by finding users with similar taste or items that have been similarly rated. If the number of users grows to millions, the standard approach of sequentially examining each item and looking at all interacting users does not scale. To solve this problem, we develop a MapReduce algorithm for the pairwise item comparison and top-N recommendation problem that scales linearly with respect to a growing number of users. This parallel algorithm is able to work on partitioned data and is general in that it supports a wide range of similarity measures. We evaluate our algorithm on a large dataset consisting of 700 million song ratings from Yahoo! Music.
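The abstract only outlines the approach, so the following single-process Python sketch illustrates the general MapReduce-style data flow for pairwise item comparison followed by a simple top-N step. It is a hypothetical illustration, not the authors' Hadoop implementation. The toy ratings, the function names (`map_user`, `item_norms`, `pairwise_similarity`, `recommend`), the choice of cosine similarity, and the similarity-weighted-sum scoring are all assumptions made for this example.

```python
from collections import defaultdict
from heapq import nlargest
from itertools import combinations
from math import sqrt

# Hypothetical input layout: user -> {item: rating}. Each user row is
# processed independently, mimicking a mapper over partitioned data.
Ratings = dict[str, dict[str, float]]


def map_user(ratings: dict[str, float]):
    """Map phase: emit a partial dot product for every pair of items this user rated."""
    for i, j in combinations(sorted(ratings), 2):
        yield (i, j), ratings[i] * ratings[j]


def item_norms(data: Ratings) -> dict[str, float]:
    """Preprocessing pass: Euclidean norm of each item's rating vector."""
    squares: dict[str, float] = defaultdict(float)
    for ratings in data.values():
        for item, r in ratings.items():
            squares[item] += r * r
    return {item: sqrt(s) for item, s in squares.items()}


def cosine(dot: float, norm_i: float, norm_j: float) -> float:
    return dot / (norm_i * norm_j)


def pairwise_similarity(data: Ratings, similarity=cosine) -> dict[tuple[str, str], float]:
    """Shuffle + reduce: sum partial dot products per item pair, then apply the measure."""
    dots: dict[tuple[str, str], float] = defaultdict(float)
    for ratings in data.values():           # one loop iteration ~ one map() call
        for pair, partial in map_user(ratings):
            dots[pair] += partial           # grouping by pair key ~ shuffle + reduce
    norms = item_norms(data)
    return {(i, j): similarity(d, norms[i], norms[j]) for (i, j), d in dots.items()}


def recommend(user: str, data: Ratings, sims, n: int = 2) -> list[str]:
    """Top-N: score unseen items by the similarity-weighted sum of the user's ratings."""
    seen = data[user]
    scores: dict[str, float] = defaultdict(float)
    for (i, j), s in sims.items():
        for a, b in ((i, j), (j, i)):       # similarity is symmetric
            if a in seen and b not in seen:
                scores[b] += s * seen[a]
    return nlargest(n, scores, key=scores.get)


if __name__ == "__main__":
    toy: Ratings = {
        "u1": {"a": 5.0, "b": 3.0, "c": 4.0},
        "u2": {"a": 4.0, "b": 2.0},
        "u3": {"b": 1.0, "c": 5.0, "d": 4.0},
        "u4": {"a": 5.0, "d": 1.0},
    }
    sims = pairwise_similarity(toy)
    print(recommend("u2", toy, sims))  # ['c', 'd']
```

Because each user's row is mapped independently, only the partial dot products for each item pair need to meet in a reducer. Passing a different function of the dot product and the two norms changes the measure without changing the data flow. For example, on binary data a Jaccard-style coefficient can be computed from these quantities. Measures such as Pearson correlation would need additional per-item statistics.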


Index Terms

1. Information systems
  1. Information systems applications

Recommendations

Distributed matrix factorization with mapreduce using a series of broadcast-joins
RecSys '13: Proceedings of the 7th ACM conference on Recommender systems

The efficient, distributed factorization of large matrices on clusters of commodity machines is crucial to applying latent factor models in industrial-scale recommender systems. We propose an efficient, data-parallel low-rank matrix factorization with ...
Scalable Collaborative Filtering Recommendation Algorithm with MapReduce
DASC '14: Proceedings of the 2014 IEEE 12th International Conference on Dependable, Autonomic and Secure Computing

Collaborative Filtering (CF) algorithm is the common solution to Recommender System (RS). With the development of network and storage technology, the amount of users and items in RS system is exclusively growing. How to increase the scalability and ...
Iterative Neighbourhood Similarity Computation for Collaborative Filtering
WI-IAT '08: Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01

Collaborative filtering recommender systems make predictions based on the preferences of users considered like-minded to the target user (user-based), or the popularities of items similar to the target item (item-based). There have been several ...


Published in
RecSys '12: Proceedings of the sixth ACM conference on Recommender systems
September 2012
376 pages
ISBN:9781450312707
DOI:10.1145/2365952
General Chairs:
Pádraig Cunningham, University College Dublin, Ireland
Neil Hurley, University College Dublin, Ireland
Program Chairs:
Ido Guy, IBM Haifa Research Laboratory, Israel
Sarabjot Singh Anand, University of Warwick, England
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
MapReduce
scalable collaborative filtering
Acceptance Rates
RecSys '12 paper acceptance rate: 24 of 119 submissions, 20%
Overall acceptance rate: 254 of 1,295 submissions, 20%
Article Metrics
- 30 total citations
- 1,220 total downloads
- Downloads (last 12 months): 15
- Downloads (last 6 weeks): 2