ABSTRACT
Similarity-based neighborhood methods, a simple and popular approach to collaborative filtering, infer their predictions by finding users with similar taste or items that have been similarly rated. If the number of users grows to millions, the standard approach of sequentially examining each item and looking at all interacting users does not scale. To solve this problem, we develop a MapReduce algorithm for the pairwise item comparison and top-N recommendation problem that scales linearly with respect to a growing number of users. This parallel algorithm is able to work on partitioned data and is general in that it supports a wide range of similarity measures. We evaluate our algorithm on a large dataset consisting of 700 million song ratings from Yahoo! Music.
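To make the idea of pairwise item comparison over user-partitioned data concrete, here is a minimal single-process sketch in Python. It is not the algorithm from the paper, whose details are not given in this abstract; it assumes cosine similarity as the measure, and the names `map_user`, `reduce_sum`, `item_cosine` and the `toy` data are hypothetical. Each user's ratings are processed independently in a map step, and a reduce step sums the emitted partial results per item pair.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def map_user(ratings):
    """Map step (illustrative): takes one user's ratings {item: rating}.

    Emits the squared rating for each item under key (i, i), and the
    product of the two ratings for each unordered item pair under (i, j).
    """
    items = sorted(ratings)
    for i in items:
        yield (i, i), ratings[i] * ratings[i]
    for i, j in combinations(items, 2):
        yield (i, j), ratings[i] * ratings[j]


def reduce_sum(pairs):
    """Reduce step (illustrative): sum all values emitted for the same key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return totals


def item_cosine(user_ratings):
    """Cosine similarity for every item pair that shares at least one user."""
    emitted = (kv for ratings in user_ratings.values() for kv in map_user(ratings))
    totals = reduce_sum(emitted)
    sims = {}
    for (i, j), dot in totals.items():
        if i == j:
            continue
        norm = sqrt(totals[(i, i)]) * sqrt(totals[(j, j)])
        if norm > 0.0:  # skip items whose ratings are all zero
            sims[(i, j)] = dot / norm
    return sims


# Hypothetical toy data: three users, three items.
toy = {
    "u1": {"a": 1.0, "b": 1.0},
    "u2": {"a": 1.0, "b": 1.0, "c": 1.0},
    "u3": {"c": 1.0},
}
sims = item_cosine(toy)
```

Because `map_user` only ever sees one user's ratings, the input can be split by user across machines in this sketch. Note that the work per user grows quadratically with the number of items that user has rated, so a practical implementation would likely need to limit or sample very long interaction histories.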
Index Terms
- Scalable similarity-based neighborhood methods with MapReduce