ABSTRACT
In the past decade, large-scale recommendation datasets have been published and extensively studied. In this work we present a detailed analysis of a sparse, large-scale dataset specifically designed to push the envelope of recommender system models. The Yahoo! Music dataset consists of more than a million users, 600 thousand musical items, and more than 250 million ratings, collected over a decade. It is characterized by three unique features. First, rated items are multi-typed, including tracks, albums, artists, and genres. Second, items are arranged within a four-level taxonomy, which proves effective in coping with a severe sparsity problem that originates from the unusually large number of items (compared to, e.g., movie ratings datasets). Finally, fine-resolution timestamps associated with the ratings enable a comprehensive temporal and session analysis. We further present a matrix factorization model that exploits the special characteristics of this dataset. In particular, the model incorporates a rich bias model with terms that capture information from the item taxonomy and the different temporal dynamics of music ratings. To gain additional insights into its properties, we organized the KDD Cup 2011 competition around this dataset. As the competition drew thousands of participants, we expect the dataset to attract considerable research activity in the future.
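How a taxonomy can feed a biased matrix factorization prediction is illustrated by the minimal Python sketch below. It assumes (this is an illustrative assumption, not the authors' exact formulation) that every node on an item's taxonomy path, such as track, album, and artist, contributes an additive bias and an additive latent-factor vector, and that an item's genre biases are averaged because an item may carry several genres. The names `Params`, `taxonomy_path`, and `predict` are hypothetical, and the temporal and session terms are omitted.

```python
from dataclasses import dataclass, field

# Hypothetical container for learned parameters; all names are illustrative.
@dataclass
class Params:
    mu: float                                      # global mean rating
    user_bias: dict = field(default_factory=dict)  # b_u per user
    item_bias: dict = field(default_factory=dict)  # b_i per node (track, album, artist, genre)
    user_vec: dict = field(default_factory=dict)   # p_u per user
    item_vec: dict = field(default_factory=dict)   # q_i per node

def taxonomy_path(item, parent):
    """Return the item followed by its ancestors, e.g. track -> album -> artist."""
    path = [item]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def predict(params, user, item, parent, genres, dim):
    """Assumed taxonomy-aware biased MF prediction (temporal terms omitted)."""
    path = taxonomy_path(item, parent)
    # Bias part: global mean + user bias + bias of the item and of each ancestor.
    r = params.mu + params.user_bias.get(user, 0.0)
    r += sum(params.item_bias.get(n, 0.0) for n in path)
    # Genre biases are averaged, since an item may belong to several genres.
    g = genres.get(item, [])
    if g:
        r += sum(params.item_bias.get(x, 0.0) for x in g) / len(g)
    # Effective item vector: sum of factor vectors along the taxonomy path, so a
    # rarely rated track borrows strength from its album and artist.
    q = [0.0] * dim
    for n in path:
        for k, v in enumerate(params.item_vec.get(n, [0.0] * dim)):
            q[k] += v
    p = params.user_vec.get(user, [0.0] * dim)
    return r + sum(a * b for a, b in zip(p, q))

if __name__ == "__main__":
    params = Params(
        mu=50.0,
        user_bias={"u1": 5.0},
        item_bias={"t1": 2.0, "al1": 1.0, "ar1": -1.0, "g1": 4.0, "g2": 0.0},
        user_vec={"u1": [1.0, 0.5]},
        item_vec={"t1": [0.2, 0.0], "ar1": [1.0, 2.0]},  # album "al1" has no factors yet
    )
    parent = {"t1": "al1", "al1": "ar1"}  # track -> album -> artist
    genres = {"t1": ["g1", "g2"]}
    print(predict(params, "u1", "t1", parent, genres, dim=2))  # approximately 61.2
```

In this toy example the album `al1` has no learned factor vector, yet the track's prediction still draws on the artist's vector; this is the kind of sparsity relief a taxonomy can offer when most tracks have few ratings.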
Index Terms
- Yahoo! music recommendations: modeling music ratings with temporal dynamics and item taxonomy
Recommendations
Naïve filterbots for robust cold-start recommendations
KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
The goal of a recommender system is to suggest items of interest to a user based on historical behavior of a community of users. Given detailed enough history, item-based collaborative filtering (CF) often performs as well or better than almost any ...
Making recommendations from top-N user-item subgroups
Group-aware collaborative filtering (CF) has recently become a hot research topic in recommender systems, which typically divides a large CF task on the entire data (i.e. rating matrix) into some smaller CF tasks on subgroups (i.e., sub-matrices). This ...
Serendipitous Personalized Ranking for Top-N Recommendation
WI-IAT '12: Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
Serendipitous recommendation has benefitted both e-retailers and users. It tends to suggest items which are both unexpected and useful to users. These items are not only profitable to the retailers but also surprisingly suitable to consumers' tastes. ...