ABSTRACT
We investigate how metrics that can be measured offline can be used to predict the online performance of recommender systems, thus avoiding costly A/B testing. In addition to accuracy metrics, we combine diversity, coverage, and serendipity metrics to create a new performance model. Using this model, we quantify the trade-offs between the different metrics and propose using it to tune the parameters of recommender algorithms without the need for online testing. The model also enables a self-adjusting algorithm blend that optimizes a recommender's parameters over time. We evaluate our findings on data and experiments from news websites.
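The idea of combining offline metrics into a single performance model can be sketched as follows. The metric definitions, the toy data, and the linear weights below are illustrative assumptions for exposition, not the paper's actual formulas or fitted coefficients; in practice the weights would be fit by regression on logged online data.

```python
# Hedged sketch: combine offline metrics (accuracy, coverage, diversity,
# serendipity) into a linear model that predicts an online metric such as
# CTR. All definitions and numbers here are illustrative assumptions.

def precision_at_k(recommended, clicked, k=5):
    """Fraction of the top-k recommendations the user actually clicked."""
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in clicked) / k

def coverage(all_recommendations, catalog):
    """Share of the catalog appearing in at least one recommendation list."""
    recommended_items = {item for rec in all_recommendations for item in rec}
    return len(recommended_items & set(catalog)) / len(catalog)

def intra_list_diversity(recommended, category):
    """Fraction of item pairs in the list drawn from different categories."""
    pairs = [(a, b) for i, a in enumerate(recommended) for b in recommended[i + 1:]]
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if category[a] != category[b]) / len(pairs)

def serendipity(recommended, clicked, popular):
    """Clicked recommendations a trivial popularity baseline would miss."""
    unexpected_hits = [i for i in recommended if i in clicked and i not in popular]
    return len(unexpected_hits) / len(recommended)

def predicted_ctr(features, weights, bias):
    """Linear performance model: CTR ~ bias + sum(w_m * metric_m)."""
    return bias + sum(weights[name] * value for name, value in features.items())

# Toy example with hypothetical items and categories.
catalog = ["a", "b", "c", "d", "e", "f"]
category = {"a": "sports", "b": "politics", "c": "sports",
            "d": "tech", "e": "politics", "f": "tech"}
recs = ["a", "b", "c", "d", "e"]
clicked = {"b", "d"}
popular = {"a", "b"}

features = {
    "accuracy": precision_at_k(recs, clicked),
    "coverage": coverage([recs], catalog),
    "diversity": intra_list_diversity(recs, category),
    "serendipity": serendipity(recs, clicked, popular),
}
# Illustrative weights; in practice these would be fit by regression
# (e.g. least-angle regression) against measured online CTR.
weights = {"accuracy": 0.05, "coverage": 0.01,
           "diversity": 0.02, "serendipity": 0.03}
print(round(predicted_ctr(features, weights, bias=0.01), 4))  # -> 0.0603
```

Once such a model is fit, it can score candidate parameter settings of a recommender offline, and the setting with the highest predicted online performance can be deployed without a dedicated A/B test.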