ABSTRACT
Deep learning techniques have become the method of choice for researchers working on algorithmic aspects of recommender systems. With the strongly increased interest in machine learning in general, it has become difficult to keep track of what currently represents the state of the art, e.g., for top-n recommendation tasks. At the same time, several recent publications point out problems in today's research practice in applied machine learning, e.g., in terms of the reproducibility of results or the choice of baselines when proposing new models.
In this work, we report the results of a systematic analysis of algorithmic proposals for top-n recommendation tasks. Specifically, we considered 18 algorithms that were presented at top-level research conferences in recent years. Only 7 of them could be reproduced with reasonable effort. Of these 7, however, 6 can often be outperformed by comparably simple heuristic methods, e.g., based on nearest-neighbor or graph-based techniques. The remaining one clearly outperformed the baselines but did not consistently outperform a well-tuned non-neural linear ranking method. Overall, our work sheds light on a number of potential problems in today's machine learning scholarship and calls for improved scientific practices in this area.
Supplemental Material
The auxiliary material contains a full clone of the GitHub repository with all source files, data, and results.
Index Terms
- Are we really making much progress? A worrying analysis of recent neural recommendation approaches