Are we really making much progress? A worrying analysis of recent neural recommendation approaches

Authors:
Maurizio Ferrari Dacrema, Politecnico di Milano, Italy
Paolo Cremonesi, Politecnico di Milano, Italy
Dietmar Jannach, University of Klagenfurt, Austria
RecSys '19: Proceedings of the 13th ACM Conference on Recommender Systems, September 2019, Pages 101–109. https://doi.org/10.1145/3298689.3347058

Published: 10 September 2019

ABSTRACT

Deep learning techniques have become the method of choice for researchers working on algorithmic aspects of recommender systems. With the strong increase in interest in machine learning in general, it has become difficult to keep track of what currently represents the state of the art, e.g., for top-n recommendation tasks. At the same time, several recent publications point out problems in today's research practice in applied machine learning, e.g., in terms of the reproducibility of results or the choice of baselines when proposing new models.

In this work, we report the results of a systematic analysis of algorithmic proposals for top-n recommendation tasks. Specifically, we considered 18 algorithms that were presented at top-level research conferences in recent years. Only 7 of them could be reproduced with reasonable effort. Of these 7, however, 6 can often be outperformed by comparably simple heuristic methods, e.g., based on nearest-neighbor or graph-based techniques. The remaining one clearly outperformed the baselines but did not consistently outperform a well-tuned non-neural linear ranking method. Overall, our work sheds light on a number of potential problems in today's machine learning scholarship and calls for improved scientific practices in this area.
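To make concrete what a "comparably simple heuristic method" of the nearest-neighbor kind can look like, the following is a minimal sketch of one item-based nearest-neighbor variant for top-n recommendation on implicit feedback. It is not the implementation evaluated in the paper; the function name `item_knn_recommend`, the parameters `k` and `n`, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def item_knn_recommend(interactions, user, k=2, n=2):
    """Rank unseen items for `user` with an item-based nearest-neighbor heuristic.

    `interactions` maps each user id to the set of item ids they interacted
    with (implicit feedback). Names and defaults are illustrative only.
    """
    # Invert the data: item -> set of users who interacted with it.
    users_of = defaultdict(set)
    for u, items in interactions.items():
        for i in items:
            users_of[i].add(u)

    def cosine(a, b):
        # Cosine similarity between two binary item columns.
        overlap = len(users_of[a] & users_of[b])
        return overlap / sqrt(len(users_of[a]) * len(users_of[b]))

    seen = interactions[user]
    scores = {}
    for candidate in users_of:
        if candidate in seen:
            continue
        # Sum the k largest similarities between the candidate and the
        # items already in the user's profile.
        sims = sorted((cosine(candidate, s) for s in seen), reverse=True)[:k]
        scores[candidate] = sum(sims)

    # Sort by score (descending), breaking ties by item id for determinism.
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return ranked[:n]


# Hypothetical toy data: four users, four items.
toy = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"b", "c"},
    "u4": {"c", "d"},
}
print(item_knn_recommend(toy, "u1"))
```

On this toy data, item `c` is ranked ahead of item `d` for user `u1`, because `c` co-occurs with both items in that user's profile while `d` co-occurs with neither. A practical baseline of this kind would additionally need tuning of the neighborhood size and similarity function, which the sketch leaves out.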

Supplemental Material

p101-dacrema.zip (101.5 MB): The auxiliary material contains a full clone of the GitHub repository with all source files, data, and results.

Index Terms

1. General and reference
  1. Cross-computing tools and techniques
    1. Evaluation
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Recommender systems
  2. World Wide Web
    1. Web searching and information discovery
      1. Collaborative filtering

Published in
RecSys '19: Proceedings of the 13th ACM Conference on Recommender Systems
September 2019
635 pages
ISBN:9781450362436
DOI:10.1145/3298689
General Chairs:
Toine Bogers, Aalborg University Copenhagen, Denmark
Alan Said, University of Gothenburg, Sweden
Program Chairs:
Peter Brusilovsky, University of Pittsburgh
Domonkos Tikk, Gravity R&D, Hungary
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Badges
- Best Paper
Author Tags
deep learning
evaluation
recommender systems
reproducibility
Qualifiers
- research-article
Acceptance Rates
RecSys '19 paper acceptance rate: 36 of 189 submissions, 19%. Overall acceptance rate: 254 of 1,295 submissions, 20%.