ABSTRACT
Large-scale recommender models find the most relevant items from huge catalogs, and they play a critical role in modern search and recommendation systems. To model an input space with large-vocabulary categorical features, a typical recommender model uses neural networks to learn a joint embedding space for both queries and items from user feedback data. However, with millions to billions of items in the corpus, users tend to provide feedback on only a very small set of them, so feedback follows a power-law distribution. This makes the feedback data for long-tail items extremely sparse.
Inspired by the recent success of self-supervised representation learning research in both computer vision and natural language understanding, we propose a multi-task self-supervised learning (SSL) framework for large-scale item recommendations. The framework is designed to tackle the label sparsity problem by learning better latent relationships among item features. Specifically, SSL improves item representation learning and also serves as additional regularization to improve generalization. Furthermore, we propose a novel data augmentation method that utilizes feature correlations within the proposed framework.
We evaluate our framework using two real-world datasets with 500M and 1B training examples, respectively. Our results demonstrate the effectiveness of SSL regularization and show its superior performance over state-of-the-art regularization techniques. We have also launched the proposed techniques in a web-scale commercial app-to-app recommendation system, with significant improvements in top-tier business metrics demonstrated in A/B experiments on live traffic. Our online results also verify our hypothesis that the framework improves model performance even more on slices that lack supervision.
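The abstract describes SSL as an auxiliary task that regularizes the supervised recommendation objective, but it does not specify the loss or the augmentation. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a feature-masking augmentation, an InfoNCE-style contrastive loss, a toy linear item tower, and a weighting hyperparameter; the names mask_features, encode, contrastive_loss, multitask_loss, temperature, and alpha are all hypothetical.

```python
import numpy as np

def mask_features(x, mask, fill=0.0):
    """Augment a batch by overwriting the masked feature columns (assumed augmentation)."""
    out = x.copy()
    out[:, mask] = fill
    return out

def encode(x, w):
    """Toy item tower (assumption): one linear layer followed by L2 normalization."""
    z = x @ w
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def contrastive_loss(z1, z2, temperature=0.1):
    """InfoNCE-style loss: row i of z1 should match row i of z2 against all other rows."""
    logits = (z1 @ z2.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def multitask_loss(supervised_loss, ssl_loss, alpha=0.1):
    """Supervised objective plus a weighted SSL term acting as a regularizer."""
    return supervised_loss + alpha * ssl_loss

# Synthetic item features, for illustration only.
rng = np.random.default_rng(0)
items = rng.normal(size=(8, 6))
w = rng.normal(size=(6, 4))

# Two complementary views of the same items: each view hides half the features.
mask = np.array([True, True, True, False, False, False])
view_a = mask_features(items, mask)
view_b = mask_features(items, ~mask)

ssl = contrastive_loss(encode(view_a, w), encode(view_b, w))
total = multitask_loss(supervised_loss=1.0, ssl_loss=ssl, alpha=0.1)
print(f"ssl loss: {ssl:.4f}, combined loss: {total:.4f}")
```

In this sketch the mask is fixed by hand. A correlation-aware augmentation of the kind the abstract mentions would instead choose which features to hide together based on how they co-vary, which the abstract does not detail.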