DOI: 10.1145/3077136.3080777

Neural Factorization Machines for Sparse Predictive Analytics

Published: 07 August 2017

ABSTRACT

Many predictive tasks in web applications need to model categorical variables, such as user IDs and demographics like gender and occupation. To apply standard machine learning techniques, these categorical predictors are typically converted to a set of binary features via one-hot encoding, making the resulting feature vector highly sparse. To learn effectively from such sparse data, it is crucial to account for the interactions between features.
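As a toy illustration (the field names and values below are invented for this example, not taken from the paper), one-hot encoding even two small categorical fields already yields a mostly-zero binary vector; with millions of user or item IDs, the sparsity becomes extreme:

    # One-hot encode two hypothetical categorical fields into one sparse binary vector.
    genders = ["male", "female"]
    occupations = ["student", "engineer", "teacher"]

    def one_hot(value, vocabulary):
        return [1 if v == value else 0 for v in vocabulary]

    x = one_hot("female", genders) + one_hot("student", occupations)
    print(x)  # [0, 1, 1, 0, 0] -- with ID fields spanning millions of values, almost every entry is 0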

Factorization Machines (FMs) are a popular solution for efficiently modelling second-order feature interactions. However, FM models feature interactions in a linear way, which can be insufficient for capturing the non-linear and complex structure inherent in real-world data. Deep neural networks have recently been applied in industry to learn non-linear feature interactions, for example Wide&Deep by Google and DeepCross by Microsoft, but their deep structure also makes them difficult to train.
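For reference, the standard FM predictor (Rendle, ICDM 2010; reference 27 below) scores an input x with a global bias, linear terms, and factorized second-order interactions:

    \hat{y}_{FM}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j

Here each feature i has an embedding vector v_i, and the weight of every pairwise interaction is the fixed inner product of the two embeddings; each interaction thus enters the prediction as a simple weighted product, which is the limitation the abstract refers to.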

In this paper, we propose Neural Factorization Machines (NFM), a novel model for prediction under sparse settings. NFM seamlessly combines the linearity of FM in modelling second-order feature interactions with the non-linearity of neural networks in modelling higher-order feature interactions. Conceptually, NFM is more expressive than FM, since FM can be seen as a special case of NFM without hidden layers. Empirical results on two regression tasks show that, with only one hidden layer, NFM significantly outperforms FM with a 7.3% relative improvement. Compared to the recent deep learning methods Wide&Deep and DeepCross, our NFM uses a shallower structure but offers better performance, and it is much easier to train and tune in practice.
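The abstract does not spell out the architecture, but its description (FM-style second-order interactions fed into a neural network, with FM recovered when there are no hidden layers) suggests a sketch along the following lines. This is a minimal illustrative NumPy implementation under assumed choices (one ReLU hidden layer, sum-pooled pairwise interactions computed via the standard FM identity, toy sizes), not the authors' exact model.

    # Minimal sketch of the NFM idea: embed the active features, pool their pairwise
    # (second-order) interactions into a single k-dimensional vector, and pass that
    # vector through a small neural network. Sizes and activation are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    n_features, k, hidden = 1000, 8, 16          # vocabulary size, embedding size, hidden units
    V  = rng.normal(0, 0.01, (n_features, k))    # feature embeddings
    w  = rng.normal(0, 0.01, n_features)         # first-order (linear) weights
    w0 = 0.0                                     # global bias
    W1 = rng.normal(0, 0.1, (k, hidden))         # one hidden layer (assumed)
    b1 = np.zeros(hidden)
    h  = rng.normal(0, 0.1, hidden)              # prediction weights on top of the hidden layer

    def nfm_predict(x_idx, x_val):
        """Score one sparse instance given indices and values of its non-zero features."""
        vx = V[x_idx] * x_val[:, None]           # weighted embeddings of the active features
        # Pairwise interaction pooling via the FM identity:
        # sum_{i<j} (v_i x_i) * (v_j x_j) = 0.5 * ((sum_i v_i x_i)^2 - sum_i (v_i x_i)^2)
        pooled = 0.5 * (vx.sum(axis=0) ** 2 - (vx ** 2).sum(axis=0))
        z = np.maximum(0.0, pooled @ W1 + b1)    # hidden layer with ReLU (assumed)
        return w0 + w[x_idx] @ x_val + z @ h     # linear part + neural interaction part

    # Example: an instance with three active features (two one-hot, one real-valued).
    print(nfm_predict(np.array([3, 42, 512]), np.array([1.0, 1.0, 0.5])))

Note that if the hidden layer is dropped and the pooled interaction vector is simply summed, the score reduces to the FM form given earlier, which is consistent with the abstract's claim that FM is a special case of NFM without hidden layers.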

References

  1. L. Baltrunas, K. Church, A. Karatzoglou, and N. Oliver. Frappe: Understanding the usage and perception of mobile app recommendations in-the-wild. CoRR, abs/1505.03014, 2015.
  2. I. Bayer, X. He, B. Kanagal, and S. Rendle. A generic coordinate descent framework for learning from implicit feedback. In WWW, 2017.
  3. M. Blondel, A. Fujino, N. Ueda, and M. Ishihata. Higher-order factorization machines. In NIPS, 2016.
  4. M. Blondel, M. Ishihata, A. Fujino, and N. Ueda. Polynomial networks and factorization machines: New insights and efficient training algorithms. In ICML, 2016.
  5. D. Cao, X. He, L. Nie, X. Wei, X. Hu, S. Wu, and T.-S. Chua. Cross-platform app recommendation by jointly modeling ratings and texts. ACM TOIS, 2017.
  6. J. Chen, B. Sun, H. Li, H. Lu, and X.-S. Hua. Deep CTR prediction in display advertising. In MM, 2016.
  7. J. Chen, H. Zhang, X. He, L. Nie, W. Liu, and T.-S. Chua. Attentive collaborative filtering: Multimedia recommendation with feature- and item-level attention. In SIGIR, 2017.
  8. T. Chen, X. He, and M.-Y. Kan. Context-aware image tweets modelling and recommendation. In MM, 2016.
  9. H.-T. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado, W. Chai, M. Ispir, R. Anil, Z. Haque, L. Hong, V. Jain, X. Liu, and H. Shah. Wide & deep learning for recommender systems. In DLRS, 2016.
  10. J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 2011.
  11. D. Erhan, Y. Bengio, A. Courville, P.-A. Manzagol, P. Vincent, and S. Bengio. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 2010.
  12. M. Genzel and G. Kutyniok. A mathematical framework for feature selection from real-world data with non-linear observations. arXiv preprint arXiv:1608.08852, 2016.
  13. F. M. Harper and J. A. Konstan. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems, 2015.
  14. K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
  15. X. He, M. Gao, M.-Y. Kan, Y. Liu, and K. Sugiyama. Predicting the popularity of web 2.0 items based on user comments. In SIGIR, 2014.
  16. X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua. Neural collaborative filtering. In WWW, 2017.
  17. X. He, H. Zhang, M.-Y. Kan, and T.-S. Chua. Fast matrix factorization for online recommendation with implicit feedback. In SIGIR, 2016.
  18. L. Hong, A. S. Doumith, and B. D. Davison. Co-factorization machines: Modeling user interests and predicting individual decisions in Twitter. In WSDM, 2013.
  19. R. Hong, Y. Yang, M. Wang, and X.-S. Hua. Learning visual semantic relationships for efficient visual retrieval. IEEE Transactions on Big Data, 2015.
  20. S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
  21. Y. Juan, Y. Zhuang, W.-S. Chin, and C.-J. Lin. Field-aware factorization machines for CTR prediction. In RecSys, 2016.
  22. Y. Koren. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In KDD, 2008.
  23. A. Novikov, M. Trofimov, and I. Oseledets. Exponential machines. In ICLR Workshop, 2017.
  24. R. J. Oentaryo, E.-P. Lim, J.-W. Low, D. Lo, and M. Finegold. Predicting response in mobile advertising with hierarchical importance-aware factorization machine. In WSDM, 2014.
  25. F. Petroni, L. Del Corro, and R. Gemulla. CORE: Context-aware open relation extraction with factorization machines. In EMNLP, 2015.
  26. R. Qiang, F. Liang, and J. Yang. Exploiting ranking factorization machines for microblog retrieval. In CIKM, 2013.
  27. S. Rendle. Factorization machines. In ICDM, 2010.
  28. S. Rendle. Factorization machines with libFM. ACM Transactions on Intelligent Systems and Technology, 2012.
  29. S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In UAI, 2009.
  30. S. Rendle, Z. Gantner, C. Freudenthaler, and L. Schmidt-Thieme. Fast context-aware recommendations with factorization machines. In SIGIR, 2011.
  31. Y. Shan, T. R. Hoens, J. Jiao, H. Wang, D. Yu, and J. Mao. Deep crossing: Web-scale modeling without manually crafted combinatorial features. In KDD, 2016.
  32. F. Shen, Y. Mu, Y. Yang, W. Liu, L. Liu, J. Song, and H. T. Shen. Classification by retrieval: Binarizing data and classifier. In SIGIR, 2017.
  33. N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 2014.
  34. M. Wang, W. Fu, S. Hao, D. Tao, and X. Wu. Scalable semi-supervised learning by efficient anchor graph regularization. IEEE Transactions on Knowledge and Data Engineering, 2016.
  35. M. Wang, X. Liu, and X. Wu. Visual classification by l1-hypergraph modeling. IEEE Transactions on Knowledge and Data Engineering, 2015.
  36. P. Wang, J. Guo, Y. Lan, J. Xu, S. Wan, and X. Cheng. Learning hierarchical representation model for nextbasket recommendation. In SIGIR, 2015.
  37. X. Wang, X. He, L. Nie, and T.-S. Chua. Item silk road: Recommending items from information domains to social users. In SIGIR, 2017.
  38. J. Xiao, H. Ye, X. He, H. Zhang, F. Wu, and T.-S. Chua. Attentional factorization machines: Learning the weight of feature interactions via attention networks. In IJCAI, 2017.
  39. C. Xiong, J. Callan, and T.-Y. Liu. Learning to attend and to rank with word-entity duets. In SIGIR, 2017.
  40. C. Zhang, G. Zhou, Q. Yuan, H. Zhuang, Y. Zheng, L. Kaplan, S. Wang, and J. Han. GeoBurst: Real-time local event detection in geo-tagged tweet streams. In SIGIR, 2016.
  41. H. Zhang, F. Shen, W. Liu, X. He, H. Luan, and T.-S. Chua. Discrete collaborative filtering. In SIGIR, 2016.
  42. H. Zhang, M. Wang, R. Hong, and T.-S. Chua. Play and rewind: Optimizing binary representations of videos by self-supervised temporal hashing. In MM, 2016.
  43. H. Zhang, Z.-J. Zha, Y. Yang, S. Yan, Y. Gao, and T.-S. Chua. Attribute-augmented semantic hierarchy: Towards bridging semantic gap and intention gap in image retrieval. In MM, 2013.
  44. W. Zhang, T. Du, and J. Wang. Deep learning over multi-field categorical data. In ECIR, 2016.

Published in

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
August 2017, 1476 pages
ISBN: 9781450350228
DOI: 10.1145/3077136

Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States



Acceptance Rates

SIGIR '17 paper acceptance rate: 78 of 362 submissions (22%). Overall SIGIR acceptance rate: 792 of 3,983 submissions (20%).
