research-article

Neural Factorization Machines for Sparse Predictive Analytics

Authors:
Xiangnan He

National University of Singapore, Singapore, Singapore

National University of Singapore, Singapore, Singapore
View Profile

,
Tat-Seng Chua

National University of Singapore, Singapore, Singapore

National University of Singapore, Singapore, Singapore
View Profile

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information RetrievalAugust 2017Pages 355–364https://doi.org/10.1145/3077136.3080777

Published:07 August 2017Publication History

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 355–364

ABSTRACT

Many predictive tasks of web applications need to model categorical variables, such as user IDs and demographics like genders and occupations. To apply standard machine learning techniques, these categorical predictors are always converted to a set of binary features via one-hot encoding, making the resultant feature vector highly sparse. To learn from such sparse data effectively, it is crucial to account for the interactions between features.

Factorization Machines (FMs) are a popular solution for efficiently using the second-order feature interactions. However, FM models feature interactions in a linear way, which can be insufficient for capturing the non-linear and complex inherent structure of real-world data. While deep neural networks have recently been applied to learn non-linear feature interactions in industry, such as the Wide&Deep by Google and DeepCross by Microsoft, the deep structure meanwhile makes them difficult to train.

In this paper, we propose a novel model Neural Factorization Machine (NFM) for prediction under sparse settings. NFM seamlessly combines the linearity of FM in modelling second-order feature interactions and the non-linearity of neural network in modelling higher-order feature interactions. Conceptually, NFM is more expressive than FM since FM can be seen as a special case of NFM without hidden layers. Empirical results on two regression tasks show that with one hidden layer only, NFM significantly outperforms FM with a 7.3% relative improvement. Compared to the recent deep learning methods Wide&Deep and DeepCross, our NFM uses a shallower structure but offers better performance, being much easier to train and tune in practice.

References

L. Baltrunas, K. Church, A. Karatzoglou, and N. Oliver. Frappe: Understanding the usage and perception of mobile app recommendations in-the-wild. CoRR, abs/1505.03014, 2015.Google Scholar
I. Bayer, X. He, B. Kanagal, and S. Rendle. A generic coordinate descent framework for learning from implicit feedback. In WWW, 2017. Google ScholarDigital Library
M. Blondel, A. Fujino, N. Ueda, and M. Ishihata. Higher-order factorization machines. In NIPS, 2016.Google Scholar
M. Blondel, M. Ishihata, A. Fujino, and N. Ueda. Polynomial networks and factorization machines: New insights and efficient training algorithms. In ICML, 2016.Google ScholarDigital Library
D. Cao, X. He, L. Nie, X. Wei, X. Hu, S. Wu, and T.-S. Chua. Cross-platform app recommendation by jointly modeling ratings and texts. ACM TOIS, 2017.Google ScholarDigital Library
J. Chen, B. Sun, H. Li, H. Lu, and X.-S. Hua. Deep ctr prediction in display advertising. In MM, 2016. Google ScholarDigital Library
J. Chen, H. Zhang, X. He, L. Nie, W. Liu, and T.-S. Chua. Attentive collaborative filtering: Multimedia recommendation with feature- and item-level attention. In SIGIR, 2017.Google ScholarDigital Library
T. Chen, X. He, and M.-Y. Kan. Context-aware image tweets modelling and recommendation. In MM, 2016. Google ScholarDigital Library
H.-T. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado, W. Chai, M. Ispir, R. Anil, Z. Haque, L. Hong, V. Jain, X. Liu, and H. Shah. Wide & deep learning for recommender systems. In DLRS, 2016. Google ScholarDigital Library
J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 2011.Google ScholarDigital Library
D. Erhan, Y. Bengio, A. Courville, P.-A. Manzagol, P. Vincent, and S. Bengio. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 2010.Google Scholar
M. Genzel and G. Kutyniok. A mathematical framework for feature selection from real-world data with non-linear observations. arXiv preprint arXiv:1608.08852, 2016.Google Scholar
F. M. Harper and J. A. Konstan. The movielens datasets: History and context. ACM Transactions on Interactive Intelligent Systems, 2015. Google ScholarDigital Library
K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016. Google ScholarCross Ref
X. He, M. Gao, M.-Y. Kan, Y. Liu, and K. Sugiyama. Predicting the popularity of web 2.0 items based on user comments. In SIGIR, 2014.Google ScholarDigital Library
X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua. Neural collaborative filtering. In WWW, 2017. Google ScholarDigital Library
X. He, H. Zhang, M.-Y. Kan, and T.-S. Chua. Fast matrix factorization for online recommendation with implicit feedback. In SIGIR, 2016. Google ScholarDigital Library
L. Hong, A. S. Doumith, and B. D. Davison. Co-factorization machines: Modeling user interests and predicting individual decisions in twitter. In WSDM, 2013.Google ScholarDigital Library
R. Hong, Y. Yang, M. Wang, and X.-S. Hua. Learning visual semantic relationships for efficient visual retrieval. IEEE Transactions on Big Data, 2015.Google ScholarCross Ref
S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.Google ScholarDigital Library
Y. Juan, Y. Zhuang, W.-S. Chin, and C.-J. Lin. Field-aware factorization machines for ctr prediction. In RecSys, 2016. Google ScholarDigital Library
Y. Koren. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In KDD, 2008. Google ScholarDigital Library
A. Novikov, M. Trofimov, and I. Oseledets. Exponential machines. In ICLR Workshop, 2017.Google Scholar
R. J. Oentaryo, E.-P. Lim, J.-W. Low, D. Lo, and M. Finegold. Predicting response in mobile advertising with hierarchical importance-aware factorization machine. In WSDM, 2014. Google ScholarDigital Library
F. Petroni, L. Del Corro, and R. Gemulla. Core: Context-aware open relation extraction with factorization machines. In EMNLP, 2015.Google ScholarCross Ref
R. Qiang, F. Liang, and J. Yang. Exploiting ranking factorization machines for microblog retrieval. In CIKM, 2013. Google ScholarDigital Library
S. Rendle. Factorization machines. In ICDM, 2010. Google ScholarDigital Library
S. Rendle. Factorization machines with libfm. ACM Transactions on Intelligent Systems and Technology, 2012. Google ScholarDigital Library
S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. Bpr: Bayesian personalized ranking from implicit feedback. In UAI, 2009.Google ScholarDigital Library
S. Rendle, Z. Gantner, C. Freudenthaler, and L. Schmidt-Thieme. Fast context-aware recommendations with factorization machines. In SIGIR, 2011. Google ScholarDigital Library
Y. Shan, T. R. Hoens, J. Jiao, H. Wang, D. Yu, and J. Mao. Deep crossing: Web-scale modeling without manually crafted combinatorial features. In KDD, 2016.Google ScholarDigital Library
F. Shen, Y. Mu, Y. Yang, W. Liu, L. Liu, J. Song, and H. T. Shen. Classification by retrieval: Binarizing data and classifier. In SIGIR, 2017.Google ScholarDigital Library
N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 2014.Google ScholarDigital Library
M. Wang, W. Fu, S. Hao, D. Tao, and X. Wu. Scalable semi-supervised learning by efficient anchor graph regularization. IEEE Transaction on Knowledge and Data Engineering, 2016. Google ScholarCross Ref
M. Wang, X. Liu, and X. Wu. Visual classification by l1-hypergraph modeling. IEEE Transaction on Knowledge and Data Engineering, 2015. Google ScholarDigital Library
P. Wang, J. Guo, Y. Lan, J. Xu, S. Wan, and X. Cheng. Learning hierarchical representation model for nextbasket recommendation. In SIGIR, 2015. Google ScholarDigital Library
X. Wang, X. He, L. Nie, and T.-S. Chua. Item silk road: Recommending items from information domains to social users. In SIGIR, 2017.Google ScholarDigital Library
J. Xiao, H. Ye, X. He, H. Zhang, F. Wu, and T.-S. Chua. Attentional factorization machines: Learning the weight of feature interactions via attention networks. In IJCAI, 2017.Google ScholarDigital Library
C. Xiong, J. Callan, and T.-Y. Liu. Learning to attend and to rank with word-entity duets. In SIGIR, 2017.Google Scholar
C. Zhang, G. Zhou, Q. Yuan, H. Zhuang, Y. Zheng, L. Kaplan, S. Wang, and J. Han. Geoburst: Real-time local event detection in geo-tagged tweet streams. In SIGIR, 2016.Google ScholarDigital Library
H. Zhang, F. Shen, W. Liu, X. He, H. Luan, and T.-S. Chua. Discrete collaborative filtering. In SIGIR, 2016. Google ScholarDigital Library
H. Zhang, M. Wang, R. Hong, and T.-S. Chua. Play and rewind: Optimizing binary representations of videos by self-supervised temporal hashing. In MM, 2016.Google ScholarDigital Library
H. Zhang, Z.-J. Zha, Y. Yang, S. Yan, Y. Gao, and T.-S. Chua. Attribute-augmented semantic hierarchy: Towards bridging semantic gap and intention gap in image retrieval. In MM, 2013.Google ScholarDigital Library
W. Zhang, T. Du, and J. Wang. Deep learning over multi-field categorical data. In ECIR, 2016. Google ScholarCross Ref

Index Terms

Neural Factorization Machines for Sparse Predictive Analytics
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Factorization methods
      2. Neural networks
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Recommender systems

Recommendations

Factorization machines with regularization for sparse feature interactions

Factorization machines (FMs) are machine learning predictive models based on second-order feature interactions and FMs with sparse regularization are called sparse FMs. Such regularizations enable feature selection, which selects the most relevant ...
Read More
Deep Feature Fusion over Multi-field Categorical Data for Rating Prediction
AICCC '18: Proceedings of the 2018 Artificial Intelligence and Cloud Computing Conference

Many predictive tasks in recommender systems model from categorical variables. Different from continuous features extracted from images and videos, categorical data is discrete and of multi-field while their dependencies are little known, which brings ...
Read More
xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems
KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Combinatorial features are essential for the success of many commercial models. Manually crafting these features usually comes with high cost due to the variety, volume and velocity of raw data in web-scale systems. Factorization based models, which ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
August 2017
1476 pages
ISBN:9781450350228
DOI:10.1145/3077136
General Chairs:
Noriko Kando
National Institute of Informatics
,
Tetsuya Sakai
Waseda University
,
Hideo Joho
University of Tsukuba
,
Program Chairs:
Hang Li
Huawei Noah's Ark Lab
,
Arjen P. de Vries
Radboud University
,
Ryen W. White
Microsoft Cortana
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 7 August 2017
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
deep learning
factorization machines
neural networks
recommendation
regression
sparse data
Qualifiers
- research-article
Conference

Acceptance Rates
SIGIR '17 Paper Acceptance Rate78of362submissions,22%Overall Acceptance Rate792of3,983submissions,20%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 100
  Total Citations
  View Citations
- 3,972
  Total Downloads
- Downloads (Last 12 months)370
- Downloads (Last 6 weeks)35
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Neural Factorization Machines for Sparse Predictive Analytics

SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval

ABSTRACT

References

Cited By

Index Terms

Recommendations

Factorization machines with regularization for sparse feature interactions

Deep Feature Fusion over Multi-field Categorical Data for Rating Prediction

xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems