Abstract
The most common approach in predictive modeling is to describe cases with feature vectors (aka design matrix). Many machine learning methods such as linear regression or support vector machines rely on this representation. However, when the underlying data has strong relational patterns, especially relations with high cardinality, the design matrix can get very large which can make learning and prediction slow or even infeasible.
This work solves this issue by making use of repeating patterns in the design matrix which stem from the underlying relational structure of the data. It is shown how coordinate descent learning and Bayesian Markov Chain Monte Carlo inference can be scaled for linear regression and factorization machine models. Empirically, it is shown on two large scale and very competitive datasets (Netflix prize, KDDCup 2012), that (1) standard learning algorithms based on the design matrix representation cannot scale to relational predictor variables, (2) the proposed new algorithms scale and (3) the predictive quality of the proposed generic feature-based approach is as good as the best specialized models that have been tailored to the respective tasks.
- C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. Bradski, A. Y. Ng, and K. Olukotun. Map-reduce for machine learning on multicore. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 281-288. MIT Press, Cambridge, MA, 2007.Google Scholar
- J. H. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1-22, 2 2010.Google Scholar
- Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In KDD'08: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 426-434, New York, NY, USA, 2008. ACM. Google Scholar
- Y. Koren. The bellkor solution to the netflix grand prize. 2009.Google Scholar
- Y. Koren. Collaborative filtering with temporal dynamics. In KDD'09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 447-456, New York, NY, USA, 2009. ACM. Google Scholar
- Y. Koren and R. M. Bell. Advances in collaborative filtering. In F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors, Recommender Systems Handbook, pages 145-186. Springer, 2011.Google Scholar
- Y. J. Lim and Y. W. Teh. Variational Bayesian approach to movie rating prediction. In Proceedings of KDD Cup and Workshop, 2007.Google Scholar
- J. Neville, D. Jensen, and B. Gallagher. Simple estimators for relational bayesian classifiers. In Proceedings of the Third IEEE International Conference on Data Mining, ICDM'03, pages 609-612, Washington, DC, USA, 2003. IEEE Computer Society. Google Scholar
- C. Perlich and F. Provost. Distribution-based aggregation for relational learning with identifier attributes. Mach. Learn., 62(1-2):65-105, Feb. 2006. Google Scholar
- I. Pilászy, D. Zibriczky, and D. Tikk. Fast als-based matrix factorization for explicit and implicit feedback datasets. In RecSys'10: Proceedings of the fourth ACM conference on Recommender systems, pages 71-78, New York, NY, USA, 2010. ACM. Google Scholar
- I. Porteous, A. Asuncion, and M. Welling. Bayesian matrix factorization with side information and dirichlet process mixtures. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2010, pages 563-568, 2010.Google Scholar
- S. Rendle. Factorization machines. In Proceedings of the 2010 IEEE International Conference on Data Mining, ICDM'10, pages 995-1000, Washington, DC, USA, 2010. IEEE Computer Society. Google Scholar
- S. Rendle. Factorization machines with libFM. ACM Trans. Intell. Syst. Technol., 3(3):57:1-57:22, May 2012. Google Scholar
- S. Rendle. Social network and click-through prediction with factorization machines. In KDD-Cup Workshop, 2012.Google Scholar
- S. Rendle and L. Schmidt-Thieme. Pairwise interaction tensor factorization for personalized tag recommendation. In WSDM'10: Proceedings of the third ACM international conference on Web search and data mining, pages 81-90, New York, NY, USA, 2010. ACM. Google Scholar
- R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the 25th international conference on Machine learning, ICML'08, pages 880-887, New York, NY, USA, 2008. ACM. Google Scholar
- R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1257-1264, Cambridge, MA, 2008. MIT Press.Google Scholar
- D. H. Stern, R. Herbrich, and T. Graepel. Matchbox: large scale online bayesian recommendations. In Proceedings of the 18th international conference on World wide web, WWW'09, pages 111-120, New York, NY, USA, 2009. ACM. Google Scholar
- G. Takács, I. Pilászy, B. Németh, and D. Tikk. Scalable collaborative filtering approaches for large recommender systems. J. Mach. Learn. Res., 10:623-656, June 2009. Google Scholar
- K. Weinberger, A. Dasgupta, J. Langford, A. Smola, and J. Attenberg. Feature hashing for large scale multitask learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 1113-1120. ACM, 2009. Google Scholar
- L. Xiong, X. Chen, T.-K. Huang, J. Schneider, and J. G. Carbonell. Temporal collaborative filtering with bayesian probabilistic tensor factorization. In Proceedings of the SIAM International Conference on Data Mining, pages 211-222. SIAM, 2010.Google Scholar
- H.-F. Yu, C.-J. Hsieh, K.-W. Chang, and C.-J. Lin. Large linear classification when data cannot fit in memory. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD'10, pages 833-842, New York, NY, USA, 2010. ACM. Google Scholar
- H.-F. Yu, H.-Y. Lo, H.-P. Hsieh, J.-K. Lou, T. G. McKenzie, J.-W. Chou, P.-H. Chung, C.-H. Ho, C.-F. Chang, Y.-H. Wei, J.-Y. Weng, E.-S. Yan, C.-W. Chang, T.-T. Kuo, Y.-C. Lo, P. T. Chang, C. Po, C.-Y. Wang, Y.-H. Huang, C.-W. Hung, Y.-X. Ruan, Y.-S. Lin, S. de Lin, H.-T. Lin, and C.-J. Lin. Feature engineering and classifier ensemble for kdd cup 2010. In Proceedings of KDD Cup and Workshop, 2010.Google Scholar
- S. Zhu, K. Yu, and Y. Gong. Stochastic relational models for large-scale dyadic data using MCMC. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1993-2000, 2009.Google Scholar
Index Terms
- Scaling factorization machines to relational data
Recommendations
Factorization Machines with libFM
Factorization approaches provide high accuracy in several important prediction problems, for example, recommender systems. However, applying factorization approaches to a new prediction problem is a nontrivial task and requires a lot of expert ...
Relational data factorization
Motivated by an analogy with matrix factorization, we introduce the problem of factorizing relational data. In matrix factorization, one is given a matrix and has to factorize it as a product of other matrices. In relational data factorization, the task ...
Factorization Machines
ICDM '10: Proceedings of the 2010 IEEE International Conference on Data MiningIn this paper, we introduce Factorization Machines (FM) which are a new model class that combines the advantages of Support Vector Machines (SVM) with factorization models. Like SVMs, FMs are a general predictor working with any real valued feature ...
Comments