skip to main content
article

Scaling factorization machines to relational data

Published:01 March 2013Publication History
Skip Abstract Section

Abstract

The most common approach in predictive modeling is to describe cases with feature vectors (aka design matrix). Many machine learning methods such as linear regression or support vector machines rely on this representation. However, when the underlying data has strong relational patterns, especially relations with high cardinality, the design matrix can get very large which can make learning and prediction slow or even infeasible.

This work solves this issue by making use of repeating patterns in the design matrix which stem from the underlying relational structure of the data. It is shown how coordinate descent learning and Bayesian Markov Chain Monte Carlo inference can be scaled for linear regression and factorization machine models. Empirically, it is shown on two large scale and very competitive datasets (Netflix prize, KDDCup 2012), that (1) standard learning algorithms based on the design matrix representation cannot scale to relational predictor variables, (2) the proposed new algorithms scale and (3) the predictive quality of the proposed generic feature-based approach is as good as the best specialized models that have been tailored to the respective tasks.

References

  1. C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. Bradski, A. Y. Ng, and K. Olukotun. Map-reduce for machine learning on multicore. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 281-288. MIT Press, Cambridge, MA, 2007.Google ScholarGoogle Scholar
  2. J. H. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1-22, 2 2010.Google ScholarGoogle Scholar
  3. Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In KDD'08: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 426-434, New York, NY, USA, 2008. ACM. Google ScholarGoogle Scholar
  4. Y. Koren. The bellkor solution to the netflix grand prize. 2009.Google ScholarGoogle Scholar
  5. Y. Koren. Collaborative filtering with temporal dynamics. In KDD'09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 447-456, New York, NY, USA, 2009. ACM. Google ScholarGoogle Scholar
  6. Y. Koren and R. M. Bell. Advances in collaborative filtering. In F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors, Recommender Systems Handbook, pages 145-186. Springer, 2011.Google ScholarGoogle Scholar
  7. Y. J. Lim and Y. W. Teh. Variational Bayesian approach to movie rating prediction. In Proceedings of KDD Cup and Workshop, 2007.Google ScholarGoogle Scholar
  8. J. Neville, D. Jensen, and B. Gallagher. Simple estimators for relational bayesian classifiers. In Proceedings of the Third IEEE International Conference on Data Mining, ICDM'03, pages 609-612, Washington, DC, USA, 2003. IEEE Computer Society. Google ScholarGoogle Scholar
  9. C. Perlich and F. Provost. Distribution-based aggregation for relational learning with identifier attributes. Mach. Learn., 62(1-2):65-105, Feb. 2006. Google ScholarGoogle Scholar
  10. I. Pilászy, D. Zibriczky, and D. Tikk. Fast als-based matrix factorization for explicit and implicit feedback datasets. In RecSys'10: Proceedings of the fourth ACM conference on Recommender systems, pages 71-78, New York, NY, USA, 2010. ACM. Google ScholarGoogle Scholar
  11. I. Porteous, A. Asuncion, and M. Welling. Bayesian matrix factorization with side information and dirichlet process mixtures. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2010, pages 563-568, 2010.Google ScholarGoogle Scholar
  12. S. Rendle. Factorization machines. In Proceedings of the 2010 IEEE International Conference on Data Mining, ICDM'10, pages 995-1000, Washington, DC, USA, 2010. IEEE Computer Society. Google ScholarGoogle Scholar
  13. S. Rendle. Factorization machines with libFM. ACM Trans. Intell. Syst. Technol., 3(3):57:1-57:22, May 2012. Google ScholarGoogle Scholar
  14. S. Rendle. Social network and click-through prediction with factorization machines. In KDD-Cup Workshop, 2012.Google ScholarGoogle Scholar
  15. S. Rendle and L. Schmidt-Thieme. Pairwise interaction tensor factorization for personalized tag recommendation. In WSDM'10: Proceedings of the third ACM international conference on Web search and data mining, pages 81-90, New York, NY, USA, 2010. ACM. Google ScholarGoogle Scholar
  16. R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the 25th international conference on Machine learning, ICML'08, pages 880-887, New York, NY, USA, 2008. ACM. Google ScholarGoogle Scholar
  17. R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1257-1264, Cambridge, MA, 2008. MIT Press.Google ScholarGoogle Scholar
  18. D. H. Stern, R. Herbrich, and T. Graepel. Matchbox: large scale online bayesian recommendations. In Proceedings of the 18th international conference on World wide web, WWW'09, pages 111-120, New York, NY, USA, 2009. ACM. Google ScholarGoogle Scholar
  19. G. Takács, I. Pilászy, B. Németh, and D. Tikk. Scalable collaborative filtering approaches for large recommender systems. J. Mach. Learn. Res., 10:623-656, June 2009. Google ScholarGoogle Scholar
  20. K. Weinberger, A. Dasgupta, J. Langford, A. Smola, and J. Attenberg. Feature hashing for large scale multitask learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 1113-1120. ACM, 2009. Google ScholarGoogle Scholar
  21. L. Xiong, X. Chen, T.-K. Huang, J. Schneider, and J. G. Carbonell. Temporal collaborative filtering with bayesian probabilistic tensor factorization. In Proceedings of the SIAM International Conference on Data Mining, pages 211-222. SIAM, 2010.Google ScholarGoogle Scholar
  22. H.-F. Yu, C.-J. Hsieh, K.-W. Chang, and C.-J. Lin. Large linear classification when data cannot fit in memory. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD'10, pages 833-842, New York, NY, USA, 2010. ACM. Google ScholarGoogle Scholar
  23. H.-F. Yu, H.-Y. Lo, H.-P. Hsieh, J.-K. Lou, T. G. McKenzie, J.-W. Chou, P.-H. Chung, C.-H. Ho, C.-F. Chang, Y.-H. Wei, J.-Y. Weng, E.-S. Yan, C.-W. Chang, T.-T. Kuo, Y.-C. Lo, P. T. Chang, C. Po, C.-Y. Wang, Y.-H. Huang, C.-W. Hung, Y.-X. Ruan, Y.-S. Lin, S. de Lin, H.-T. Lin, and C.-J. Lin. Feature engineering and classifier ensemble for kdd cup 2010. In Proceedings of KDD Cup and Workshop, 2010.Google ScholarGoogle Scholar
  24. S. Zhu, K. Yu, and Y. Gong. Stochastic relational models for large-scale dyadic data using MCMC. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1993-2000, 2009.Google ScholarGoogle Scholar

Index Terms

  1. Scaling factorization machines to relational data

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            • Published in

              cover image Proceedings of the VLDB Endowment
              Proceedings of the VLDB Endowment  Volume 6, Issue 5
              March 2013
              60 pages

              Publisher

              VLDB Endowment

              Publication History

              • Published: 1 March 2013
              Published in pvldb Volume 6, Issue 5

              Qualifiers

              • article

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader