
01.12.2014 | Regular Paper

Parallel matrix factorization for recommender systems

Authors: Hsiang-Fu Yu, Cho-Jui Hsieh, Si Si, Inderjit S. Dhillon

Published in: Knowledge and Information Systems | Issue 3/2014


Abstract

Matrix factorization, when the matrix has missing values, has become one of the leading techniques for recommender systems. To handle web-scale datasets with millions of users and billions of ratings, scalability becomes an important issue. Alternating least squares (ALS) and stochastic gradient descent (SGD) are two popular approaches to compute matrix factorization, and there has been a recent flurry of activity to parallelize these algorithms. However, due to the cubic time complexity in the target rank, ALS is not scalable to large-scale datasets. On the other hand, SGD conducts efficient updates but usually suffers from slow convergence that is sensitive to the parameters. Coordinate descent, a classical optimization approach, has been used for many other large-scale problems, but its application to matrix factorization for recommender systems has not been thoroughly explored. In this paper, we show that coordinate descent-based methods have a more efficient update rule compared to ALS and have faster and more stable convergence than SGD. We study different update sequences and propose the CCD++ algorithm, which updates rank-one factors one by one. In addition, CCD++ can be easily parallelized on both multi-core and distributed systems. We empirically show that CCD++ is much faster than ALS and SGD in both settings. As an example, with a synthetic dataset containing 14.6 billion ratings, on a distributed memory cluster with 64 processors, to deliver the desired test RMSE, CCD++ is 49 times faster than SGD and 20 times faster than ALS. When the number of processors is increased to 256, CCD++ takes only 16 s and is still 40 times faster than SGD and 20 times faster than ALS.
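
To make the rank-one updates concrete, here is a minimal single-threaded sketch of a CCD++-style sweep. It assumes the standard objective of squared error over the observed ratings plus an L2 regularizer \(\lambda(\Vert W\Vert_F^2 + \Vert H\Vert_F^2)\); the function and variable names are ours, and the sketch omits the multi-core and distributed parallelization that the paper focuses on.

    import numpy as np
    from scipy.sparse import coo_matrix

    def ccdpp(A, k=10, lam=0.1, outer_iters=10, inner_iters=3, seed=0):
        """Rank-one coordinate descent sketch: fit A ~ W @ H.T using only the observed entries."""
        A = coo_matrix(A)
        rows, cols = A.row, A.col
        m, n = A.shape
        rng = np.random.default_rng(seed)
        W = np.zeros((m, k))                    # W starts at zero, so the initial residual equals the data
        H = 0.01 * rng.standard_normal((n, k))  # small random init on the other factor
        R = A.data.astype(np.float64)           # residual on observed entries: R_ij = A_ij - w_i . h_j

        for _ in range(outer_iters):
            for t in range(k):                  # update one rank-one factor (column t of W and H) at a time
                u, v = W[:, t].copy(), H[:, t].copy()
                Rhat = R + u[rows] * v[cols]    # add the factor being refit back into the residual
                for _ in range(inner_iters):
                    # closed-form single-variable updates: regularized 1-D least squares per row / per column
                    num_u = np.bincount(rows, weights=Rhat * v[cols], minlength=m)
                    den_u = lam + np.bincount(rows, weights=v[cols] ** 2, minlength=m)
                    u = num_u / den_u
                    num_v = np.bincount(cols, weights=Rhat * u[rows], minlength=n)
                    den_v = lam + np.bincount(cols, weights=u[rows] ** 2, minlength=n)
                    v = num_v / den_v
                R = Rhat - u[rows] * v[cols]    # remove the refit rank-one factor from the residual
                W[:, t], H[:, t] = u, v
        return W, H

    # Toy usage: a 4x3 rating matrix with five observed entries.
    A = coo_matrix(([5.0, 3.0, 4.0, 1.0, 2.0], ([0, 0, 1, 2, 3], [0, 2, 1, 0, 2])), shape=(4, 3))
    W, H = ccdpp(A, k=2, lam=0.05)
    print(W @ H.T)                              # dense reconstruction; only the observed entries were fit

Each inner step is a one-dimensional least-squares problem with a closed-form solution, which is why the cost per update is linear rather than cubic in the target rank; the per-row and per-column reductions are independent, which is the structure that makes the method easy to parallelize across cores or machines.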


Footnotes
2
In [8], the name “Jellyfish” is used.
 
3
Intel MKL is used in our implementation of ALS.
 
4
We implement a multi-core version of DSGD according to [7].
 
5
HogWild is downloaded from http://research.cs.wisc.edu/hazy/victor/Hogwild/ and modified to start from the same initial point as ALS and DSGD.
 
6
In HogWild, seven cores are used for SGD updates, and one core is used for random shuffle.
 
7
For \(-\)s1, the initial \(\eta = 0.001\); for \(-\)s2, the initial \(\eta = 0.05\).
 
11
Our C implementation is 6x faster than the MATLAB version provided by [2].
 
12
\(\lambda \left( \sum_{i} |\Omega_{i}| \Vert \boldsymbol{w}_{i}\Vert^2 + \sum_{j} |\bar{\Omega}_{j}| \Vert \boldsymbol{h}_{j}\Vert^2 \right)\) is used to replace the regularization term in (1).
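Equation (1) itself is not reproduced on this page. Assuming it is the standard regularized squared-error objective over the observed index set \(\Omega\), the weighted term above replaces the plain regularizer; here \(|\Omega_{i}|\) and \(|\bar{\Omega}_{j}|\) denote the number of observed ratings for user \(i\) and item \(j\):

    % Assumed form of objective (1): squared error over observed entries plus an L2 regularizer.
    \[
    \min_{W,H} \sum_{(i,j)\in\Omega} \bigl(A_{ij} - \boldsymbol{w}_{i}^{T}\boldsymbol{h}_{j}\bigr)^{2}
      + \lambda \Bigl(\sum_{i} \Vert \boldsymbol{w}_{i}\Vert^{2} + \sum_{j} \Vert \boldsymbol{h}_{j}\Vert^{2}\Bigr)
    \]
    % Weighted variant from this footnote: each user's and item's regularizer is scaled by its rating count.
    \[
    \lambda \Bigl(\sum_{i} |\Omega_{i}|\,\Vert \boldsymbol{w}_{i}\Vert^{2}
      + \sum_{j} |\bar{\Omega}_{j}|\,\Vert \boldsymbol{h}_{j}\Vert^{2}\Bigr)
    \]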
 
 
References
1.
Dror G, Koenigstein N, Koren Y, Weimer M (2012) The Yahoo! music dataset and KDD-Cup'11. In: JMLR workshop and conference proceedings: proceedings of KDD Cup 2011 competition, vol 18, pp 3–18
2.
Zhou Y, Wilkinson D, Schreiber R, Pan R (2008) Large-scale parallel collaborative filtering for the Netflix prize. In: Proceedings of the international conference on algorithmic aspects in information and management
3.
Koren Y, Bell RM, Volinsky C (2009) Matrix factorization techniques for recommender systems. IEEE Comput 42:30–37
4.
Takács G, Pilászy I, Németh B, Tikk D (2009) Scalable collaborative filtering approaches for large recommender systems. JMLR 10:623–656
5.
Chen P-L, Tsai C-T, Chen Y-N, Chou K-C, Li C-L, Tsai C-H, Wu K-W, Chou Y-C, Li C-Y, Lin W-S, Yu S-H, Chiu R-B, Lin C-Y, Wang C-C, Wang P-W, Su W-L, Wu C-H, Kuo T-T, McKenzie TG, Chang Y-H, Ferng C-S, Ni C-M, Lin H-T, Lin C-J, Lin S-D (2012) A linear ensemble of individual and blended models for music rating prediction. In: JMLR workshop and conference proceedings: proceedings of KDD Cup 2011 competition, vol 18, pp 21–60
6.
Langford J, Smola A, Zinkevich M (2009) Slow learners are fast. In: NIPS
7.
Gemulla R, Haas PJ, Nijkamp E, Sismanis Y (2011) Large-scale matrix factorization with distributed stochastic gradient descent. In: ACM KDD
8.
Recht B, Re C (2013) Parallel stochastic gradient algorithms for large-scale matrix completion. Math Program Comput 5(2):201–226
9.
Zinkevich M, Weimer M, Smola A, Li L (2010) Parallelized stochastic gradient descent. In: NIPS
10.
Niu F, Recht B, Re C, Wright SJ (2011) Hogwild!: a lock-free approach to parallelizing stochastic gradient descent. In: NIPS
11.
Cichocki A, Phan A-H (2009) Fast local algorithms for large scale nonnegative matrix and tensor factorizations. IEICE Trans Fundam Electron Commun Comput Sci E92-A(3):708–721
12.
Hsieh C-J, Dhillon IS (2011) Fast coordinate descent methods with variable selection for non-negative matrix factorization. In: ACM KDD
13.
Bottou L (2010) Large-scale machine learning with stochastic gradient descent. In: Proceedings of the international conference on computational statistics
14.
Agarwal A, Duchi JC (2011) Distributed delayed stochastic optimization. In: NIPS
15.
Bertsekas DP (1999) Nonlinear programming, 2nd edn. Athena Scientific, Belmont, MA
16.
Hsieh C-J, Chang K-W, Lin C-J, Keerthi SS, Sundararajan S (2008) A dual coordinate descent method for large-scale linear SVM. In: ICML
17.
Yu H-F, Huang F-L, Lin C-J (2011) Dual coordinate descent methods for logistic regression and maximum entropy models. Mach Learn 85(1–2):41–75
18.
Hsieh C-J, Sustik M, Dhillon IS, Ravikumar P (2011) Sparse inverse covariance matrix estimation using quadratic approximation. In: NIPS
19.
Pilászy I, Zibriczky D, Tikk D (2010) Fast ALS-based matrix factorization for explicit and implicit feedback datasets. In: ACM RecSys
20.
Bell RM, Koren Y, Volinsky C (2007) Modeling relationships at multiple scales to improve accuracy of large recommender systems. In: ACM KDD
21.
Ho N-D, Van Dooren P, Blondel VD (2011) Descent methods for nonnegative matrix factorization. In: Numerical linear algebra in signals, systems and control. Springer, Netherlands, pp 251–293
22.
Thakur R, Gropp W (2003) Improving the performance of collective operations in MPICH. In: Proceedings of the European PVM/MPI users' group meeting
23.
Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein JM (2010) GraphLab: a new framework for parallel machine learning. CoRR abs/1006.4990
24.
Chung F, Lu L, Vu V (2003) The spectra of random graphs with given expected degrees. Internet Math 1(3):257–275
25.
Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein JM (2012) Distributed GraphLab: a framework for machine learning in the cloud. PVLDB 5(8):716–727
26.
Yuan G-X, Chang K-W, Hsieh C-J, Lin C-J (2010) A comparison of optimization methods and software for large-scale l1-regularized linear classification. J Mach Learn Res 11:3183–3234
Metadata
Title
Parallel matrix factorization for recommender systems
Authors
Hsiang-Fu Yu
Cho-Jui Hsieh
Si Si
Inderjit S. Dhillon
Publication date
01.12.2014
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 3/2014
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-013-0682-2
