Published in: International Journal of Parallel Programming 4/2018

06.10.2017

DCF: A Dataflow-Based Collaborative Filtering Training Algorithm

Written by: Xiangyu Ju, Quan Chen, Zhenning Wang, Minyi Guo, Guang R. Gao


Abstract

Emerging recommender systems often adopt collaborative filtering techniques to improve recommendation accuracy. Existing collaborative filtering techniques are implemented with either the alternating least squares (ALS) algorithm or the gradient descent (GD) algorithm. However, neither algorithm scales well: ALS suffers from high computational complexity, while GD suffers from severe synchronization overhead and tremendous data movement. To solve these problems, we propose a Dataflow-based Collaborative Filtering (DCF) algorithm. More specifically, DCF exploits the fine-grained asynchrony of the dataflow model to minimize synchronization overhead; leverages the mini-batch technique to reduce computation and communication complexity; and uses dummy-edge and multicasting techniques to avoid the fine-grained overhead of dependency checking and to reduce data movement. By combining these techniques, DCF significantly improves the performance of collaborative filtering. Our experiments on a cluster with one master node and ten slave nodes show that DCF achieves a 23× speedup over ALS on Spark and an 18× speedup over GD on GraphLab on public datasets.
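To make the setting concrete, the GD baseline the abstract refers to trains a matrix-factorization model R ≈ P·Qᵀ on observed (user, item, rating) triples. The sketch below is our own illustration, not code from the paper: it shows mini-batch SGD for this model in plain Python, where gradients are accumulated locally and the factors are updated once per batch — the kind of computation and communication reduction the abstract attributes to DCF's mini-batch technique. All names (`train_mf`, `rmse`) and hyperparameter values are hypothetical.

```python
import random

def train_mf(ratings, n_users, n_items, k=8, lr=0.1, reg=0.01,
             batch_size=32, epochs=100, seed=0):
    """Mini-batch SGD for the factorization model R ~ P * Q^T.

    `ratings` is a list of (user, item, rating) triples.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Accumulate gradients locally, then apply one update per batch:
            # batching is what cuts per-sample synchronization traffic.
            dP, dQ = {}, {}
            for u, i, r in batch:
                err = r - sum(P[u][f] * Q[i][f] for f in range(k))
                gu = dP.setdefault(u, [0.0] * k)
                gi = dQ.setdefault(i, [0.0] * k)
                for f in range(k):
                    gu[f] += err * Q[i][f] - reg * P[u][f]
                    gi[f] += err * P[u][f] - reg * Q[i][f]
            step = lr / len(batch)
            for u, g in dP.items():
                for f in range(k):
                    P[u][f] += step * g[f]
            for i, g in dQ.items():
                for f in range(k):
                    Q[i][f] += step * g[f]
    return P, Q

def rmse(ratings, P, Q):
    """Root-mean-square error of the factorization on the given ratings."""
    k = len(P[0])
    se = sum((r - sum(P[u][f] * Q[i][f] for f in range(k))) ** 2
             for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5
```

In a distributed setting, the per-batch update is also where a dataflow runtime gets its leverage: each batch becomes an independently schedulable task, so workers need not synchronize after every rating.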


Metadata
Title
DCF: A Dataflow-Based Collaborative Filtering Training Algorithm
Written by
Xiangyu Ju
Quan Chen
Zhenning Wang
Minyi Guo
Guang R. Gao
Publication date
06.10.2017
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 4/2018
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-017-0525-y
