Published in: International Journal of Parallel Programming | Issue 4/2018

Publication date: 06-10-2017

DCF: A Dataflow-Based Collaborative Filtering Training Algorithm

Authors: Xiangyu Ju, Quan Chen, Zhenning Wang, Minyi Guo, Guang R. Gao

Abstract

Emerging recommender systems often adopt collaborative filtering to improve recommendation accuracy. Existing collaborative filtering techniques are implemented with either the alternating least squares (ALS) algorithm or the gradient descent (GD) algorithm. However, neither algorithm scales well: ALS suffers from high computational complexity, while GD suffers from severe synchronization overhead and heavy data movement. To solve these problems, we propose a Dataflow-based Collaborative Filtering (DCF) algorithm. Specifically, DCF exploits the fine-grain asynchronous nature of the dataflow model to minimize synchronization overhead; leverages the mini-batch technique to reduce computation and communication complexity; and uses dummy edges and multicasting to avoid the fine-grain overhead of dependency checking and to reduce data movement. Together, these techniques allow DCF to significantly improve the performance of collaborative filtering. Our experiments on a cluster with one master node and ten slave nodes show that DCF achieves a 23× speedup over ALS on Spark and an 18× speedup over GD on GraphLab on public datasets.
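
To make the mini-batch gradient descent idea mentioned in the abstract concrete, the following is a minimal single-process sketch of matrix-factorization collaborative filtering trained with mini-batches. It is not the DCF algorithm: it has no dataflow runtime, no asynchronous execution, no dummy edges, and no multicasting. The function names, the NumPy implementation, and every hyperparameter (`rank`, `batch_size`, `lr`, `reg`) are assumptions chosen for illustration only.

```python
import numpy as np


def minibatch_gd_mf(ratings, n_users, n_items, rank=4, batch_size=8,
                    epochs=500, lr=0.02, reg=0.01, seed=0):
    """Factorize (user, item, rating) rows into factor matrices P and Q.

    All hyperparameter defaults are illustrative assumptions,
    not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    users = ratings[:, 0].astype(int)
    items = ratings[:, 1].astype(int)
    values = ratings[:, 2].astype(float)

    P = rng.normal(scale=0.1, size=(n_users, rank))  # user factors
    Q = rng.normal(scale=0.1, size=(n_items, rank))  # item factors

    for _ in range(epochs):
        order = rng.permutation(len(values))
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            u, i, r = users[idx], items[idx], values[idx]

            # Prediction error for every rating in the mini-batch.
            err = r - np.sum(P[u] * Q[i], axis=1)

            # Gradients of the L2-regularized squared error.
            grad_p = -err[:, None] * Q[i] + reg * P[u]
            grad_q = -err[:, None] * P[u] + reg * Q[i]

            # ufunc.at accumulates every update, even when a user or
            # item appears more than once in the same mini-batch.
            np.subtract.at(P, u, lr * grad_p)
            np.subtract.at(Q, i, lr * grad_q)
    return P, Q


def rmse(ratings, P, Q):
    """Root-mean-square error of the factor model on the given ratings."""
    u = ratings[:, 0].astype(int)
    i = ratings[:, 1].astype(int)
    pred = np.sum(P[u] * Q[i], axis=1)
    return float(np.sqrt(np.mean((ratings[:, 2] - pred) ** 2)))
```

In this sketch each mini-batch produces one aggregated update to the factor matrices instead of one update per rating. In a distributed setting, that kind of batching is the usual way to cut the number of messages and synchronization points, which is consistent with the reduction in communication complexity the abstract attributes to the mini-batch technique.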

Title: DCF: A Dataflow-Based Collaborative Filtering Training Algorithm
Authors: Xiangyu Ju, Quan Chen, Zhenning Wang, Minyi Guo, Guang R. Gao
Publication date: 06-10-2017
Publisher: Springer US
Published in: International Journal of Parallel Programming / Issue 4/2018
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI: https://doi.org/10.1007/s10766-017-0525-y
