
08-03-2018 | Regular Paper

Large-scale asynchronous distributed learning based on parameter exchanges

Authors: Bikash Joshi, Franck Iutzeler, Massih-Reza Amini

Published in: International Journal of Data Science and Analytics | Issue 4/2018

Abstract

In many distributed learning problems, the heterogeneous loading of computing machines may harm the overall performance of synchronous strategies: each machine begins its new computations only after receiving aggregated information from a master, so any delay in sending local information to the latter can become a bottleneck. In this paper, we propose an effective asynchronous distributed framework for the minimization of a sum of smooth functions, where each machine performs iterations in parallel on its local function and updates a shared parameter asynchronously. In this way, all machines can work continuously even though they do not have the latest version of the shared parameter. We prove the convergence and consistency of this general asynchronous distributed method for gradient iterations and then show its efficiency on the matrix factorization problem for recommender systems and on binary classification.
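
To make the asynchronous parameter-exchange idea concrete, here is a minimal sketch in a toy setting: each machine holds a smooth quadratic local function, workers run as threads, and a master holds the shared parameter that workers pull (possibly stale) and update with local gradient steps. The names (Master, worker, pull, push) and all constants are illustrative assumptions, not the authors' implementation.

```python
# Toy sketch of asynchronous distributed gradient iterations with a shared
# parameter held by a master. Assumed setting: local functions
# f_i(w) = 0.5 * ||w - c_i||^2, whose sum is minimized at the mean of the c_i.
import threading
import numpy as np

DIM, N_WORKERS, STEPS, LR = 5, 4, 200, 0.05

rng = np.random.default_rng(0)
centers = [rng.normal(size=DIM) for _ in range(N_WORKERS)]

class Master:
    """Holds the shared parameter and applies worker updates as they arrive."""
    def __init__(self, dim):
        self.w = np.zeros(dim)
        self.lock = threading.Lock()

    def pull(self):
        with self.lock:
            return self.w.copy()   # worker may receive a stale copy

    def push(self, delta):
        with self.lock:
            self.w += delta        # apply the asynchronous update

def worker(master, c_i):
    for _ in range(STEPS):
        w_local = master.pull()    # may lag behind other workers' updates
        grad = w_local - c_i       # gradient of 0.5 * ||w - c_i||^2
        master.push(-LR * grad)    # send the local gradient step to the master

master = Master(DIM)
threads = [threading.Thread(target=worker, args=(master, c)) for c in centers]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("asynchronous estimate:", master.w)
print("true minimizer       :", np.mean(centers, axis=0))
```

Because pull may return a stale copy of the shared parameter, each worker's step is computed with respect to a possibly outdated iterate; this is the kind of asynchrony the paper analyzes for gradient iterations.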

Metadata
Title
Large-scale asynchronous distributed learning based on parameter exchanges
Authors
Bikash Joshi
Franck Iutzeler
Massih-Reza Amini
Publication date
08-03-2018
Publisher
Springer International Publishing
Published in
International Journal of Data Science and Analytics / Issue 4/2018
Print ISSN: 2364-415X
Electronic ISSN: 2364-4168
DOI
https://doi.org/10.1007/s41060-018-0110-5