Skip to main content
Top

2019 | OriginalPaper | Chapter

ADMMLIB: A Library of Communication-Efficient AD-ADMM for Distributed Machine Learning

Authors : Jinyang Xie, Yongmei Lei

Published in: Network and Parallel Computing

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Alternating direction method of multipliers (ADMM) has recently been identified as a compelling approach for solving large-scale machine learning problems in the cluster setting. To reduce the synchronization overhead in a distributed environment, asynchronous distributed ADMM (AD-ADMM) was proposed. However, due to the high communication overhead in the master-slave architecture, AD-ADMM still cannot scale well. To address this challenge, this paper proposes the ADMMLIB, a library of AD-ADMM for distributed machine learning. We employ a set of network optimization techniques. First, hierarchical communication architecture is utilized. Second, we integrate ring-based allreduce and mixed precision training into ADMMLIB to further effectively reduce the inter-node communication cost. Evaluation with large dataset demonstrates that ADMMLIB can achieve significant speed up, up to 2x, compared to the original AD-ADMM implementation, and the overall communication cost is reduced by 83%.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J., et al.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 3(1), 1–122 (2011)MATH Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J., et al.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends® Mach. Learn. 3(1), 1–122 (2011)MATH
2.
go back to reference Chang, T.H., Hong, M., Liao, W.C., Wang, X.: Asynchronous distributed admm for large-scale optimization-part I: algorithm and convergence analysis. IEEE Trans. Signal Process. 64(12), 3118–3130 (2016)MathSciNetMATHCrossRef Chang, T.H., Hong, M., Liao, W.C., Wang, X.: Asynchronous distributed admm for large-scale optimization-part I: algorithm and convergence analysis. IEEE Trans. Signal Process. 64(12), 3118–3130 (2016)MathSciNetMATHCrossRef
3.
go back to reference Li, M., et al.: Scaling distributed machine learning with the parameter server. In: 11th \(\{\)USENIX\(\}\) Symposium on Operating Systems Design and Implementation (\(\{\)OSDI\(\}\) 14), pp. 583–598 (2014) Li, M., et al.: Scaling distributed machine learning with the parameter server. In: 11th \(\{\)USENIX\(\}\) Symposium on Operating Systems Design and Implementation (\(\{\)OSDI\(\}\) 14), pp. 583–598 (2014)
4.
go back to reference Patarasuk, P., Yuan, X.: Bandwidth optimal all-reduce algorithms for clusters of workstations. J. Parallel Distrib. Comput. 69(2), 117–124 (2009)CrossRef Patarasuk, P., Yuan, X.: Bandwidth optimal all-reduce algorithms for clusters of workstations. J. Parallel Distrib. Comput. 69(2), 117–124 (2009)CrossRef
5.
7.
go back to reference Xing, E.P., et al.: Petuum: a new platform for distributed machine learning on big data. IEEE Trans. Big Data 1(2), 49–67 (2015)CrossRef Xing, E.P., et al.: Petuum: a new platform for distributed machine learning on big data. IEEE Trans. Big Data 1(2), 49–67 (2015)CrossRef
8.
go back to reference Zhang, R., Kwok, J.: Asynchronous distributed ADMM for consensus optimization. In: International Conference on Machine Learning, pp. 1701–1709 (2014) Zhang, R., Kwok, J.: Asynchronous distributed ADMM for consensus optimization. In: International Conference on Machine Learning, pp. 1701–1709 (2014)
Metadata
Title
ADMMLIB: A Library of Communication-Efficient AD-ADMM for Distributed Machine Learning
Authors
Jinyang Xie
Yongmei Lei
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-30709-7_27

Premium Partner