
2019 | Original Paper | Book Chapter

ADMMLIB: A Library of Communication-Efficient AD-ADMM for Distributed Machine Learning

Authors: Jinyang Xie, Yongmei Lei

Published in: Network and Parallel Computing

Publisher: Springer International Publishing


Abstract

The alternating direction method of multipliers (ADMM) has recently been identified as a compelling approach to solving large-scale machine learning problems in cluster settings. To reduce synchronization overhead in a distributed environment, asynchronous distributed ADMM (AD-ADMM) was proposed. However, AD-ADMM still scales poorly because of the high communication overhead of its master-slave architecture. To address this challenge, this paper proposes ADMMLIB, a library of AD-ADMM for distributed machine learning. We employ a set of network optimization techniques: first, a hierarchical communication architecture is adopted; second, ring-based allreduce and mixed precision training are integrated into ADMMLIB to further reduce inter-node communication cost. Evaluation on large datasets demonstrates that ADMMLIB achieves a speedup of up to 2x over the original AD-ADMM implementation, while reducing overall communication cost by 83%.
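
For context, AD-ADMM builds on the standard global consensus formulation of ADMM, in which each of N workers holds a local loss f_i and the iterates alternate between local, global, and dual updates; in the asynchronous variant, the master refreshes the global variable as worker results arrive rather than waiting for all N. The following is the textbook form in scaled-dual notation (ρ is the penalty parameter, u_i the scaled dual variable); the notation is conventional, not taken from the chapter itself:

```latex
% Global consensus ADMM, scaled dual form (conventional notation,
% not the chapter's own): f_i is worker i's local loss, z the global
% variable, u_i the scaled dual variable, \rho the penalty parameter.
\begin{aligned}
x_i^{k+1} &= \arg\min_{x_i}\; f_i(x_i)
             + \tfrac{\rho}{2}\,\lVert x_i - z^{k} + u_i^{k}\rVert_2^2, \\
z^{k+1}   &= \frac{1}{N}\sum_{i=1}^{N}\bigl(x_i^{k+1} + u_i^{k}\bigr), \\
u_i^{k+1} &= u_i^{k} + x_i^{k+1} - z^{k+1}.
\end{aligned}
```

The z-update is an average of per-worker vectors, which is exactly the communication pattern a ring-based allreduce optimizes. Below is a minimal, simulated sketch of ring allreduce run over in-process "nodes" rather than MPI ranks; the function name and structure are illustrative assumptions for exposition, not ADMMLIB's actual API:

```python
# Simulated ring-based allreduce: p in-process "nodes" instead of MPI
# ranks. ring_allreduce and its layout are illustrative assumptions,
# not ADMMLIB's actual API.
import numpy as np

def ring_allreduce(vectors):
    """Sum-allreduce equal-length vectors, one per simulated node."""
    p = len(vectors)
    # Each node splits its vector into p chunks; every chunk index is
    # reduced by travelling once around the ring.
    chunks = [np.array_split(v.astype(np.float64), p) for v in vectors]

    # Phase 1 (scatter-reduce): at step s, node i adds chunk
    # (i - s - 1) mod p arriving from its left neighbour to its own copy.
    # After p-1 steps, node i holds the fully reduced chunk (i + 1) mod p.
    for s in range(p - 1):
        for i in range(p):
            j = (i - s - 1) % p
            chunks[i][j] = chunks[i][j] + chunks[(i - 1) % p][j]

    # Phase 2 (allgather): reduced chunks are forwarded around the ring,
    # overwriting stale copies, until every node holds every reduced chunk.
    for s in range(p - 1):
        for i in range(p):
            j = (i - s) % p
            chunks[i][j] = chunks[(i - 1) % p][j]

    return [np.concatenate(c) for c in chunks]

# Usage: four nodes, each contributing a length-8 vector.
vecs = [np.arange(8.0) + rank for rank in range(4)]
result = ring_allreduce(vecs)
assert all(np.allclose(r, sum(vecs)) for r in result)
```

Each node transfers 2(p-1) chunks of size n/p, i.e. roughly twice the vector length in total regardless of the node count p, which is what makes the ring bandwidth-optimal. Per the abstract, ADMMLIB further cuts wire traffic by exchanging values in reduced (mixed) precision.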


Metadata
Title
ADMMLIB: A Library of Communication-Efficient AD-ADMM for Distributed Machine Learning
Authors
Jinyang Xie
Yongmei Lei
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-30709-7_27
