
2020 | Original Paper | Book Chapter

Distributed Learning of Non-convex Linear Models with One Round of Communication

Authors: Mike Izbicki, Christian R. Shelton

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing


Abstract

We present the optimal weighted average (OWA) distributed learning algorithm for linear models. OWA achieves statistically optimal learning rates, uses only one round of communication, works on non-convex problems, and supports a fast cross validation procedure. The OWA algorithm first trains local models on each of the compute nodes; then a master machine merges the models using a second round of optimization. This second optimization uses only a small fraction of the data, and so has negligible computational cost. Compared with similar distributed estimators that merge locally trained models, OWA either has stronger statistical guarantees, is applicable to more models, or has a more computationally efficient merging procedure.
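The two-phase structure described in the abstract can be sketched in a few lines of code. The following is a minimal illustration rather than the authors' implementation: it assumes ridge regression as the local linear estimator, the names train_local and owa_merge are invented here, and the master's "second round of optimization" is stood in for by a small ridge solve over the subspace spanned by the local parameter vectors.

import numpy as np

def train_local(X, y, reg=1.0):
    # Local estimator run independently on one compute node. The paper allows
    # any (possibly non-convex) estimator of a linear model; plain ridge
    # regression is used here only as an illustrative stand-in.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)

def owa_merge(local_models, X2, y2, reg=1.0):
    # Master-side merge: stack the m local parameter vectors as columns of
    # W (d x m), project a small second dataset (X2, y2) onto that
    # m-dimensional subspace, solve a tiny m-dimensional problem for the
    # mixing weights v, and return the merged model W @ v.
    W = np.column_stack(local_models)                      # d x m
    Z = X2 @ W                                             # n2 x m projected data
    m = W.shape[1]
    v = np.linalg.solve(Z.T @ Z + reg * np.eye(m), Z.T @ y2)
    return W @ v                                           # merged d-dimensional model

# Toy usage: m nodes each hold a shard; one round of communication sends the
# m local models (m*d numbers) to the master, which merges them cheaply.
rng = np.random.default_rng(0)
d, n_per_node, m = 20, 500, 4
w_true = rng.normal(size=d)
shards = []
for _ in range(m):
    X = rng.normal(size=(n_per_node, d))
    shards.append((X, X @ w_true + 0.1 * rng.normal(size=n_per_node)))

local_models = [train_local(X, y) for X, y in shards]
X2, y2 = shards[0][0][:100], shards[0][1][:100]            # small sample held by the master
w_owa = owa_merge(local_models, X2, y2)

Because the second solve involves only m unknowns and a small fraction of the data, its cost is negligible next to the local training, which is the point the abstract makes about the merging step.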


Footnotes
1
Other non-interactive estimators have made similar assumptions (e.g. [28]). If this assumption is too limiting, however, Appendix A shows how to transfer these data points to the master machine after optimizing the local models. The idea is to first project the data onto the subspace \(\hat{{\mathcal W}}^{\textit{owa}}\) before transfer, reducing the dimensionality of the data. The communication complexity of this alternate procedure is \(O(dm^2)\).
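As a rough illustration of that alternate transfer (the function name and shapes below are assumptions for this sketch, not taken from Appendix A), each worker could project its rows onto the column space of the stacked local models before sending them, so every transferred data point shrinks from d to m coordinates:

import numpy as np

def project_for_transfer(X_local, y_local, W):
    # W holds the m local parameter vectors as columns (d x m); projecting
    # onto its column space reduces each data point from d to m numbers
    # before it is shipped to the master.
    Z_local = X_local @ W       # n_i x m
    return Z_local, y_local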
 
References
1.
2. Battey, H., Fan, J., Liu, H., Lu, J., Zhu, Z.: Distributed estimation and inference with statistical guarantees. arXiv preprint arXiv:1509.05457 (2015)
3. Bonnans, J.F., Ioffe, A.: Second-order sufficiency and quadratic growth for nonisolated minima. Math. Oper. Res. 20(4), 801–817 (1995)
4. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
5. Breiman, L.: Out-of-bag estimation. Technical report (1996)
6. Han, J., Liu, Q.: Bootstrap model aggregation for distributed statistical learning. In: NeurIPS (2016)
7. Izbicki, M.: Algebraic classifiers: a generic approach to fast cross-validation, online training, and parallel training. In: ICML (2013)
8. Jaggi, M., et al.: Communication-efficient distributed dual coordinate ascent. In: NeurIPS, pp. 3068–3076 (2014)
9. Jordan, M.I., Lee, J.D., Yang, Y.: Communication-efficient distributed statistical inference. arXiv preprint arXiv:1605.07689 (2016)
10. Joulani, P., György, A., Szepesvári, C.: Fast cross-validation for incremental learning. In: IJCAI, pp. 3597–3604 (2015)
11.
12. Lee, J.D., Liu, Q., Sun, Y., Taylor, J.E.: Communication-efficient sparse regression. JMLR 18(5), 1–30 (2017)
13.
14. Li, M., Andersen, D.G., Park, J.W.: Scaling distributed machine learning with the parameter server. In: OSDI (2014)
15. Liu, Q., Ihler, A.T.: Distributed estimation, information loss and exponential families. In: NeurIPS, pp. 1098–1106 (2014)
16. Ma, C., Smith, V., Jaggi, M., Jordan, M.I., Richtárik, P., Takáč, M.: Adding vs. averaging in distributed primal-dual optimization. In: International Conference on Machine Learning (2015)
17. McDonald, R., Mohri, M., Silberman, N., Walker, D., Mann, G.S.: Efficient large-scale distributed training of conditional maximum entropy models. In: NeurIPS, pp. 1231–1239 (2009)
18. McMahan, H.B., Moore, E., Ramage, D., Hampson, S., et al.: Communication-efficient learning of deep networks from decentralized data. In: AISTATS (2017)
19. Niu, Y., et al.: The Tencent dataset and KDD-Cup'12 (2012)
20. Panov, M., Spokoiny, V., et al.: Finite sample Bernstein-von Mises theorem for semiparametric problems. Bayesian Anal. 10(3), 665–710 (2015)
21.
22. Rosenblatt, J.D., Nadler, B.: On the optimality of averaging in distributed statistical learning. Inf. Infer. 5(4), 379–404 (2016)
23. Smith, V., Forte, S., Ma, C., Takáč, M., Jordan, M.I., Jaggi, M.: CoCoA: a general framework for communication-efficient distributed optimization. JMLR 18, 230 (2018)
24. Wang, S.: A sharper generalization bound for divide-and-conquer ridge regression. In: AAAI (2019)
25. Zhang, Y., Wainwright, M.J., Duchi, J.C.: Communication-efficient algorithms for statistical optimization. In: NeurIPS, pp. 1502–1510 (2012)
26. Zhang, Y., Duchi, J.C., Wainwright, M.J.: Divide and conquer kernel ridge regression. In: COLT (2013)
27. Zhao, S.-Y., Ru, X., Shi, Y.-H., Gao, P., Li, W.-J.: Scope: scalable composite optimization for learning on Spark. In: AAAI (2017)
28. Zinkevich, M., Weimer, M., Li, L., Smola, A.J.: Parallelized stochastic gradient descent. In: NeurIPS, pp. 2595–2603 (2010)
Metadata
Title
Distributed Learning of Non-convex Linear Models with One Round of Communication
Authors
Mike Izbicki
Christian R. Shelton
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-46147-8_12