
2020 | Original Paper | Book Chapter

Distributed Learning of Non-convex Linear Models with One Round of Communication

Authors: Mike Izbicki, Christian R. Shelton

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing


Abstract

We present the optimal weighted average (OWA) distributed learning algorithm for linear models. OWA achieves statistically optimal learning rates, uses only one round of communication, works on non-convex problems, and supports a fast cross validation procedure. The OWA algorithm first trains local models on each of the compute nodes; then a master machine merges the models using a second round of optimization. This second optimization uses only a small fraction of the data, and so has negligible computational cost. Compared with similar distributed estimators that merge locally trained models, OWA either has stronger statistical guarantees, is applicable to more models, or has a more computationally efficient merging procedure.
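The two-phase structure described in the abstract can be sketched in a few lines of code. The following is a minimal illustration rather than the authors' implementation: it assumes ridge regression as the local linear estimator, the names train_local and owa_merge are invented here, and the master's "second round of optimization" is stood in for by a small ridge solve over the subspace spanned by the local parameter vectors.

import numpy as np

def train_local(X, y, reg=1.0):
    # Local estimator run independently on one compute node. The paper allows
    # any (possibly non-convex) estimator of a linear model; plain ridge
    # regression is used here only as an illustrative stand-in.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)

def owa_merge(local_models, X2, y2, reg=1.0):
    # Master-side merge: stack the m local parameter vectors as columns of
    # W (d x m), project a small second dataset (X2, y2) onto that
    # m-dimensional subspace, solve a tiny m-dimensional problem for the
    # mixing weights v, and return the merged model W @ v.
    W = np.column_stack(local_models)                      # d x m
    Z = X2 @ W                                             # n2 x m projected data
    m = W.shape[1]
    v = np.linalg.solve(Z.T @ Z + reg * np.eye(m), Z.T @ y2)
    return W @ v                                           # merged d-dimensional model

# Toy usage: m nodes each hold a shard; one round of communication sends the
# m local models (m*d numbers) to the master, which merges them cheaply.
rng = np.random.default_rng(0)
d, n_per_node, m = 20, 500, 4
w_true = rng.normal(size=d)
shards = []
for _ in range(m):
    X = rng.normal(size=(n_per_node, d))
    shards.append((X, X @ w_true + 0.1 * rng.normal(size=n_per_node)))

local_models = [train_local(X, y) for X, y in shards]
X2, y2 = shards[0][0][:100], shards[0][1][:100]            # small sample held by the master
w_owa = owa_merge(local_models, X2, y2)

Because the second solve involves only m unknowns and a small fraction of the data, its cost is negligible next to the local training, which is the point the abstract makes about the merging step.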


Footnotes
1
Other non-interactive estimators have made similar assumptions (e.g. [28]). If this assumption is too limiting, however, Appendix A shows how to transfer these data points to the master machine after optimizing the local models. The idea is to first project the data onto the subspace \(\hat{{\mathcal W}}^{\textit{owa}}\) before transfer, reducing the dimensionality of the data. The communication complexity of this alternate procedure is \(O(dm^2)\).
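As a rough illustration of that alternate transfer (the function name and shapes below are assumptions for this sketch, not taken from Appendix A), each worker could project its rows onto the column space of the stacked local models before sending them, so every transferred data point shrinks from d to m coordinates:

import numpy as np

def project_for_transfer(X_local, y_local, W):
    # W holds the m local parameter vectors as columns (d x m); projecting
    # onto its column space reduces each data point from d to m numbers
    # before it is shipped to the master.
    Z_local = X_local @ W       # n_i x m
    return Z_local, y_local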
 
References
1.
2. Battey, H., Fan, J., Liu, H., Lu, J., Zhu, Z.: Distributed estimation and inference with statistical guarantees. arXiv preprint arXiv:1509.05457 (2015)
3. Bonnans, J.F., Ioffe, A.: Second-order sufficiency and quadratic growth for nonisolated minima. Math. Oper. Res. 20(4), 801–817 (1995)
4. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
5. Breiman, L.: Out-of-bag estimation. Technical report (1996)
6. Han, J., Liu, Q.: Bootstrap model aggregation for distributed statistical learning. In: NeurIPS (2016)
7. Izbicki, M.: Algebraic classifiers: a generic approach to fast cross-validation, online training, and parallel training. In: ICML (2013)
8. Jaggi, M., et al.: Communication-efficient distributed dual coordinate ascent. In: NeurIPS, pp. 3068–3076 (2014)
9. Jordan, M.I., Lee, J.D., Yang, Y.: Communication-efficient distributed statistical inference. arXiv preprint arXiv:1605.07689 (2016)
10. Joulani, P., György, A., Szepesvári, C.: Fast cross-validation for incremental learning. In: IJCAI, pp. 3597–3604 (2015)
11.
12. Lee, J.D., Liu, Q., Sun, Y., Taylor, J.E.: Communication-efficient sparse regression. JMLR 18(5), 1–30 (2017)
13.
14. Li, M., Andersen, D.G., Park, J.W.: Scaling distributed machine learning with the parameter server. In: OSDI (2014)
15. Liu, Q., Ihler, A.T.: Distributed estimation, information loss and exponential families. In: NeurIPS, pp. 1098–1106 (2014)
16. Ma, C., Smith, V., Jaggi, M., Jordan, M.I., Richtárik, P., Takáč, M.: Adding vs. averaging in distributed primal-dual optimization. In: International Conference on Machine Learning (2015)
17. McDonald, R., Mohri, M., Silberman, N., Walker, D., Mann, G.S.: Efficient large-scale distributed training of conditional maximum entropy models. In: NeurIPS, pp. 1231–1239 (2009)
18. McMahan, H.B., Moore, E., Ramage, D., Hampson, S., et al.: Communication-efficient learning of deep networks from decentralized data. In: AISTATS (2017)
19. Niu, Y., et al.: The Tencent dataset and KDD-Cup'12 (2012)
20. Panov, M., Spokoiny, V., et al.: Finite sample Bernstein-von Mises theorem for semiparametric problems. Bayesian Anal. 10(3), 665–710 (2015)
21.
22. Rosenblatt, J.D., Nadler, B.: On the optimality of averaging in distributed statistical learning. Inf. Infer. 5(4), 379–404 (2016)
23. Smith, V., Forte, S., Ma, C., Takáč, M., Jordan, M.I., Jaggi, M.: CoCoA: a general framework for communication-efficient distributed optimization. JMLR 18, 230 (2018)
24. Wang, S.: A sharper generalization bound for divide-and-conquer ridge regression. In: AAAI (2019)
25. Zhang, Y., Wainwright, M.J., Duchi, J.C.: Communication-efficient algorithms for statistical optimization. In: NeurIPS, pp. 1502–1510 (2012)
26. Zhang, Y., Duchi, J.C., Wainwright, M.J.: Divide and conquer kernel ridge regression. In: COLT (2013)
27. Zhao, S.-Y., Ru, X., Shi, Y.-H., Gao, P., Li, W.-J.: Scope: scalable composite optimization for learning on Spark. In: AAAI (2017)
28. Zinkevich, M., Weimer, M., Li, L., Smola, A.J.: Parallelized stochastic gradient descent. In: NeurIPS, pp. 2595–2603 (2010)
Metadata
Title
Distributed Learning of Non-convex Linear Models with One Round of Communication
Authors
Mike Izbicki
Christian R. Shelton
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-46147-8_12