2022 | Original Paper | Book Chapter

Recent Theoretical Advances in Decentralized Distributed Convex Optimization

Authors: Eduard Gorbunov, Alexander Rogozin, Aleksandr Beznosikov, Darina Dvinskikh, Alexander Gasnikov

Published in: High-Dimensional Optimization and Probability

Publisher: Springer International Publishing


Abstract

In the last few years, the theory of decentralized distributed convex optimization has made significant progress. Lower bounds on the number of communication rounds and oracle calls have appeared, as well as methods that reach both of these bounds. In this paper, we focus on how these results can be explained based on optimal algorithms for the non-distributed setup. In particular, we present our recent results that have not yet been published and are available in detail only as arXiv preprints.
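To make the setting concrete, the sketch below shows plain decentralized gradient descent with gossip averaging over a ring of nodes, one of the simplest methods of the kind discussed in the chapter. It is an illustration only, not the chapter's algorithm: the local quadratic objectives, the ring topology, the step size, and the iteration count are all invented for the demo.

# Illustrative sketch (not the chapter's algorithm): decentralized gradient descent
# with gossip averaging for minimizing (1/n) * sum_i f_i(x) over a network.
# All problem data (local quadratics, ring topology, step size) are made up for the demo.
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3                      # number of nodes, problem dimension

# Local objectives f_i(x) = 0.5 * ||A_i x - b_i||^2, known only to node i.
A = [rng.standard_normal((10, d)) for _ in range(n)]
b = [rng.standard_normal(10) for _ in range(n)]
grad = lambda i, x: A[i].T @ (A[i] @ x - b[i])

# Doubly stochastic gossip matrix for a ring: each node mixes equally with itself
# and its two neighbors; multiplication by W is one communication round.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 1 / 3
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1 / 3

X = np.zeros((n, d))             # row i = local iterate held by node i
eta = 0.01                       # constant step size (reaches only a neighborhood of the optimum)
for _ in range(500):
    X = W @ X - eta * np.stack([grad(i, X[i]) for i in range(n)])

# Compare the consensus average with the centralized least-squares solution.
x_star = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)[0]
print("consensus error:", np.linalg.norm(X.mean(axis=0) - x_star))

Each iteration costs one local gradient (oracle call) per node and one gossip step (communication round); the optimal methods surveyed in the chapter improve on this naive scheme in exactly these two quantities.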


Footnotes
1
The narrative in this section follows [51].
 
2
For the sake of simplicity, we slightly abuse the notation and denote gradients and subgradients similarly.
 
3
The overall number of iterations performed during the calls of AC-SA2 equals N.
 
4
This approach was described in [32] and formally proved in [51].
 
5
Under the assumption that the measures are bounded away from zero; see the details in [21] and the proof of Proposition 2.5 in [30].
 
6
The narrative in this section follows [15].
 
References
1. S. Abadeh, P. Esfahani, D. Kuhn, Distributionally robust logistic regression, in Advances in Neural Information Processing Systems (NeurIPS) (2015), pp. 1576–1584
2. A. Aghajan, B. Touri, Distributed optimization over dependent random networks (2020). arXiv preprint arXiv:2010.01956
3. S.A. Alghunaim, E.K. Ryu, K. Yuan, A.H. Sayed, Decentralized proximal gradient algorithms with linear convergence rates. IEEE Trans. Autom. Control 66(6), 2787–2794 (2020)
4. D. Alistarh, D. Grubic, J. Li, R. Tomioka, M. Vojnovic, QSGD: communication-efficient SGD via gradient quantization and encoding, in Advances in Neural Information Processing Systems (2017), pp. 1709–1720
5. Z. Allen-Zhu, Katyusha: the first direct acceleration of stochastic gradient methods, in Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017 (ACM, New York, 2017), pp. 1200–1205. arXiv:1603.05953
6. Z. Allen-Zhu, How to make the gradients small stochastically: even faster convex and nonconvex SGD, in Advances in Neural Information Processing Systems (2018), pp. 1157–1167
7. A.S. Anikin, A.V. Gasnikov, P.E. Dvurechensky, A.I. Tyurin, A.V. Chernov, Dual approaches to the minimization of strongly convex functionals with a simple structure under affine constraints. Comput. Math. Math. Phys. 57(8), 1262–1276 (2017)
8. Y. Arjevani, O. Shamir, Communication complexity of distributed convex learning and optimization, in Advances in Neural Information Processing Systems (2015), pp. 1756–1764
9. M. Arjovsky, S. Chintala, L. Bottou, Wasserstein generative adversarial networks, in Proceedings of the 34th International Conference on Machine Learning (ICML), vol. 70(1) (2017), pp. 214–223
10. N. Bansal, A. Gupta, Potential-function proofs for gradient methods. Theory Comput. 15(1), 1–32 (2019)
11. D. Basu, D. Data, C. Karakus, S. Diggavi, Qsparse-local-SGD: distributed SGD with quantization, sparsification, and local computations (2019). arXiv preprint arXiv:1906.02367
12. A. Bayandina, P. Dvurechensky, A. Gasnikov, F. Stonyakin, A. Titov, Mirror descent and convex optimization problems with non-smooth inequality constraints, in Large-Scale and Distributed Optimization (Springer, Berlin, 2018), pp. 181–213
13. D.P. Bertsekas, J.N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, vol. 23 (Prentice Hall, Englewood Cliffs, 1989)
14. A. Beznosikov, P. Dvurechensky, A. Koloskova, V. Samokhin, S.U. Stich, A. Gasnikov, Decentralized local stochastic extra-gradient for variational inequalities (2021). arXiv preprint arXiv:2106.08315
15. A. Beznosikov, E. Gorbunov, A. Gasnikov, Derivative-free method for composite optimization with applications to decentralized distributed optimization. IFAC-PapersOnLine 53(2), 4038–4043 (2020)
16. A. Beznosikov, S. Horváth, P. Richtárik, M. Safaryan, On biased compression for distributed learning (2020). arXiv preprint arXiv:2002.12410
17. A. Beznosikov, D. Kovalev, A. Sadiev, P. Richtarik, A. Gasnikov, Optimal distributed algorithms for stochastic variational inequalities (2021). arXiv preprint
18. A. Beznosikov, A. Rogozin, D. Kovalev, A. Gasnikov, Near-optimal decentralized algorithms for saddle point problems over time-varying networks, in International Conference on Optimization and Applications (Springer, Berlin, 2021), pp. 246–257
19. A. Beznosikov, A. Sadiev, A. Gasnikov, Gradient-free methods with inexact oracle for convex-concave stochastic saddle-point problem, in International Conference on Mathematical Optimization Theory and Operations Research (Springer, Berlin, 2020), pp. 105–119
20. A. Beznosikov, G. Scutari, A. Rogozin, A. Gasnikov, Distributed saddle-point problems under data similarity, in Advances in Neural Information Processing Systems, vol. 34 (2021)
21. J. Blanchet, A. Jambulapati, C. Kent, A. Sidford, Towards optimal running times for optimal transport (2018). arXiv preprint arXiv:1810.07717
22.
23. N. Cesa-Bianchi, A. Conconi, C. Gentile, On the generalization ability of on-line learning algorithms, in Advances in Neural Information Processing Systems, vol. 14, ed. by T.G. Dietterich, S. Becker, Z. Ghahramani (MIT Press, Cambridge, 2002), pp. 359–366
24. A. Chambolle, T. Pock, A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
25. M. Cuturi, G. Peyré, A smoothed dual approach for variational Wasserstein problems. SIAM J. Imaging Sci. 9(1), 320–343 (2016)
26. A. Defazio, F. Bach, S. Lacoste-Julien, SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives, in Proceedings of the 27th International Conference on Neural Information Processing Systems, NIPS'14 (MIT Press, Cambridge, 2014), pp. 1646–1654
27. O. Devolder, Exactness, Inexactness and Stochasticity in First-Order Methods for Large-Scale Convex Optimization. Ph.D. Thesis, ICTEAM and CORE, Université Catholique de Louvain, 2013
28. O. Devolder, F. Glineur, Y. Nesterov, First-order methods with inexact oracle: the strongly convex case. CORE Discussion Papers 2013016:47 (2013)
29. O. Devolder, F. Glineur, Y. Nesterov, First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146(1), 37–75 (2014)
30. D. Dvinskikh, Stochastic approximation versus sample average approximation for population Wasserstein barycenters (2020). arXiv preprint arXiv:2001.07697
31. D. Dvinskikh, Decentralized algorithms for Wasserstein barycenters (2021). arXiv preprint arXiv:2105.01587
32. D. Dvinskikh, A. Gasnikov, Decentralized and parallel primal and dual accelerated methods for stochastic convex programming problems. J. Inverse Ill-Posed Probl. 29(3), 385–405 (2021)
33. D. Dvinskikh, A. Gasnikov, A. Rogozin, A. Beznosikov, Parallel and distributed algorithms for ML problems (2020). arXiv preprint arXiv:2010.09585
34. D. Dvinskikh, E. Gorbunov, A. Gasnikov, P. Dvurechensky, C.A. Uribe, On primal and dual approaches for distributed stochastic convex optimization over networks, in 2019 IEEE 58th Conference on Decision and Control (CDC) (IEEE, Piscataway, 2019), pp. 7435–7440
35. D. Dvinskikh, D. Tiapkin, Improved complexity bounds in Wasserstein barycenter problem, in International Conference on Artificial Intelligence and Statistics (PMLR, 2021), pp. 1738–1746
36. D.M. Dvinskikh, A.I. Turin, A.V. Gasnikov, S.S. Omelchenko, Accelerated and nonaccelerated stochastic gradient descent in model generality. Matematicheskie Zametki 108(4), 515–528 (2020)
37. P. Dvurechenskii, D. Dvinskikh, A. Gasnikov, C. Uribe, A. Nedich, Decentralize and randomize: faster algorithm for Wasserstein barycenters, in Advances in Neural Information Processing Systems (2018), pp. 10760–10770
38. P. Dvurechensky, A. Gasnikov, Stochastic intermediate gradient method for convex problems with stochastic inexact oracle. J. Optim. Theory Appl. 171(1), 121–145 (2016)
39. P. Dvurechensky, A. Gasnikov, A. Tiurin, Randomized similar triangles method: a unifying framework for accelerated randomized optimization methods (coordinate descent, directional search, derivative-free method) (2017). arXiv:1707.08486
40. F. Facchinei, J. Pang, Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer Series in Operations Research and Financial Engineering (Springer, New York, 2007)
41. A. Fallah, M. Gurbuzbalaban, A. Ozdaglar, U. Simsekli, L. Zhu, Robust distributed accelerated stochastic gradient methods for multi-agent networks (2019). arXiv preprint arXiv:1910.08701
42. V. Feldman, J. Vondrak, High probability generalization bounds for uniformly stable algorithms with nearly optimal rate (2019). arXiv preprint arXiv:1902.10710
43. D. Foster, A. Sekhari, O. Shamir, N. Srebro, K. Sridharan, B. Woodworth, The complexity of making the gradient small in stochastic convex optimization (2019). arXiv preprint arXiv:1902.04686
44. A. Gasnikov, Universal gradient descent (2017). arXiv preprint arXiv:1711.00394
45. A. Gasnikov, D. Dvinskikh, P. Dvurechensky, D. Kamzolov, V. Matyukhin, D. Pasechnyuk, N. Tupitsa, A. Chernov, Accelerated meta-algorithm for convex optimization problems. Comput. Math. Math. Phys. 61(1), 17–28 (2021)
46. A.V. Gasnikov, A.A. Lagunovskaya, I.N. Usmanova, F.A. Fedorenko, Gradient-free proximal methods with inexact oracle for convex stochastic nonsmooth optimization problems on the simplex. Autom. Remote Control 77(11), 2018–2034 (2016). arXiv:1412.3890
47. A.V. Gasnikov, Y.E. Nesterov, Universal method for stochastic composite optimization problems. Comput. Math. Math. Phys. 58(1), 48–64 (2018)
48. S. Ghadimi, G. Lan, Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization I: a generic algorithmic framework. SIAM J. Optim. 22(4), 1469–1492 (2012)
49. S. Ghadimi, G. Lan, Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4), 2341–2368 (2013). arXiv:1309.5549
50. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in Advances in Neural Information Processing Systems (NeurIPS) (2014), pp. 2672–2680
51. E. Gorbunov, D. Dvinskikh, A. Gasnikov, Optimal decentralized distributed algorithms for stochastic convex optimization (2019). arXiv preprint arXiv:1911.07363
52. E. Gorbunov, P. Dvurechensky, A. Gasnikov, An accelerated method for derivative-free smooth stochastic convex optimization (2022). SIOPT (in print)
53. E. Gorbunov, F. Hanzely, P. Richtárik, Local SGD: unified theory and new efficient methods (2020). arXiv preprint arXiv:2011.02828
54. E. Gorbunov, F. Hanzely, P. Richtárik, A unified theory of SGD: variance reduction, sampling, quantization and coordinate descent, in International Conference on Artificial Intelligence and Statistics (PMLR, 2020), pp. 680–690
55. E. Gorbunov, D. Kovalev, D. Makarenko, P. Richtárik, Linearly converging error compensated SGD, in Advances in Neural Information Processing Systems, vol. 33 (2020)
56. E. Gorbunov, E.A. Vorontsova, A.V. Gasnikov, On the upper bound for the expectation of the norm of a vector uniformly distributed on the sphere and the phenomenon of concentration of uniform measure on the sphere, in Mathematical Notes, vol. 106 (2019)
57. R.M. Gower, N. Loizou, X. Qian, A. Sailanbayev, E. Shulgin, P. Richtarik, SGD: general analysis and improved rates (2019). arXiv preprint arXiv:1901.09401
58. S. Guminov, P. Dvurechensky, N. Tupitsa, A. Gasnikov, On a combination of alternating minimization and Nesterov's momentum, in International Conference on Machine Learning (PMLR, 2021), pp. 3886–3898
59. H. Hendrikx, F. Bach, L. Massoulie, An optimal algorithm for decentralized finite sum optimization (2020). arXiv preprint arXiv:2005.10675
60. H. Hendrikx, L. Xiao, S. Bubeck, F. Bach, L. Massoulie, Statistically preconditioned accelerated gradient method for distributed optimization (2020). arXiv preprint arXiv:2002.10726
61. S. Horvath, C.-Y. Ho, L. Horvath, A.N. Sahu, M. Canini, P. Richtarik, Natural compression for distributed deep learning (2019). arXiv preprint arXiv:1905.10988
62. S. Horváth, D. Kovalev, K. Mishchenko, S. Stich, P. Richtárik, Stochastic distributed learning with gradient quantization and variance reduction (2019). arXiv preprint arXiv:1904.05115
63. D. Jakovetić, J. Xavier, J.M. Moura, Fast distributed gradient methods. IEEE Trans. Autom. Control 59(5), 1131–1146 (2014)
64. R. Johnson, T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, in Advances in Neural Information Processing Systems (2013), pp. 315–323
65. A. Juditsky, A. Nemirovski, First order methods for non-smooth convex large-scale optimization, I: general purpose methods, in Optimization for Machine Learning, ed. by S. Sra, S. Nowozin, S.J. Wright (MIT Press, Cambridge, 2012), pp. 121–184
66. A. Juditsky, A. Nemirovski, C. Tauvel, Solving variational inequalities with stochastic mirror-prox algorithm. Stochastic Syst. 1(1), 17–58 (2011)
67. A. Juditsky, Y. Nesterov, Deterministic and stochastic primal-dual subgradient algorithms for uniformly convex minimization. Stochastic Syst. 4(1), 44–80 (2014)
68. P. Kairouz, H.B. McMahan, B. Avent, A. Bellet, M. Bennis, A.N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, et al., Advances and open problems in federated learning (2019). arXiv preprint arXiv:1912.04977
70. S.P. Karimireddy, S. Kale, M. Mohri, S.J. Reddi, S.U. Stich, A.T. Suresh, Scaffold: stochastic controlled averaging for federated learning (2019). arXiv preprint arXiv:1910.06378
71. S.P. Karimireddy, Q. Rebjock, S.U. Stich, M. Jaggi, Error feedback fixes signSGD and other gradient compression schemes (2019). arXiv preprint arXiv:1901.09847
72. A. Khaled, K. Mishchenko, P. Richtárik, Tighter theory for local SGD on identical and heterogeneous data, in International Conference on Artificial Intelligence and Statistics (2020), pp. 4519–4529
73. V. Kibardin, Decomposition into functions in the minimization problem. Avtomatika i Telemekhanika (9), 66–79 (1979)
74. A. Koloskova, T. Lin, S.U. Stich, An improved analysis of gradient tracking for decentralized machine learning, in Advances in Neural Information Processing Systems, vol. 34 (2021)
75. A. Koloskova, N. Loizou, S. Boreiri, M. Jaggi, S.U. Stich, A unified theory of decentralized SGD with changing topology and local updates (ICML 2020). arXiv preprint arXiv:2003.10422
76. D. Kovalev, E. Gasanov, A. Gasnikov, P. Richtarik, Lower bounds and optimal algorithms for smooth and strongly convex decentralized optimization over time-varying networks, in Advances in Neural Information Processing Systems, vol. 34 (2021)
77. D. Kovalev, A. Salim, P. Richtárik, Optimal and practical algorithms for smooth and strongly convex decentralized optimization, in Advances in Neural Information Processing Systems, vol. 33 (2020)
78. D. Kovalev, E. Shulgin, P. Richtárik, A. Rogozin, A. Gasnikov, ADOM: accelerated decentralized optimization method for time-varying networks (2021). arXiv preprint arXiv:2102.09234
79. D. Kovalev, A. Beznosikov, A. Sadiev, M. Persiianov, P. Richtárik, A. Gasnikov, Optimal algorithms for decentralized stochastic variational inequalities (2022). arXiv preprint arXiv:2202.0277
80. A. Kroshnin, N. Tupitsa, D. Dvinskikh, P. Dvurechensky, A. Gasnikov, C. Uribe, On the complexity of approximating Wasserstein barycenters, in International Conference on Machine Learning (PMLR, 2019), pp. 3530–3540
81. A. Kulunchakov, J. Mairal, Estimate sequences for stochastic composite optimization: variance reduction, acceleration, and robustness to noise (2019). arXiv preprint arXiv:1901.08788
82. A. Kulunchakov, J. Mairal, Estimate sequences for variance-reduced stochastic composite optimization (2019). arXiv preprint arXiv:1905.02374
83. A. Kulunchakov, J. Mairal, A generic acceleration framework for stochastic composite optimization (2019). arXiv preprint arXiv:1906.01164
84. G. Lan, An optimal method for stochastic composite optimization. Math. Program. 133(1), 365–397 (2012). First appeared in June 2008
86. G. Lan, Lectures on optimization methods for machine learning (2019). e-print
87. G. Lan, First-Order and Stochastic Optimization Methods for Machine Learning (Springer, Berlin, 2020)
88. G. Lan, S. Lee, Y. Zhou, Communication-efficient algorithms for decentralized and stochastic optimization. Math. Program. 180, 237–284 (2020)
89. G. Lan, Y. Ouyang, Mirror-prox sliding methods for solving a class of monotone variational inequalities (2021). arXiv preprint arXiv:2111.00996
90. G. Lan, Y. Zhou, Random gradient extrapolation for distributed and stochastic optimization. SIAM J. Optim. 28(4), 2753–2782 (2018)
91. G. Lan, Z. Zhou, Algorithms for stochastic optimization with expectation constraints (2016). arXiv:1604.03887
93. S. Lee, A. Nedic, Distributed random projection algorithm for convex optimization. IEEE J. Selected Topics Signal Process. 7(2), 221–229 (2013)
94. H. Li, C. Fang, W. Yin, Z. Lin, Decentralized accelerated gradient methods with increasing penalty parameters. IEEE Trans. Signal Process. 68, 4855–4870 (2020)
95. H. Li, Z. Lin, Revisiting EXTRA for smooth distributed optimization (2020). arXiv preprint arXiv:2002.10110
96. H. Li, Z. Lin, Accelerated gradient tracking over time-varying graphs for decentralized optimization (2021). arXiv preprint arXiv:2104.02596
97. H. Li, Z. Lin, Y. Fang, Optimal accelerated variance reduced EXTRA and DIGing for strongly convex and smooth decentralized optimization (2020). arXiv preprint arXiv:2009.04373
98. J. Li, C. Wu, Z. Wu, Q. Long, Gradient-free method for nonsmooth distributed optimization. J. Global Optim. 61, 325–340 (2015)
99. H. Lin, J. Mairal, Z. Harchaoui, A universal catalyst for first-order optimization, in Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS'15 (MIT Press, Cambridge, 2015), pp. 3384–3392
100. T. Lin, C. Jin, M.I. Jordan, Near-optimal algorithms for minimax optimization, in Conference on Learning Theory (PMLR, 2020), pp. 2738–2779
101. T. Lin, S.P. Karimireddy, S.U. Stich, M. Jaggi, Quasi-global momentum: accelerating decentralized deep learning on heterogeneous data (2021). arXiv preprint arXiv:2102.04761
102. J. Liu, A.S. Morse, Accelerated linear iterations for distributed averaging. Ann. Rev. Control 35(2), 160–165 (2011)
103. M. Liu, W. Zhang, Y. Mroueh, X. Cui, J. Ross, T. Yang, P. Das, A decentralized parallel algorithm for training generative adversarial nets, in Advances in Neural Information Processing Systems (NeurIPS) (2020)
104. W. Liu, A. Mokhtari, A. Ozdaglar, S. Pattathil, Z. Shen, N. Zheng, A decentralized proximal point-type method for non-convex non-concave saddle point problems.
105. W. Liu, A. Mokhtari, A. Ozdaglar, S. Pattathil, Z. Shen, N. Zheng, A decentralized proximal point-type method for saddle point problems (2019). arXiv preprint arXiv:1910.14380
106. X. Liu, Y. Li, J. Tang, M. Yan, A double residual compression algorithm for efficient distributed learning (2019). arXiv preprint arXiv:1910.07561
107. D. Mateos-Núnez, J. Cortés, Distributed subgradient methods for saddle-point problems, in 2015 54th IEEE Conference on Decision and Control (CDC) (IEEE, Piscataway, 2015), pp. 5462–5467
109. K. Mishchenko, E. Gorbunov, M. Takáč, P. Richtárik, Distributed learning with compressed gradient differences (2019). arXiv preprint arXiv:1901.09269
110. S. Muthukrishnan, B. Ghosh, M.H. Schultz, First- and second-order diffusive methods for rapid, coarse, distributed load balancing. Theory Comput. Syst. 31(4), 331–354 (1998)
111. A. Nedic, Distributed gradient methods for convex machine learning problems in networks: distributed optimization. IEEE Signal Process. Mag. 37(3), 92–101 (2020)
112. A. Nedic, A. Olshevsky, W. Shi, Achieving geometric convergence for distributed optimization over time-varying graphs. SIAM J. Optim. 27(4), 2597–2633 (2017)
113. A. Nedić, A. Ozdaglar, Distributed subgradient methods for multi-agent optimization. IEEE Trans. Autom. Control 54(1), 48–61 (2009)
114. A. Nemirovski, Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)
115. A. Nemirovski, A. Juditsky, G. Lan, A. Shapiro, Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)
116. Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course (Kluwer Academic Publishers, Massachusetts, 2004)
117. Y. Nesterov, How to make the gradients small. Optima 88, 10–11 (2012)
118.
119. Y. Nesterov, V.G. Spokoiny, Random gradient-free minimization of convex functions. Found. Comput. Math. 17(2), 527–566 (2017)
120. L.M. Nguyen, P.H. Nguyen, M. van Dijk, P. Richtárik, K. Scheinberg, M. Takáč, SGD and Hogwild! convergence without the bounded gradients assumption (2018). arXiv preprint arXiv:1802.03801
121. A. Olshevsky, I.C. Paschalidis, S. Pu, Asymptotic network independence in distributed optimization for machine learning (2019). arXiv preprint arXiv:1906.12345
122. A. Olshevsky, I.C. Paschalidis, S. Pu, A non-asymptotic analysis of network independence for distributed stochastic gradient descent (2019). arXiv preprint arXiv:1906.02702
123. G. Peyré, M. Cuturi, et al., Computational optimal transport. Found. Trends® Mach. Learn. 11(5–6), 355–607 (2019)
125. G. Qu, N. Li, Harnessing smoothness to accelerate distributed optimization. IEEE Trans. Control Netw. Syst. 5(3), 1245–1260 (2017)
126.
127. P. Rigollet, J. Weed, Entropic optimal transport is maximum-likelihood deconvolution. C. R. Math. 356(11–12), 1228–1235 (2018)
129. R.T. Rockafellar, Convex Analysis (Princeton University Press, Princeton, 2015)
130. A. Rogozin, A. Beznosikov, D. Dvinskikh, D. Kovalev, P. Dvurechensky, A. Gasnikov, Decentralized distributed optimization for saddle point problems (2021). arXiv preprint arXiv:2102.07758
131. A. Rogozin, M. Bochko, P. Dvurechensky, A. Gasnikov, V. Lukoshkin, An accelerated method for decentralized distributed stochastic optimization over time-varying graphs, in Conference on Decision and Control (2021)
132. A. Rogozin, A. Gasnikov, Projected gradient method for decentralized optimization over time-varying networks (2019). arXiv preprint arXiv:1911.08527
133. A. Rogozin, A. Gasnikov, Penalty-based method for decentralized optimization over time-varying graphs, in International Conference on Optimization and Applications (Springer, Berlin, 2020), pp. 239–256
134. A. Rogozin, V. Lukoshkin, A. Gasnikov, D. Kovalev, E. Shulgin, Towards accelerated rates for distributed optimization over time-varying networks, in International Conference on Optimization and Applications (Springer, Berlin, 2021), pp. 258–272
135. K. Scaman, F. Bach, S. Bubeck, Y.T. Lee, L. Massoulié, Optimal algorithms for smooth and strongly convex distributed optimization in networks, in Proceedings of the 34th International Conference on Machine Learning, vol. 70 (JMLR.org, 2017), pp. 3027–3036
136. K. Scaman, F. Bach, S. Bubeck, Y.T. Lee, L. Massoulié, Optimal convergence rates for convex distributed optimization in networks. J. Mach. Learn. Res. 20(159), 1–31 (2019)
137. K. Scaman, F. Bach, S. Bubeck, L. Massoulié, Y.T. Lee, Optimal algorithms for non-smooth distributed optimization in networks, in Advances in Neural Information Processing Systems (2018), pp. 2740–2749
138. M. Schmidt, N. Le Roux, F. Bach, Minimizing finite sums with the stochastic average gradient. Math. Program. 162(1–2), 83–112 (2017)
139. S. Shalev-Shwartz, S. Ben-David, Understanding Machine Learning: From Theory to Algorithms (Cambridge University Press, Cambridge, 2014)
140. S. Shalev-Shwartz, O. Shamir, N. Srebro, K. Sridharan, Stochastic convex optimization, in COLT (2009)
141. O. Shamir, An optimal algorithm for bandit and zero-order convex optimization with two-point feedback. J. Mach. Learn. Res. 18, 52:1–52:11 (2017). First appeared in arXiv:1507.08752
142. O. Shamir, An optimal algorithm for bandit and zero-order convex optimization with two-point feedback. J. Mach. Learn. Res. 18(52), 1–11 (2017)
143. W. Shi, Q. Ling, G. Wu, W. Yin, EXTRA: an exact first-order algorithm for decentralized consensus optimization. SIAM J. Optim. 25(2), 944–966 (2015)
144. Z. Song, L. Shi, S. Pu, M. Yan, Optimal gradient tracking for decentralized optimization (2021). arXiv preprint arXiv:2110.05282
145. Z. Song, L. Shi, S. Pu, M. Yan, Provably accelerated decentralized gradient method over unbalanced directed graphs (2021). arXiv preprint arXiv:2107.12065
146. V. Spokoiny et al., Parametric estimation. Finite sample theory. Ann. Stat. 40(6), 2877–2909 (2012)
147. I. Stepanov, A. Voronov, A. Beznosikov, A. Gasnikov, One-point gradient-free methods for composite optimization with applications to distributed optimization (2021). arXiv preprint arXiv:2107.05951
148. S.U. Stich, Local SGD converges fast and communicates little (2018). arXiv preprint arXiv:1805.09767
149. S.U. Stich, J.-B. Cordonnier, M. Jaggi, Sparsified SGD with memory, in Advances in Neural Information Processing Systems (2018), pp. 4447–4458
150. F. Stonyakin, D. Dvinskikh, P. Dvurechensky, A. Kroshnin, O. Kuznetsova, A. Agafonov, A. Gasnikov, A. Tyurin, C.A. Uribe, D. Pasechnyuk, et al., Gradient methods for problems with inexact model of the objective (2019). arXiv preprint arXiv:1902.09001
151.
152. Y. Sun, A. Daneshmand, G. Scutari, Convergence rate of distributed optimization algorithms based on gradient tracking (2019). arXiv preprint arXiv:1905.02637
153. Y. Sun, A. Daneshmand, G. Scutari, Distributed optimization based on gradient-tracking revisited: enhancing convergence rate via surrogation (2020). arXiv preprint arXiv:1905.02637
155. Y. Tian, G. Scutari, T. Cao, A. Gasnikov, Acceleration in distributed optimization under similarity (2021). arXiv preprint arXiv:2110.12347
156. V. Tominin, Y. Tominin, E. Borodich, D. Kovalev, A. Gasnikov, P. Dvurechensky, On accelerated methods for saddle-point problems with composite structure (2021). arXiv preprint arXiv:2103.09344
157. J.N. Tsitsiklis, Problems in decentralized decision making and computation. Technical report, Massachusetts Institute of Technology, Laboratory for Information and Decision Systems, Cambridge, 1984
158. C.A. Uribe, D. Dvinskikh, P. Dvurechensky, A. Gasnikov, A. Nedić, Distributed computation of Wasserstein barycenters over networks, in 2018 IEEE 57th Annual Conference on Decision and Control (CDC) (2018). Accepted, arXiv:1803.02933
159. C.A. Uribe, S. Lee, A. Gasnikov, A. Nedić, Optimal algorithms for distributed optimization (2017). arXiv preprint arXiv:1712.00232
161. S. Vaswani, F. Bach, M. Schmidt, Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron, in The 22nd International Conference on Artificial Intelligence and Statistics (2019), pp. 1195–1204
162. J. von Neumann, O. Morgenstern, H. Kuhn, Theory of Games and Economic Behavior (Commemorative Edition) (Princeton University Press, Princeton, 2007)
163. H.-T. Wai, Z. Yang, Z. Wang, M. Hong, Multi-agent reinforcement learning via double averaging primal-dual optimization (2018). arXiv preprint arXiv:1806.00877
164. W. Wen, C. Xu, F. Yan, C. Wu, Y. Wang, Y. Chen, H. Li, TernGrad: ternary gradients to reduce communication in distributed deep learning, in Advances in Neural Information Processing Systems (2017), pp. 1509–1519
165. B. Woodworth, K.K. Patel, N. Srebro, Minibatch vs local SGD for heterogeneous distributed learning (2020). arXiv preprint arXiv:2006.04735
166. B. Woodworth, K.K. Patel, S.U. Stich, Z. Dai, B. Bullins, H.B. McMahan, O. Shamir, N. Srebro, Is local SGD better than minibatch SGD? (2020). arXiv preprint arXiv:2002.07839
168. J. Xu, Y. Tian, Y. Sun, G. Scutari, Accelerated primal-dual algorithms for distributed smooth convex optimization over networks (2019). arXiv preprint arXiv:1910.10666
169. J. Yang, S. Zhang, N. Kiyavash, N. He, A catalyst framework for minimax optimization. Adv. Neural Inf. Proces. Syst. 33, 5667–5678 (2020)
170. H. Ye, L. Luo, Z. Zhou, T. Zhang, Multi-consensus decentralized accelerated gradient descent (2020). arXiv preprint arXiv:2005.00797
171. H. Ye, Z. Zhou, L. Luo, T. Zhang, Decentralized accelerated proximal gradient descent, in Advances in Neural Information Processing Systems, vol. 33 (2020)
172. H. Yu, R. Jin, S. Yang, On the linear speedup analysis of communication efficient momentum SGD for distributed non-convex optimization (2019). arXiv preprint arXiv:1905.03817
173.
174. K. Zhou, Direct acceleration of SAGA using sampled negative momentum (2018). arXiv preprint arXiv:1806.11048
175. K. Zhou, F. Shang, J. Cheng, A simple stochastic variance reduced algorithm with fast convergence rates (2018). arXiv preprint arXiv:1806.11027
Metadata
Title
Recent Theoretical Advances in Decentralized Distributed Convex Optimization
Authors
Eduard Gorbunov
Alexander Rogozin
Aleksandr Beznosikov
Darina Dvinskikh
Alexander Gasnikov
Copyright Year
2022
DOI
https://doi.org/10.1007/978-3-031-00832-0_8