2022 | Original Paper | Book Chapter

Recent Theoretical Advances in Decentralized Distributed Convex Optimization

Authors: Eduard Gorbunov, Alexander Rogozin, Aleksandr Beznosikov, Darina Dvinskikh, Alexander Gasnikov

Published in: High-Dimensional Optimization and Probability

Publisher: Springer International Publishing


Abstract

In the last few years, the theory of decentralized distributed convex optimization has made significant progress. Lower bounds on the number of communication rounds and oracle calls have appeared, as well as methods that reach both of these bounds. In this paper, we focus on how these results can be explained based on optimal algorithms for the non-distributed setup. In particular, we present our recent results that have not yet been published and are available in detail only as arXiv preprints.
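To make the setting concrete, the sketch below shows plain decentralized gradient descent with gossip averaging over a ring of nodes, one of the simplest methods of the kind discussed in the chapter. It is an illustration only, not the chapter's algorithm: the local quadratic objectives, the ring topology, the step size, and the iteration count are all invented for the demo.

# Illustrative sketch (not the chapter's algorithm): decentralized gradient descent
# with gossip averaging for minimizing (1/n) * sum_i f_i(x) over a network.
# All problem data (local quadratics, ring topology, step size) are made up for the demo.
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3                      # number of nodes, problem dimension

# Local objectives f_i(x) = 0.5 * ||A_i x - b_i||^2, known only to node i.
A = [rng.standard_normal((10, d)) for _ in range(n)]
b = [rng.standard_normal(10) for _ in range(n)]
grad = lambda i, x: A[i].T @ (A[i] @ x - b[i])

# Doubly stochastic gossip matrix for a ring: each node mixes equally with itself
# and its two neighbors; multiplication by W is one communication round.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 1 / 3
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1 / 3

X = np.zeros((n, d))             # row i = local iterate held by node i
eta = 0.01                       # constant step size (reaches only a neighborhood of the optimum)
for _ in range(500):
    X = W @ X - eta * np.stack([grad(i, X[i]) for i in range(n)])

# Compare the consensus average with the centralized least-squares solution.
x_star = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)[0]
print("consensus error:", np.linalg.norm(X.mean(axis=0) - x_star))

Each iteration costs one local gradient (oracle call) per node and one gossip step (communication round); the optimal methods surveyed in the chapter improve on this naive scheme in exactly these two quantities.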


Footnotes
1
The narrative in this section follows [51].
 
2
For the sake of simplicity, we slightly abuse the notation and denote gradients and subgradients similarly.
 
3
The overall number of iterations performed during the calls of AC-SA2 equals N.
 
4
This approach was described in [32] and formally proved in [51].
 
5
Under the assumption that the measures are bounded away from zero; see the details in [21] and the proof of Proposition 2.5 in [30].
 
6
The narrative in this section follows [15].
 
References
1. S. Abadeh, P. Esfahani, D. Kuhn, Distributionally robust logistic regression, in Advances in Neural Information Processing Systems (NeurIPS) (2015), pp. 1576–1584
2. A. Aghajan, B. Touri, Distributed optimization over dependent random networks (2020). arXiv preprint arXiv:2010.01956
3. S.A. Alghunaim, E.K. Ryu, K. Yuan, A.H. Sayed, Decentralized proximal gradient algorithms with linear convergence rates. IEEE Trans. Autom. Control 66(6), 2787–2794 (2020)
4. D. Alistarh, D. Grubic, J. Li, R. Tomioka, M. Vojnovic, QSGD: communication-efficient SGD via gradient quantization and encoding, in Advances in Neural Information Processing Systems (2017), pp. 1709–1720
5. Z. Allen-Zhu, Katyusha: the first direct acceleration of stochastic gradient methods, in Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017 (ACM, New York, 2017), pp. 1200–1205. arXiv:1603.05953
6. Z. Allen-Zhu, How to make the gradients small stochastically: even faster convex and nonconvex SGD, in Advances in Neural Information Processing Systems (2018), pp. 1157–1167
7. A.S. Anikin, A.V. Gasnikov, P.E. Dvurechensky, A.I. Tyurin, A.V. Chernov, Dual approaches to the minimization of strongly convex functionals with a simple structure under affine constraints. Comput. Math. Math. Phys. 57(8), 1262–1276 (2017)
8. Y. Arjevani, O. Shamir, Communication complexity of distributed convex learning and optimization, in Advances in Neural Information Processing Systems (2015), pp. 1756–1764
9. M. Arjovsky, S. Chintala, L. Bottou, Wasserstein generative adversarial networks, in Proceedings of the 34th International Conference on Machine Learning (ICML), vol. 70(1) (2017), pp. 214–223
10. N. Bansal, A. Gupta, Potential-function proofs for gradient methods. Theory Comput. 15(1), 1–32 (2019)
11. D. Basu, D. Data, C. Karakus, S. Diggavi, Qsparse-local-SGD: distributed SGD with quantization, sparsification, and local computations (2019). arXiv preprint arXiv:1906.02367
12. A. Bayandina, P. Dvurechensky, A. Gasnikov, F. Stonyakin, A. Titov, Mirror descent and convex optimization problems with non-smooth inequality constraints, in Large-Scale and Distributed Optimization (Springer, Berlin, 2018), pp. 181–213
13. D.P. Bertsekas, J.N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, vol. 23 (Prentice Hall, Englewood Cliffs, 1989)
14. A. Beznosikov, P. Dvurechensky, A. Koloskova, V. Samokhin, S.U. Stich, A. Gasnikov, Decentralized local stochastic extra-gradient for variational inequalities (2021). arXiv preprint arXiv:2106.08315
15. A. Beznosikov, E. Gorbunov, A. Gasnikov, Derivative-free method for composite optimization with applications to decentralized distributed optimization. IFAC-PapersOnLine 53(2), 4038–4043 (2020)
16. A. Beznosikov, S. Horváth, P. Richtárik, M. Safaryan, On biased compression for distributed learning (2020). arXiv preprint arXiv:2002.12410
17. A. Beznosikov, D. Kovalev, A. Sadiev, P. Richtarik, A. Gasnikov, Optimal distributed algorithms for stochastic variational inequalities (2021). arXiv preprint
18. A. Beznosikov, A. Rogozin, D. Kovalev, A. Gasnikov, Near-optimal decentralized algorithms for saddle point problems over time-varying networks, in International Conference on Optimization and Applications (Springer, Berlin, 2021), pp. 246–257
19. A. Beznosikov, A. Sadiev, A. Gasnikov, Gradient-free methods with inexact oracle for convex-concave stochastic saddle-point problem, in International Conference on Mathematical Optimization Theory and Operations Research (Springer, Berlin, 2020), pp. 105–119
20. A. Beznosikov, G. Scutari, A. Rogozin, A. Gasnikov, Distributed saddle-point problems under data similarity, in Advances in Neural Information Processing Systems, vol. 34 (2021)
21. J. Blanchet, A. Jambulapati, C. Kent, A. Sidford, Towards optimal running times for optimal transport (2018). arXiv preprint arXiv:1810.07717
22.
23. N. Cesa-Bianchi, A. Conconi, C. Gentile, On the generalization ability of on-line learning algorithms, in Advances in Neural Information Processing Systems, vol. 14, ed. by T.G. Dietterich, S. Becker, Z. Ghahramani (MIT Press, Cambridge, 2002), pp. 359–366
24. A. Chambolle, T. Pock, A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
25. M. Cuturi, G. Peyré, A smoothed dual approach for variational Wasserstein problems. SIAM J. Imaging Sci. 9(1), 320–343 (2016)
26. A. Defazio, F. Bach, S. Lacoste-Julien, SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives, in Proceedings of the 27th International Conference on Neural Information Processing Systems, NIPS'14 (MIT Press, Cambridge, 2014), pp. 1646–1654
27. O. Devolder, Exactness, Inexactness and Stochasticity in First-Order Methods for Large-Scale Convex Optimization. Ph.D. Thesis, ICTEAM and CORE, Université Catholique de Louvain, 2013
28. O. Devolder, F. Glineur, Y. Nesterov, First-order methods with inexact oracle: the strongly convex case. CORE Discussion Papers 2013016:47 (2013)
29. O. Devolder, F. Glineur, Y. Nesterov, First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146(1), 37–75 (2014)
30. D. Dvinskikh, Stochastic approximation versus sample average approximation for population Wasserstein barycenters (2020). arXiv preprint arXiv:2001.07697
31. D. Dvinskikh, Decentralized algorithms for Wasserstein barycenters (2021). arXiv preprint arXiv:2105.01587
32. D. Dvinskikh, A. Gasnikov, Decentralized and parallel primal and dual accelerated methods for stochastic convex programming problems. J. Inverse Ill-Posed Probl. 29(3), 385–405 (2021)
33. D. Dvinskikh, A. Gasnikov, A. Rogozin, A. Beznosikov, Parallel and distributed algorithms for ML problems (2020). arXiv preprint arXiv:2010.09585
34. D. Dvinskikh, E. Gorbunov, A. Gasnikov, P. Dvurechensky, C.A. Uribe, On primal and dual approaches for distributed stochastic convex optimization over networks, in 2019 IEEE 58th Conference on Decision and Control (CDC) (IEEE, Piscataway, 2019), pp. 7435–7440
35. D. Dvinskikh, D. Tiapkin, Improved complexity bounds in Wasserstein barycenter problem, in International Conference on Artificial Intelligence and Statistics (PMLR, 2021), pp. 1738–1746
36. D.M. Dvinskikh, A.I. Turin, A.V. Gasnikov, S.S. Omelchenko, Accelerated and nonaccelerated stochastic gradient descent in model generality. Matematicheskie Zametki 108(4), 515–528 (2020)
37. P. Dvurechenskii, D. Dvinskikh, A. Gasnikov, C. Uribe, A. Nedich, Decentralize and randomize: faster algorithm for Wasserstein barycenters, in Advances in Neural Information Processing Systems (2018), pp. 10760–10770
38. P. Dvurechensky, A. Gasnikov, Stochastic intermediate gradient method for convex problems with stochastic inexact oracle. J. Optim. Theory Appl. 171(1), 121–145 (2016)
39. P. Dvurechensky, A. Gasnikov, A. Tiurin, Randomized similar triangles method: a unifying framework for accelerated randomized optimization methods (coordinate descent, directional search, derivative-free method) (2017). arXiv:1707.08486
40. F. Facchinei, J. Pang, Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer Series in Operations Research and Financial Engineering (Springer, New York, 2007)
41. A. Fallah, M. Gurbuzbalaban, A. Ozdaglar, U. Simsekli, L. Zhu, Robust distributed accelerated stochastic gradient methods for multi-agent networks (2019). arXiv preprint arXiv:1910.08701
42. V. Feldman, J. Vondrak, High probability generalization bounds for uniformly stable algorithms with nearly optimal rate (2019). arXiv preprint arXiv:1902.10710
43. D. Foster, A. Sekhari, O. Shamir, N. Srebro, K. Sridharan, B. Woodworth, The complexity of making the gradient small in stochastic convex optimization (2019). arXiv preprint arXiv:1902.04686
44. A. Gasnikov, Universal gradient descent (2017). arXiv preprint arXiv:1711.00394
45. A. Gasnikov, D. Dvinskikh, P. Dvurechensky, D. Kamzolov, V. Matyukhin, D. Pasechnyuk, N. Tupitsa, A. Chernov, Accelerated meta-algorithm for convex optimization problems. Comput. Math. Math. Phys. 61(1), 17–28 (2021)
46. A.V. Gasnikov, A.A. Lagunovskaya, I.N. Usmanova, F.A. Fedorenko, Gradient-free proximal methods with inexact oracle for convex stochastic nonsmooth optimization problems on the simplex. Autom. Remote Control 77(11), 2018–2034 (2016). arXiv:1412.3890
47. A.V. Gasnikov, Y.E. Nesterov, Universal method for stochastic composite optimization problems. Comput. Math. Math. Phys. 58(1), 48–64 (2018)
48. S. Ghadimi, G. Lan, Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization I: a generic algorithmic framework. SIAM J. Optim. 22(4), 1469–1492 (2012)
49. S. Ghadimi, G. Lan, Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4), 2341–2368 (2013). arXiv:1309.5549
50. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in Advances in Neural Information Processing Systems (NeurIPS) (2014), pp. 2672–2680
51. E. Gorbunov, D. Dvinskikh, A. Gasnikov, Optimal decentralized distributed algorithms for stochastic convex optimization (2019). arXiv preprint arXiv:1911.07363
52. E. Gorbunov, P. Dvurechensky, A. Gasnikov, An accelerated method for derivative-free smooth stochastic convex optimization (2022). SIOPT (in print)
53. E. Gorbunov, F. Hanzely, P. Richtárik, Local SGD: unified theory and new efficient methods (2020). arXiv preprint arXiv:2011.02828
54. E. Gorbunov, F. Hanzely, P. Richtárik, A unified theory of SGD: variance reduction, sampling, quantization and coordinate descent, in International Conference on Artificial Intelligence and Statistics (PMLR, 2020), pp. 680–690
55. E. Gorbunov, D. Kovalev, D. Makarenko, P. Richtárik, Linearly converging error compensated SGD, in Advances in Neural Information Processing Systems, vol. 33 (2020)
56. E. Gorbunov, E.A. Vorontsova, A.V. Gasnikov, On the upper bound for the expectation of the norm of a vector uniformly distributed on the sphere and the phenomenon of concentration of uniform measure on the sphere, in Mathematical Notes, vol. 106 (2019)
57. R.M. Gower, N. Loizou, X. Qian, A. Sailanbayev, E. Shulgin, P. Richtarik, SGD: general analysis and improved rates (2019). arXiv preprint arXiv:1901.09401
58. S. Guminov, P. Dvurechensky, N. Tupitsa, A. Gasnikov, On a combination of alternating minimization and Nesterov's momentum, in International Conference on Machine Learning (PMLR, 2021), pp. 3886–3898
59. H. Hendrikx, F. Bach, L. Massoulie, An optimal algorithm for decentralized finite sum optimization (2020). arXiv preprint arXiv:2005.10675
60. H. Hendrikx, L. Xiao, S. Bubeck, F. Bach, L. Massoulie, Statistically preconditioned accelerated gradient method for distributed optimization (2020). arXiv preprint arXiv:2002.10726
61. S. Horvath, C.-Y. Ho, L. Horvath, A.N. Sahu, M. Canini, P. Richtarik, Natural compression for distributed deep learning (2019). arXiv preprint arXiv:1905.10988
62. S. Horváth, D. Kovalev, K. Mishchenko, S. Stich, P. Richtárik, Stochastic distributed learning with gradient quantization and variance reduction (2019). arXiv preprint arXiv:1904.05115
63. D. Jakovetić, J. Xavier, J.M. Moura, Fast distributed gradient methods. IEEE Trans. Autom. Control 59(5), 1131–1146 (2014)
64. R. Johnson, T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, in Advances in Neural Information Processing Systems (2013), pp. 315–323
65. A. Juditsky, A. Nemirovski, First order methods for non-smooth convex large-scale optimization, I: general purpose methods, in Optimization for Machine Learning, ed. by S. Sra, S. Nowozin, S.J. Wright (MIT Press, Cambridge, 2012), pp. 121–184
66. A. Juditsky, A. Nemirovski, C. Tauvel, Solving variational inequalities with stochastic mirror-prox algorithm. Stochastic Syst. 1(1), 17–58 (2011)
67. A. Juditsky, Y. Nesterov, Deterministic and stochastic primal-dual subgradient algorithms for uniformly convex minimization. Stochastic Syst. 4(1), 44–80 (2014)
68. P. Kairouz, H.B. McMahan, B. Avent, A. Bellet, M. Bennis, A.N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, et al., Advances and open problems in federated learning (2019). arXiv preprint arXiv:1912.04977
70. S.P. Karimireddy, S. Kale, M. Mohri, S.J. Reddi, S.U. Stich, A.T. Suresh, Scaffold: stochastic controlled averaging for federated learning (2019). arXiv preprint arXiv:1910.06378
71. S.P. Karimireddy, Q. Rebjock, S.U. Stich, M. Jaggi, Error feedback fixes signSGD and other gradient compression schemes (2019). arXiv preprint arXiv:1901.09847
72. A. Khaled, K. Mishchenko, P. Richtárik, Tighter theory for local SGD on identical and heterogeneous data, in International Conference on Artificial Intelligence and Statistics (2020), pp. 4519–4529
73. V. Kibardin, Decomposition into functions in the minimization problem. Avtomatika i Telemekhanika (9), 66–79 (1979)
74. A. Koloskova, T. Lin, S.U. Stich, An improved analysis of gradient tracking for decentralized machine learning, in Advances in Neural Information Processing Systems, vol. 34 (2021)
75. A. Koloskova, N. Loizou, S. Boreiri, M. Jaggi, S.U. Stich, A unified theory of decentralized SGD with changing topology and local updates (ICML 2020). arXiv preprint arXiv:2003.10422
76. D. Kovalev, E. Gasanov, A. Gasnikov, P. Richtarik, Lower bounds and optimal algorithms for smooth and strongly convex decentralized optimization over time-varying networks, in Advances in Neural Information Processing Systems, vol. 34 (2021)
77. D. Kovalev, A. Salim, P. Richtárik, Optimal and practical algorithms for smooth and strongly convex decentralized optimization, in Advances in Neural Information Processing Systems, vol. 33 (2020)
78. D. Kovalev, E. Shulgin, P. Richtárik, A. Rogozin, A. Gasnikov, ADOM: accelerated decentralized optimization method for time-varying networks (2021). arXiv preprint arXiv:2102.09234
79. D. Kovalev, A. Beznosikov, A. Sadiev, M. Persiianov, P. Richtárik, A. Gasnikov, Optimal algorithms for decentralized stochastic variational inequalities (2022). arXiv preprint arXiv:2202.0277
80. A. Kroshnin, N. Tupitsa, D. Dvinskikh, P. Dvurechensky, A. Gasnikov, C. Uribe, On the complexity of approximating Wasserstein barycenters, in International Conference on Machine Learning (PMLR, 2019), pp. 3530–3540
81. A. Kulunchakov, J. Mairal, Estimate sequences for stochastic composite optimization: variance reduction, acceleration, and robustness to noise (2019). arXiv preprint arXiv:1901.08788
82. A. Kulunchakov, J. Mairal, Estimate sequences for variance-reduced stochastic composite optimization (2019). arXiv preprint arXiv:1905.02374
83. A. Kulunchakov, J. Mairal, A generic acceleration framework for stochastic composite optimization (2019). arXiv preprint arXiv:1906.01164
84. G. Lan, An optimal method for stochastic composite optimization. Math. Program. 133(1), 365–397 (2012). First appeared in June 2008
86. G. Lan, Lectures on optimization methods for machine learning (2019). e-print
87. G. Lan, First-Order and Stochastic Optimization Methods for Machine Learning (Springer, Berlin, 2020)
88. G. Lan, S. Lee, Y. Zhou, Communication-efficient algorithms for decentralized and stochastic optimization. Math. Program. 180, 237–284 (2020)
89. G. Lan, Y. Ouyang, Mirror-prox sliding methods for solving a class of monotone variational inequalities (2021). arXiv preprint arXiv:2111.00996
90. G. Lan, Y. Zhou, Random gradient extrapolation for distributed and stochastic optimization. SIAM J. Optim. 28(4), 2753–2782 (2018)
91. G. Lan, Z. Zhou, Algorithms for stochastic optimization with expectation constraints (2016). arXiv:1604.03887
93. S. Lee, A. Nedic, Distributed random projection algorithm for convex optimization. IEEE J. Selected Topics Signal Process. 7(2), 221–229 (2013)
94. H. Li, C. Fang, W. Yin, Z. Lin, Decentralized accelerated gradient methods with increasing penalty parameters. IEEE Trans. Signal Process. 68, 4855–4870 (2020)
95. H. Li, Z. Lin, Revisiting EXTRA for smooth distributed optimization (2020). arXiv preprint arXiv:2002.10110
96. H. Li, Z. Lin, Accelerated gradient tracking over time-varying graphs for decentralized optimization (2021). arXiv preprint arXiv:2104.02596
97. H. Li, Z. Lin, Y. Fang, Optimal accelerated variance reduced EXTRA and DIGing for strongly convex and smooth decentralized optimization (2020). arXiv preprint arXiv:2009.04373
98. J. Li, C. Wu, Z. Wu, Q. Long, Gradient-free method for nonsmooth distributed optimization. J. Global Optim. 61, 325–340 (2015)
99. H. Lin, J. Mairal, Z. Harchaoui, A universal catalyst for first-order optimization, in Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS'15 (MIT Press, Cambridge, 2015), pp. 3384–3392
100. T. Lin, C. Jin, M.I. Jordan, Near-optimal algorithms for minimax optimization, in Conference on Learning Theory (PMLR, 2020), pp. 2738–2779
101. T. Lin, S.P. Karimireddy, S.U. Stich, M. Jaggi, Quasi-global momentum: accelerating decentralized deep learning on heterogeneous data (2021). arXiv preprint arXiv:2102.04761
102. J. Liu, A.S. Morse, Accelerated linear iterations for distributed averaging. Ann. Rev. Control 35(2), 160–165 (2011)
103. M. Liu, W. Zhang, Y. Mroueh, X. Cui, J. Ross, T. Yang, P. Das, A decentralized parallel algorithm for training generative adversarial nets, in Advances in Neural Information Processing Systems (NeurIPS) (2020)
104. W. Liu, A. Mokhtari, A. Ozdaglar, S. Pattathil, Z. Shen, N. Zheng, A decentralized proximal point-type method for non-convex non-concave saddle point problems.
105. W. Liu, A. Mokhtari, A. Ozdaglar, S. Pattathil, Z. Shen, N. Zheng, A decentralized proximal point-type method for saddle point problems (2019). arXiv preprint arXiv:1910.14380
106. X. Liu, Y. Li, J. Tang, M. Yan, A double residual compression algorithm for efficient distributed learning (2019). arXiv preprint arXiv:1910.07561
107. D. Mateos-Núnez, J. Cortés, Distributed subgradient methods for saddle-point problems, in 2015 54th IEEE Conference on Decision and Control (CDC) (IEEE, Piscataway, 2015), pp. 5462–5467
109. K. Mishchenko, E. Gorbunov, M. Takáč, P. Richtárik, Distributed learning with compressed gradient differences (2019). arXiv preprint arXiv:1901.09269
110. S. Muthukrishnan, B. Ghosh, M.H. Schultz, First- and second-order diffusive methods for rapid, coarse, distributed load balancing. Theory Comput. Syst. 31(4), 331–354 (1998)
111. A. Nedic, Distributed gradient methods for convex machine learning problems in networks: distributed optimization. IEEE Signal Process. Mag. 37(3), 92–101 (2020)
112. A. Nedic, A. Olshevsky, W. Shi, Achieving geometric convergence for distributed optimization over time-varying graphs. SIAM J. Optim. 27(4), 2597–2633 (2017)
113. A. Nedić, A. Ozdaglar, Distributed subgradient methods for multi-agent optimization. IEEE Trans. Autom. Control 54(1), 48–61 (2009)
114. A. Nemirovski, Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)
115. A. Nemirovski, A. Juditsky, G. Lan, A. Shapiro, Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)
116. Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course (Kluwer Academic Publishers, Massachusetts, 2004)
117. Y. Nesterov, How to make the gradients small. Optima 88, 10–11 (2012)
118.
119. Y. Nesterov, V.G. Spokoiny, Random gradient-free minimization of convex functions. Found. Comput. Math. 17(2), 527–566 (2017)
120. L.M. Nguyen, P.H. Nguyen, M. van Dijk, P. Richtárik, K. Scheinberg, M. Takáč, SGD and Hogwild! convergence without the bounded gradients assumption (2018). arXiv preprint arXiv:1802.03801
121. A. Olshevsky, I.C. Paschalidis, S. Pu, Asymptotic network independence in distributed optimization for machine learning (2019). arXiv preprint arXiv:1906.12345
122. A. Olshevsky, I.C. Paschalidis, S. Pu, A non-asymptotic analysis of network independence for distributed stochastic gradient descent (2019). arXiv preprint arXiv:1906.02702
123. G. Peyré, M. Cuturi, et al., Computational optimal transport. Found. Trends® Mach. Learn. 11(5–6), 355–607 (2019)
125. G. Qu, N. Li, Harnessing smoothness to accelerate distributed optimization. IEEE Trans. Control Netw. Syst. 5(3), 1245–1260 (2017)
126.
127. P. Rigollet, J. Weed, Entropic optimal transport is maximum-likelihood deconvolution. C. R. Math. 356(11–12), 1228–1235 (2018)
129. R.T. Rockafellar, Convex Analysis (Princeton University Press, Princeton, 2015)
130. A. Rogozin, A. Beznosikov, D. Dvinskikh, D. Kovalev, P. Dvurechensky, A. Gasnikov, Decentralized distributed optimization for saddle point problems (2021). arXiv preprint arXiv:2102.07758
131. A. Rogozin, M. Bochko, P. Dvurechensky, A. Gasnikov, V. Lukoshkin, An accelerated method for decentralized distributed stochastic optimization over time-varying graphs, in Conference on Decision and Control (2021)
132. A. Rogozin, A. Gasnikov, Projected gradient method for decentralized optimization over time-varying networks (2019). arXiv preprint arXiv:1911.08527
133. A. Rogozin, A. Gasnikov, Penalty-based method for decentralized optimization over time-varying graphs, in International Conference on Optimization and Applications (Springer, Berlin, 2020), pp. 239–256
134. A. Rogozin, V. Lukoshkin, A. Gasnikov, D. Kovalev, E. Shulgin, Towards accelerated rates for distributed optimization over time-varying networks, in International Conference on Optimization and Applications (Springer, Berlin, 2021), pp. 258–272
135. K. Scaman, F. Bach, S. Bubeck, Y.T. Lee, L. Massoulié, Optimal algorithms for smooth and strongly convex distributed optimization in networks, in Proceedings of the 34th International Conference on Machine Learning, vol. 70 (JMLR.org, 2017), pp. 3027–3036
136. K. Scaman, F. Bach, S. Bubeck, Y.T. Lee, L. Massoulié, Optimal convergence rates for convex distributed optimization in networks. J. Mach. Learn. Res. 20(159), 1–31 (2019)
137. K. Scaman, F. Bach, S. Bubeck, L. Massoulié, Y.T. Lee, Optimal algorithms for non-smooth distributed optimization in networks, in Advances in Neural Information Processing Systems (2018), pp. 2740–2749
138. M. Schmidt, N. Le Roux, F. Bach, Minimizing finite sums with the stochastic average gradient. Math. Program. 162(1–2), 83–112 (2017)
139. S. Shalev-Shwartz, S. Ben-David, Understanding Machine Learning: From Theory to Algorithms (Cambridge University Press, Cambridge, 2014)
140. S. Shalev-Shwartz, O. Shamir, N. Srebro, K. Sridharan, Stochastic convex optimization, in COLT (2009)
141. O. Shamir, An optimal algorithm for bandit and zero-order convex optimization with two-point feedback. J. Mach. Learn. Res. 18, 52:1–52:11 (2017). First appeared in arXiv:1507.08752
142. O. Shamir, An optimal algorithm for bandit and zero-order convex optimization with two-point feedback. J. Mach. Learn. Res. 18(52), 1–11 (2017)
143. W. Shi, Q. Ling, G. Wu, W. Yin, EXTRA: an exact first-order algorithm for decentralized consensus optimization. SIAM J. Optim. 25(2), 944–966 (2015)
144. Z. Song, L. Shi, S. Pu, M. Yan, Optimal gradient tracking for decentralized optimization (2021). arXiv preprint arXiv:2110.05282
145. Z. Song, L. Shi, S. Pu, M. Yan, Provably accelerated decentralized gradient method over unbalanced directed graphs (2021). arXiv preprint arXiv:2107.12065
146. V. Spokoiny et al., Parametric estimation. Finite sample theory. Ann. Stat. 40(6), 2877–2909 (2012)
147. I. Stepanov, A. Voronov, A. Beznosikov, A. Gasnikov, One-point gradient-free methods for composite optimization with applications to distributed optimization (2021). arXiv preprint arXiv:2107.05951
148. S.U. Stich, Local SGD converges fast and communicates little (2018). arXiv preprint arXiv:1805.09767
149. S.U. Stich, J.-B. Cordonnier, M. Jaggi, Sparsified SGD with memory, in Advances in Neural Information Processing Systems (2018), pp. 4447–4458
150. F. Stonyakin, D. Dvinskikh, P. Dvurechensky, A. Kroshnin, O. Kuznetsova, A. Agafonov, A. Gasnikov, A. Tyurin, C.A. Uribe, D. Pasechnyuk, et al., Gradient methods for problems with inexact model of the objective (2019). arXiv preprint arXiv:1902.09001
151.
152. Y. Sun, A. Daneshmand, G. Scutari, Convergence rate of distributed optimization algorithms based on gradient tracking (2019). arXiv preprint arXiv:1905.02637
153. Y. Sun, A. Daneshmand, G. Scutari, Distributed optimization based on gradient-tracking revisited: enhancing convergence rate via surrogation (2020). arXiv preprint arXiv:1905.02637
155. Y. Tian, G. Scutari, T. Cao, A. Gasnikov, Acceleration in distributed optimization under similarity (2021). arXiv preprint arXiv:2110.12347
156. V. Tominin, Y. Tominin, E. Borodich, D. Kovalev, A. Gasnikov, P. Dvurechensky, On accelerated methods for saddle-point problems with composite structure (2021). arXiv preprint arXiv:2103.09344
157. J.N. Tsitsiklis, Problems in decentralized decision making and computation. Technical report, Massachusetts Institute of Technology, Laboratory for Information and Decision Systems, Cambridge, 1984
158. C.A. Uribe, D. Dvinskikh, P. Dvurechensky, A. Gasnikov, A. Nedić, Distributed computation of Wasserstein barycenters over networks, in 2018 IEEE 57th Annual Conference on Decision and Control (CDC) (2018). Accepted, arXiv:1803.02933
159. C.A. Uribe, S. Lee, A. Gasnikov, A. Nedić, Optimal algorithms for distributed optimization (2017). arXiv preprint arXiv:1712.00232
161. S. Vaswani, F. Bach, M. Schmidt, Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron, in The 22nd International Conference on Artificial Intelligence and Statistics (2019), pp. 1195–1204
162. J. von Neumann, O. Morgenstern, H. Kuhn, Theory of Games and Economic Behavior (Commemorative Edition) (Princeton University Press, Princeton, 2007)
163. H.-T. Wai, Z. Yang, Z. Wang, M. Hong, Multi-agent reinforcement learning via double averaging primal-dual optimization (2018). arXiv preprint arXiv:1806.00877
164. W. Wen, C. Xu, F. Yan, C. Wu, Y. Wang, Y. Chen, H. Li, TernGrad: ternary gradients to reduce communication in distributed deep learning, in Advances in Neural Information Processing Systems (2017), pp. 1509–1519
165. B. Woodworth, K.K. Patel, N. Srebro, Minibatch vs local SGD for heterogeneous distributed learning (2020). arXiv preprint arXiv:2006.04735
166. B. Woodworth, K.K. Patel, S.U. Stich, Z. Dai, B. Bullins, H.B. McMahan, O. Shamir, N. Srebro, Is local SGD better than minibatch SGD? (2020). arXiv preprint arXiv:2002.07839
168. J. Xu, Y. Tian, Y. Sun, G. Scutari, Accelerated primal-dual algorithms for distributed smooth convex optimization over networks (2019). arXiv preprint arXiv:1910.10666
169. J. Yang, S. Zhang, N. Kiyavash, N. He, A catalyst framework for minimax optimization. Adv. Neural Inf. Proces. Syst. 33, 5667–5678 (2020)
170. H. Ye, L. Luo, Z. Zhou, T. Zhang, Multi-consensus decentralized accelerated gradient descent (2020). arXiv preprint arXiv:2005.00797
171. H. Ye, Z. Zhou, L. Luo, T. Zhang, Decentralized accelerated proximal gradient descent, in Advances in Neural Information Processing Systems, vol. 33 (2020)
172. H. Yu, R. Jin, S. Yang, On the linear speedup analysis of communication efficient momentum SGD for distributed non-convex optimization (2019). arXiv preprint arXiv:1905.03817
173.
174. K. Zhou, Direct acceleration of SAGA using sampled negative momentum (2018). arXiv preprint arXiv:1806.11048
175. K. Zhou, F. Shang, J. Cheng, A simple stochastic variance reduced algorithm with fast convergence rates (2018). arXiv preprint arXiv:1806.11027
Metadata
Title
Recent Theoretical Advances in Decentralized Distributed Convex Optimization
Authors
Eduard Gorbunov
Alexander Rogozin
Aleksandr Beznosikov
Darina Dvinskikh
Alexander Gasnikov
Copyright Year
2022
DOI
https://doi.org/10.1007/978-3-031-00832-0_8