Published in: Queueing Systems | Issue 3-4/2022

31.03.2022

Learning to cooperate in agent-based control of queueing networks

Author: Vivek S. Borkar


Excerpt

Control of queueing networks has been an active area of research for many years, with motivations ranging from communication networks to supply chains. Nevertheless, the problems amenable to clean and elegant analysis tend to be few and far between, and highly stylized. Real queueing networks are plagued with many difficulties, at the level of both modeling and analysis, such as:
1. Scale: One major issue is the sheer size of the network, together with its lack of structure: most of these networks are 'emergent', and such structure as exists can at best be characterized in statistical terms.

2. Distributed, asynchronous control: The control, be it routing, admission or rate control, is exercised at node or edge level, i.e., locally and with local information. The only information available to a controller is its local state and its rewards; gathering more through message passing may be unrealistic, or possible only to a limited extent. (A minimal sketch of such a purely local agent follows this list.)

3. Modeling issues: Assumptions about input processes and queueing dynamics, such as distributional assumptions or assumptions of independence or Markovianity, are often oversimplifications.

4. Closed-loop stability: Despite major successes such as the backpressure scheme, the final word on this has not been said.

5. Choice of objectives: The primary objective of each queue is always its own throughput, but there can be secondary objectives such as energy efficiency and overall fairness. Moreover, 'optimality' is usually a tall order, and one has to seek a 'satisficing' solution in the sense of Simon [13], i.e., one that meets some minimum specifications.

6. Choice of policies: Distributed control of queueing networks is often viewed as a network game, placing it firmly in the framework of stochastic games [11], with equilibrium concepts such as Markov perfect equilibrium and Bayesian Nash equilibrium. It is not ruled out a priori, however, that a non-stationary policy at one or more queues may give strictly better performance for all.
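To make the information structure in item 2 concrete, here is a minimal sketch of a per-node agent that acts on nothing but its own state and reward. It uses plain tabular Q-learning with epsilon-greedy exploration purely as an illustration; the class name and all parameters are hypothetical, and this is not the construction developed in the paper.

import random
from collections import defaultdict

class LocalNodeAgent:
    """Hypothetical per-node controller: it observes only its own
    local state (e.g., its queue length) and its own reward, and
    learns a local action (e.g., an outgoing link) by tabular
    Q-learning. No message passing, no global state."""

    def __init__(self, n_actions, eps=0.1, alpha=0.05, gamma=0.99):
        self.n_actions = n_actions
        self.eps, self.alpha, self.gamma = eps, alpha, gamma
        # Q-values indexed by local state only.
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def act(self, local_state):
        # Explore occasionally; otherwise take the greedy local action.
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        row = self.q[local_state]
        return row.index(max(row))

    def update(self, s, a, reward, s_next):
        # One-step Q-learning update using purely local information.
        target = reward + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])

Each node would run such an agent asynchronously, updating whenever it observes a local transition; nothing in the scheme requires coordination with other nodes.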
Given this, a case may be made for a data-driven approach. We take inspiration from some recent work on foundational issues in reinforcement learning by Benjamin van Roy and associates [7, 10]. (See also [14] for a related discussion.) The next section introduces this paradigm and makes a case for a community of learning automata aiming for 'satisficing' as embodied in Blackwell optimality. …
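For readers unfamiliar with the term, the following is a minimal sketch of the classical linear reward-inaction (L_{R-I}) update from the learning-automata literature, shown only to make 'learning automaton' concrete; the function names and the step size lam are hypothetical, and this standard scheme is not the specific construction pursued in the paper.

import random

def lri_update(p, action, reward, lam=0.01):
    """Linear reward-inaction (L_{R-I}) update: shift probability
    mass toward the action just played, in proportion to the reward
    (assumed normalized to [0, 1]); a zero reward leaves p unchanged.
    The update preserves sum(p) == 1."""
    return [pi + lam * reward * ((1.0 if i == action else 0.0) - pi)
            for i, pi in enumerate(p)]

def sample_action(p):
    # Draw an action index from the mixed strategy p.
    u, cum = random.random(), 0.0
    for i, pi in enumerate(p):
        cum += pi
        if u < cum:
            return i
    return len(p) - 1

In a community of such automata, each queue would maintain its own action probabilities and apply this update driven only by its own reward signal.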


References
1. Abernethy, J., Bartlett, P.L., Hazan, E.: Blackwell approachability and no-regret learning are equivalent. In: Proceedings of the 24th Annual Conference on Learning Theory, pp. 27–46. PMLR (2011)
2. Axelrod, R.: The Complexity of Cooperation. Princeton University Press, Princeton, NJ (1997)
3. Bertsekas, D.P.: Dynamic Programming and Optimal Control, vols. I and II, 4th edn. Athena Scientific (2017/2012)
4. Blackwell, D.: An analog of the minimax theorem for vector payoffs. Pac. J. Math. 6, 1–8 (1956)
5. Brunton, S.L., Kutz, J.N.: Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press, Cambridge (2019)
6. Crawford, V.P., Haller, H.: Learning how to cooperate: optimal play in repeated coordination games. Econometrica 58, 571–595 (1990)
7. Dong, S., van Roy, B., Zhou, Z.: Simple agent, complex environment: efficient reinforcement learning with agent state. arXiv preprint arXiv:2102.05261 (2021)
8. Francis, B.A., Wonham, W.M.: The internal model principle in control theory. Automatica 12, 457–465 (1976)
9. Levy, Y.J.: Discounted stochastic games with no stationary Nash equilibrium: two examples. Econometrica 81, 1973–2007 (2013). Corrigendum (with A. McLennan): Econometrica 83(3), 1237–1252 (2015)
10. Lu, X., van Roy, B., Dwaracherla, V., Ibrahimi, M., Osband, I., Wen, Z.: Reinforcement learning, bit by bit. arXiv preprint arXiv:2103.04047 (2021)
11. Menache, I., Ozdaglar, A.: Network Games: Theory, Models, and Dynamics. Synthesis Lectures on Communication Networks. Morgan & Claypool (2011)
12. Nowak, M.A., Highfield, R.: SuperCooperators. Free Press (2011)
13. Simon, H.: Rational decision making in business organizations. Am. Econ. Rev. 69, 493–513 (1979)
14.
15. Young, H.: Strategic Learning and Its Limits. Oxford University Press, Oxford (2004)
16. Yu, H., Bertsekas, D.: On near optimality of the set of finite-state controllers for average cost POMDP. Math. Oper. Res. 33, 1–11 (2008)
Metadata
Title: Learning to cooperate in agent-based control of queueing networks
Author: Vivek S. Borkar
Publication date: 31.03.2022
Publisher: Springer US
Published in: Queueing Systems | Issue 3-4/2022
Print ISSN: 0257-0130
Electronic ISSN: 1572-9443
DOI: https://doi.org/10.1007/s11134-022-09772-9
