Published in: Autonomous Agents and Multi-Agent Systems | Issue 3/2019

27.04.2019

Decomposition methods with deep corrections for reinforcement learning

Authors: Maxime Bouton, Kyle D. Julian, Alireza Nakhaei, Kikuo Fujimura, Mykel J. Kochenderfer


Abstract

Decomposition methods have been proposed to approximate solutions to large sequential decision-making problems. In settings where an agent interacts with multiple entities, utility decomposition can be used to separate the global objective into local tasks, each considering a single entity independently. An arbitrator is then responsible for combining the individual utilities and selecting an action in real time to solve the global problem. Although these techniques can perform well empirically, they rely on strong independence assumptions between the local tasks and sacrifice the optimality of the global solution. This paper proposes an approach that improves upon such approximate solutions by learning a correction term represented by a neural network. We demonstrate this approach on a fisheries management problem, where multiple boats must coordinate to maximize their catch over time, and on a pedestrian avoidance problem for autonomous driving. In each problem, decomposition methods can scale to multiple boats or pedestrians by reusing strategies computed for a single entity. We verify empirically that the proposed correction method significantly improves on the decomposition method alone and outperforms a policy trained on the full-scale problem without utility decomposition.
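To make the arbitration scheme concrete, here is a minimal sketch of the combined action-selection rule. The notation is ours, not taken from the paper, and the summation is only one plausible fusion operator (taking a maximum over the local utilities is another common choice). Each entity $i$ contributes a local utility $Q_i(s_i, a)$ obtained by solving the single-entity task in isolation, and a neural network $\delta_\theta(s, a)$ supplies the learned correction on the full state:

$$\pi(s) = \arg\max_{a} \left[ \sum_{i} Q_i(s_i, a) + \delta_\theta(s, a) \right]$$

The intuition is that the fused local utilities already approximate the global utility, so the correction network only needs to capture the residual interaction between entities rather than learn the task from scratch.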


Metadata
Title
Decomposition methods with deep corrections for reinforcement learning
Authors
Maxime Bouton
Kyle D. Julian
Alireza Nakhaei
Kikuo Fujimura
Mykel J. Kochenderfer
Publication date
27.04.2019
Publisher
Springer US
Published in
Autonomous Agents and Multi-Agent Systems / Issue 3/2019
Print ISSN: 1387-2532
Electronic ISSN: 1573-7454
DOI
https://doi.org/10.1007/s10458-019-09407-z
