Published in: Autonomous Agents and Multi-Agent Systems | Issue 3/2019

27.04.2019

Decomposition methods with deep corrections for reinforcement learning

Authors: Maxime Bouton, Kyle D. Julian, Alireza Nakhaei, Kikuo Fujimura, Mykel J. Kochenderfer


Abstract

Decomposition methods have been proposed to approximate solutions to large sequential decision-making problems. In settings where an agent interacts with multiple entities, utility decomposition can be used to separate the global objective into local tasks, each considering a single entity independently. An arbitrator is then responsible for combining the individual utilities and selecting an action in real time to solve the global problem. Although these techniques can perform well empirically, they rely on strong independence assumptions between the local tasks and sacrifice the optimality of the global solution. This paper proposes an approach that improves upon such approximate solutions by learning a correction term represented by a neural network. We demonstrate this approach on a fisheries management problem, where multiple boats must coordinate to maximize their catch over time, and on a pedestrian avoidance problem for autonomous driving. In each problem, decomposition methods can scale to multiple boats or pedestrians by reusing strategies computed for a single entity. We verify empirically that the proposed correction method significantly improves on the decomposition method alone and outperforms a policy trained on the full-scale problem without utility decomposition.
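To make the arbitration scheme concrete, here is a minimal sketch of the combined action-selection rule. The notation is ours, not taken from the paper, and the summation is only one plausible fusion operator (taking a maximum over the local utilities is another common choice). Each entity $i$ contributes a local utility $Q_i(s_i, a)$ obtained by solving the single-entity task in isolation, and a neural network $\delta_\theta(s, a)$ supplies the learned correction on the full state:

$$\pi(s) = \arg\max_{a} \left[ \sum_{i} Q_i(s_i, a) + \delta_\theta(s, a) \right]$$

The intuition is that the fused local utilities already approximate the global utility, so the correction network only needs to capture the residual interaction between entities rather than learn the task from scratch.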


Metadata
Title
Decomposition methods with deep corrections for reinforcement learning
Authors
Maxime Bouton
Kyle D. Julian
Alireza Nakhaei
Kikuo Fujimura
Mykel J. Kochenderfer
Publication date
27.04.2019
Publisher
Springer US
Published in
Autonomous Agents and Multi-Agent Systems / Issue 3/2019
Print ISSN: 1387-2532
Electronic ISSN: 1573-7454
DOI
https://doi.org/10.1007/s10458-019-09407-z
