
2021 | Original Paper | Book Chapter

Reconnaissance for Reinforcement Learning with Safety Constraints

Written by: Shin-ichi Maeda, Hayato Watahiki, Yi Ouyang, Shintarou Okada, Masanori Koyama, Prabhat Nagarajan

Published in: Machine Learning and Knowledge Discovery in Databases. Research Track

Publisher: Springer International Publishing


Abstract

As RL algorithms have grown more powerful and sophisticated, they show promise for several practical real-world applications. However, safety is a necessary prerequisite for deploying RL systems in real-world domains such as autonomous vehicles or cooperative robotics. Safe RL problems are often formulated as constrained Markov decision processes (CMDPs). Solving CMDPs becomes particularly challenging when safety must be ensured in rare, dangerous situations in stochastic environments. In this paper, we propose an approach for CMDPs in which we have access to a generative model (e.g., a simulator) that can preferentially sample rare, dangerous events. Our approach, termed the RP algorithm, decomposes the CMDP into a pair of MDPs, which we call a reconnaissance MDP (R-MDP) and a planning MDP (P-MDP). In the R-MDP, we leverage the generative model to preferentially sample rare, dangerous events and train a threat function, the Q-function analog of danger, which determines the safety level of a given state-action pair. In the P-MDP, we train a reward-seeking policy while using the trained threat function to ensure that the agent considers only safe actions. We show that the RP algorithm enjoys several useful theoretical properties. Moreover, we present an approximate version of the RP algorithm that can significantly reduce the difficulty of solving the R-MDP. We demonstrate the efficacy of our method over classical approaches in multiple tasks, including a collision-free navigation task with dynamic obstacles.
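
The P-MDP step described in the abstract, restricting a reward-seeking policy to actions the threat function deems safe, can be illustrated with a small tabular sketch. This is not the authors' implementation: the array layout, the `threshold` parameter, the function names, and the fallback to the least threatening action when no action passes the threshold are all assumptions made for illustration.

```python
import numpy as np

def safe_actions(threat, state, threshold):
    """Indices of actions whose estimated threat in `state` is at most `threshold`.

    `threat[s, a]` is a hypothetical tabular threat estimate for a state-action pair.
    """
    return np.flatnonzero(threat[state] <= threshold)

def constrained_greedy_action(q_reward, threat, state, threshold):
    """Pick the highest-reward action among those considered safe."""
    allowed = safe_actions(threat, state, threshold)
    if allowed.size == 0:
        # Assumption: if nothing passes the threshold, take the least threatening action.
        return int(np.argmin(threat[state]))
    best = np.argmax(q_reward[state, allowed])
    return int(allowed[best])

# Toy values (made up): 2 states, 3 actions.
threat = np.array([[0.05, 0.40, 0.10],
                   [0.90, 0.70, 0.80]])
q_reward = np.array([[1.0, 5.0, 2.0],
                     [3.0, 2.0, 1.0]])

action_s0 = constrained_greedy_action(q_reward, threat, state=0, threshold=0.2)
action_s1 = constrained_greedy_action(q_reward, threat, state=1, threshold=0.2)
```

In state 0 the action with the highest reward estimate (index 1) is excluded because its threat exceeds the threshold, so the choice falls to the best remaining action. In state 1 no action passes the threshold, so the sketch's fallback applies.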


Footnotes
1
Total variation distance is defined as \(d_{TV}(p(a),q(a)) = \frac{1}{2}\sum _a |p(a)-q(a)|\).
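
As a small worked instance of this definition, the following sketch computes the distance between two discrete distributions given as dictionaries. The function name and the example distributions are hypothetical and chosen only to illustrate the formula.

```python
def total_variation(p, q):
    """Total variation distance: half the sum of |p(a) - q(a)| over all outcomes a."""
    support = set(p) | set(q)  # outcomes missing from one distribution have probability 0
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)

# Two made-up action distributions over {"left", "right"}.
p = {"left": 0.5, "right": 0.5}
q = {"left": 0.75, "right": 0.25}
d = total_variation(p, q)  # 0.5 * (0.25 + 0.25) = 0.25
```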
 
2
The code and videos are available at https://github.com/pfnet-research/rp-safe-rl.
 
Metadata
Title
Reconnaissance for Reinforcement Learning with Safety Constraints
Authors
Shin-ichi Maeda
Hayato Watahiki
Yi Ouyang
Shintarou Okada
Masanori Koyama
Prabhat Nagarajan
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-86520-7_35
