
2021 | OriginalPaper | Chapter

Reconnaissance for Reinforcement Learning with Safety Constraints

Authors: Shin-ichi Maeda, Hayato Watahiki, Yi Ouyang, Shintarou Okada, Masanori Koyama, Prabhat Nagarajan

Published in: Machine Learning and Knowledge Discovery in Databases. Research Track

Publisher: Springer International Publishing


Abstract

As RL algorithms have grown more powerful and sophisticated, they show promise for several practical applications in the real world. However, safety is a necessary prerequisite to deploying RL systems in real-world domains such as autonomous vehicles or cooperative robotics. Safe RL problems are often formulated as constrained Markov decision processes (CMDPs). Solving CMDPs becomes particularly challenging when safety must be ensured in rare, dangerous situations in stochastic environments. In this paper, we propose an approach for CMDPs where we have access to a generative model (e.g., a simulator) that can preferentially sample rare, dangerous events. Our approach, termed the RP algorithm, decomposes the CMDP into a pair of MDPs, which we term a reconnaissance MDP (R-MDP) and a planning MDP (P-MDP). In the R-MDP, we leverage the generative model to preferentially sample rare, dangerous events and train a threat function, the Q-function analog of danger, which can determine the safety level of a given state-action pair. In the P-MDP, we train a reward-seeking policy while using the trained threat function to ensure that the agent considers only safe actions. We show that the RP algorithm enjoys several useful theoretical properties. Moreover, we present an approximate version of the RP algorithm that can significantly reduce the difficulty of solving the R-MDP. We demonstrate the efficacy of our method over classical approaches in multiple tasks, including a collision-free navigation task with dynamic obstacles.
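The planning step described in the abstract, in which a reward-seeking policy considers only the actions that a trained threat function marks as safe, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_safe_action`, the tabular per-action inputs, the fixed `threshold`, and the fallback to the least threatening action when no action is safe are all assumptions made for illustration.

```python
def select_safe_action(q_values, threat_values, threshold):
    """Return the index of the highest-value action among the safe ones.

    q_values[a]      : estimated reward value of action a (hypothetical input)
    threat_values[a] : estimated danger of action a, playing the role of
                       the threat function (hypothetical input)
    threshold        : largest threat still treated as safe (assumption)
    """
    if len(q_values) != len(threat_values):
        raise ValueError("q_values and threat_values must have equal length")

    # Keep only the actions whose estimated threat is within the threshold.
    safe = [a for a, t in enumerate(threat_values) if t <= threshold]

    if not safe:
        # Assumed fallback: no action is safe, so take the least threatening one.
        return min(range(len(threat_values)), key=lambda a: threat_values[a])

    # Among the safe actions, act greedily with respect to reward.
    return max(safe, key=lambda a: q_values[a])


q = [1.0, 3.0, 2.0]
threat = [0.01, 0.2, 0.05]
print(select_safe_action(q, threat, threshold=0.1))  # action 1 is unsafe, so 2 is chosen
```

In this example the highest-reward action (index 1) exceeds the threat threshold, so the sketch falls back to the best of the remaining safe actions.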


Footnotes
1
Total variation distance is defined as \(d_{TV}(p(a),q(a)) = \frac{1}{2}\sum _a |p(a)-q(a)|\).
 
2
The code and videos are available at https://github.com/pfnet-research/rp-safe-rl.
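As a worked instance of the total variation distance defined in footnote 1, the following minimal sketch evaluates \(\frac{1}{2}\sum_a |p(a)-q(a)|\) for two discrete action distributions. The function name `total_variation` and the list-based representation of the distributions are assumptions made for illustration.

```python
def total_variation(p, q):
    """Total variation distance 0.5 * sum_a |p(a) - q(a)| between two
    discrete distributions given as equal-length lists of probabilities."""
    if len(p) != len(q):
        raise ValueError("p and q must be defined over the same set of actions")
    return 0.5 * sum(abs(pa - qa) for pa, qa in zip(p, q))


p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
print(total_variation(p, q))  # 0.5 * (0.25 + 0 + 0.25) = 0.25
```

The distance is 0 for identical distributions and 1 for distributions with disjoint support.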
 
Metadata
Title
Reconnaissance for Reinforcement Learning with Safety Constraints
Authors
Shin-ichi Maeda
Hayato Watahiki
Yi Ouyang
Shintarou Okada
Masanori Koyama
Prabhat Nagarajan
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-86520-7_35
