2021 | OriginalPaper | Book chapter

Reconnaissance for Reinforcement Learning with Safety Constraints

Authors: Shin-ichi Maeda, Hayato Watahiki, Yi Ouyang, Shintarou Okada, Masanori Koyama, Prabhat Nagarajan

Published in: Machine Learning and Knowledge Discovery in Databases. Research Track

Publisher: Springer International Publishing

Abstract

As RL algorithms have grown more powerful and sophisticated, they show promise for several practical real-world applications. However, safety is a prerequisite for deploying RL systems in real-world domains such as autonomous vehicles or cooperative robotics. Safe RL problems are often formulated as constrained Markov decision processes (CMDPs). Solving CMDPs becomes particularly challenging when safety must be ensured in rare, dangerous situations in stochastic environments. In this paper, we propose an approach for CMDPs in which we have access to a generative model (e.g., a simulator) that can preferentially sample rare, dangerous events. Our approach, termed the RP algorithm, decomposes the CMDP into a pair of MDPs, which we term a reconnaissance MDP (R-MDP) and a planning MDP (P-MDP). In the R-MDP, we leverage the generative model to preferentially sample rare, dangerous events and train a threat function, the Q-function analog of danger, which can determine the safety level of a given state-action pair. In the P-MDP, we train a reward-seeking policy while using the trained threat function to ensure that the agent considers only safe actions. We show that the RP algorithm enjoys several useful theoretical properties. Moreover, we present an approximate version of the RP algorithm that can significantly reduce the difficulty of solving the R-MDP. We demonstrate the efficacy of our method over classical approaches in multiple tasks, including a collision-free navigation task with dynamic obstacles.
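
The P-MDP step described in the abstract (seek reward while considering only actions the threat function deems safe) can be illustrated with a minimal sketch for a discrete action set. This is not the authors' implementation: the function name `select_safe_action`, the `threat_threshold` parameter, and the fallback to the least-threatening action when no action passes the threshold are all assumptions made for illustration.

```python
from typing import Sequence


def select_safe_action(
    q_values: Sequence[float],
    threat_values: Sequence[float],
    threat_threshold: float,
) -> int:
    """Pick the highest-reward action among those judged safe.

    q_values[a]      : estimated return of action a in the current state
    threat_values[a] : estimated danger of action a (output of a threat function)
    threat_threshold : hypothetical safety budget; actions above it are excluded
    """
    if len(q_values) != len(threat_values) or len(q_values) == 0:
        raise ValueError("q_values and threat_values must be non-empty and equal in length")

    # Keep only the actions whose estimated threat is within the budget.
    safe_actions = [a for a, t in enumerate(threat_values) if t <= threat_threshold]

    if not safe_actions:
        # Assumed fallback (not from the source): take the least dangerous action.
        return min(range(len(threat_values)), key=lambda a: threat_values[a])

    # Among the safe actions, act greedily with respect to the reward estimate.
    return max(safe_actions, key=lambda a: q_values[a])


# Action 1 has the highest return but is too dangerous, so action 2 is chosen.
chosen = select_safe_action([1.0, 5.0, 3.0], [0.01, 0.9, 0.05], threat_threshold=0.1)
```

Separating the two estimates this way means the reward-seeking part never has to trade reward against danger directly; the threat estimate only restricts which actions are eligible.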

Total variation distance is defined as \(d_{TV}(p(a),q(a)) = \frac{1}{2}\sum _a |p(a)-q(a)|\).
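
As a small worked instance of this definition, the following sketch computes the total variation distance between two distributions over a finite action set, each given as a list of probabilities in the same action order. The function name `total_variation_distance` and the example numbers are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def total_variation_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Return 0.5 * sum_a |p(a) - q(a)| for two distributions on the same finite set."""
    if len(p) != len(q):
        raise ValueError("p and q must be defined over the same set of actions")
    return 0.5 * sum(abs(pa - qa) for pa, qa in zip(p, q))


# Hypothetical distributions over three actions.
p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]
d = total_variation_distance(p, q)  # 0.5 * (0.25 + 0.25 + 0.5)
```

The distance is 0 for identical distributions and 1 for distributions with disjoint support.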

The code and videos are available at https://github.com/pfnet-research/rp-safe-rl.

Title: Reconnaissance for Reinforcement Learning with Safety Constraints
Authors: Shin-ichi Maeda, Hayato Watahiki, Yi Ouyang, Shintarou Okada, Masanori Koyama, Prabhat Nagarajan
Publisher: Springer International Publishing
Book: Machine Learning and Knowledge Discovery in Databases. Research Track
Print ISBN: 978-3-030-86519-1
Electronic ISBN: 978-3-030-86520-7
Copyright year: 2021
DOI: https://doi.org/10.1007/978-3-030-86520-7_35
