Published in: Neural Computing and Applications 23/2023

23.12.2022 | S.I.: Human-aligned Reinforcement Learning for Autonomous Agents and Robots

Example-guided learning of stochastic human driving policies using deep reinforcement learning

Authors: Ran Emuna, Rotem Duffney, Avinoam Borowsky, Armin Biess

Abstract

Deep reinforcement learning has been successfully applied to the generation of goal-directed behavior in artificial agents. However, existing algorithms are often not designed to reproduce human-like behavior, which may be desired in many environments, such as human–robot collaborations, social robotics and autonomous vehicles. Here we introduce a model-free and easy-to-implement deep reinforcement learning approach to mimic the stochastic behavior of a human expert by learning distributions of task variables from examples. As tractable use cases, we study static and dynamic obstacle avoidance tasks for an autonomous vehicle on a highway road in simulation (Unity). Our control algorithm receives a feedback signal from two sources: a deterministic (handcrafted) part encoding basic task goals and a stochastic (data-driven) part that incorporates human expert knowledge. Gaussian processes are used to model human state distributions and to assess the similarity between machine and human behavior. Using this generic approach, we demonstrate that the learning agent acquires human-like driving skills and can generalize to new roads and obstacle distributions unseen during training.
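To make the two-part feedback signal concrete, the following Python sketch shows one way such a reward could be assembled: a handcrafted task term plus a data-driven term that scores the agent's state under a Gaussian process fitted to human examples. Everything below is an assumption for exposition: the synthetic "human" data, the scikit-learn GP, the choice of context and task variables, and the weighting parameter w are illustrative, not the authors' implementation.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical stand-in for recorded human data: a context variable
# (distance to the nearest obstacle, in meters) and a task variable
# (lateral position on the road) observed while the human drives.
rng = np.random.default_rng(0)
dist_to_obstacle = rng.uniform(0.0, 50.0, size=(200, 1))
lateral_position = (np.tanh(0.1 * (25.0 - dist_to_obstacle))
                    + 0.1 * rng.standard_normal((200, 1)))

# Fit a GP that models the distribution of the task variable given the
# context; its predictive mean/std summarize "human-like" states.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=10.0) + WhiteKernel(),
                              normalize_y=True)
gp.fit(dist_to_obstacle, lateral_position.ravel())

def human_similarity_reward(context, agent_value):
    # Gaussian score of the agent's state under the GP's predictive
    # distribution at this context: 1.0 at the human mean, decaying
    # with distance measured in predictive standard deviations.
    mu, sigma = gp.predict(np.array([[context]]), return_std=True)
    return float(np.exp(-0.5 * ((agent_value - mu[0]) / sigma[0]) ** 2))

def total_reward(progress, collided, context, agent_value, w=0.5):
    # Deterministic (handcrafted) part: reward forward progress and
    # penalize collisions. Stochastic (data-driven) part: similarity to
    # the human state distribution. w balances the two sources.
    r_task = progress - (10.0 if collided else 0.0)
    r_human = human_similarity_reward(context, agent_value)
    return (1.0 - w) * r_task + w * r_human

A reward shaped this way does not force the agent onto a single human trajectory: any state that is probable under the learned human distribution is rewarded, which is one plausible reading of how stochastic, human-like variability can be preserved through reinforcement learning.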

Appendices
Accessible only with authorization
Footnotes
2. Video 1: Generalization I: Comparison of the performance of the vRL and mRL agents in an obstacle avoidance task on road track 1 with a random obstacle distribution.
3. Video 2: Generalization II: Testing the vRL agent in an obstacle avoidance task on road track 1 with three different obstacle distributions (A: random, B: Gaussian, C: batch).
4. Video 3: Testing the vRL agent in an overtaking task on road track 2. The same learning algorithm (PPO) and network architecture (except for the input layers) as for the obstacle avoidance task were used.
Metadata
Title: Example-guided learning of stochastic human driving policies using deep reinforcement learning
Authors: Ran Emuna, Rotem Duffney, Avinoam Borowsky, Armin Biess
Publication date: 23.12.2022
Publisher: Springer London
Published in: Neural Computing and Applications, Issue 23/2023
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-022-07947-2
