Published in: Neural Computing and Applications 23/2023

23.12.2022 | S.I.: Human-aligned Reinforcement Learning for Autonomous Agents and Robots

Example-guided learning of stochastic human driving policies using deep reinforcement learning

Authors: Ran Emuna, Rotem Duffney, Avinoam Borowsky, Armin Biess

Abstract

Deep reinforcement learning has been successfully applied to the generation of goal-directed behavior in artificial agents. However, existing algorithms are often not designed to reproduce human-like behavior, which may be desired in many environments, such as human–robot collaborations, social robotics and autonomous vehicles. Here we introduce a model-free and easy-to-implement deep reinforcement learning approach to mimic the stochastic behavior of a human expert by learning distributions of task variables from examples. As tractable use cases, we study static and dynamic obstacle avoidance tasks for an autonomous vehicle on a highway road in simulation (Unity). Our control algorithm receives a feedback signal from two sources: a deterministic (handcrafted) part encoding basic task goals and a stochastic (data-driven) part that incorporates human expert knowledge. Gaussian processes are used to model human state distributions and to assess the similarity between machine and human behavior. Using this generic approach, we demonstrate that the learning agent acquires human-like driving skills and can generalize to new roads and obstacle distributions unseen during training.
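To make the two-part feedback signal concrete, the following Python sketch shows one way such a reward could be assembled: a handcrafted task term plus a data-driven term that scores the agent's state under a Gaussian process fitted to human examples. Everything below is an assumption for exposition: the synthetic "human" data, the scikit-learn GP, the choice of context and task variables, and the weighting parameter w are illustrative, not the authors' implementation.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical stand-in for recorded human data: a context variable
# (distance to the nearest obstacle, in meters) and a task variable
# (lateral position on the road) observed while the human drives.
rng = np.random.default_rng(0)
dist_to_obstacle = rng.uniform(0.0, 50.0, size=(200, 1))
lateral_position = (np.tanh(0.1 * (25.0 - dist_to_obstacle))
                    + 0.1 * rng.standard_normal((200, 1)))

# Fit a GP that models the distribution of the task variable given the
# context; its predictive mean/std summarize "human-like" states.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=10.0) + WhiteKernel(),
                              normalize_y=True)
gp.fit(dist_to_obstacle, lateral_position.ravel())

def human_similarity_reward(context, agent_value):
    # Gaussian score of the agent's state under the GP's predictive
    # distribution at this context: 1.0 at the human mean, decaying
    # with distance measured in predictive standard deviations.
    mu, sigma = gp.predict(np.array([[context]]), return_std=True)
    return float(np.exp(-0.5 * ((agent_value - mu[0]) / sigma[0]) ** 2))

def total_reward(progress, collided, context, agent_value, w=0.5):
    # Deterministic (handcrafted) part: reward forward progress and
    # penalize collisions. Stochastic (data-driven) part: similarity to
    # the human state distribution. w balances the two sources.
    r_task = progress - (10.0 if collided else 0.0)
    r_human = human_similarity_reward(context, agent_value)
    return (1.0 - w) * r_task + w * r_human

A reward shaped this way does not force the agent onto a single human trajectory: any state that is probable under the learned human distribution is rewarded, which is one plausible reading of how stochastic, human-like variability can be preserved through reinforcement learning.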

Appendices
Accessible only with authorization
Footnotes
2. Video 1: Generalization I: Comparison of the performance of the vRL and mRL agents in an obstacle avoidance task on road track 1 with a random obstacle distribution.
3. Video 2: Generalization II: Testing the vRL agent in an obstacle avoidance task on road track 1 with three different obstacle distributions (A: random, B: Gaussian, C: batch).
4. Video 3: Testing the vRL agent in an overtaking task on road track 2. The same learning algorithm (PPO) and network architecture (except for the input layers) as for the obstacle avoidance task were used.
Metadata
Title: Example-guided learning of stochastic human driving policies using deep reinforcement learning
Authors: Ran Emuna, Rotem Duffney, Avinoam Borowsky, Armin Biess
Publication date: 23.12.2022
Publisher: Springer London
Published in: Neural Computing and Applications, Issue 23/2023
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-022-07947-2
