2021 | Original Paper | Book chapter

Human-in-the-Loop Learning Methods Toward Safe DL-Based Autonomous Systems: A Review

Written by: Prajit T. Rajendran, Huascar Espinoza, Agnes Delaborde, Chokri Mraidha

Published in: Computer Safety, Reliability, and Security. SAFECOMP 2021 Workshops

Publisher: Springer International Publishing

Abstract

Involving humans during the training phase can play a crucial role in mitigating some safety issues of deep learning (DL)-based autonomous systems. This paper reviews the main concepts and methods of human-in-the-loop learning as a first step toward a framework for human-machine teaming through safe learning and anomaly prediction. These methods come with their own challenges: the mismatch between the distribution of the training data provided by the human and the test-time distribution, the cost of keeping a human in the loop throughout a long training phase, and the human's limited ability to deal with unforeseen circumstances and define safer policies.

Metadata
Title
Human-in-the-Loop Learning Methods Toward Safe DL-Based Autonomous Systems: A Review
Written by
Prajit T. Rajendran
Huascar Espinoza
Agnes Delaborde
Chokri Mraidha
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-83906-2_20