Published in: Autonomous Agents and Multi-Agent Systems 1/2021

01-04-2021

I2RL: online inverse reinforcement learning under occlusion

Authors: Saurabh Arora, Prashant Doshi, Bikramjit Banerjee

Abstract

Inverse reinforcement learning (IRL) is the problem of learning the preferences of an agent from observing its behavior on a task. It inverts RL, which focuses on learning an agent's behavior on a task based on the reward signals received. IRL is witnessing sustained attention due to promising applications in robotics, computer games, and finance, as well as other sectors. Methods for IRL have, for the most part, focused on batch settings where the observed agent's behavioral data has already been collected. However, the related problem of online IRL, where observations are incrementally accrued yet the real-time demands of the application often prohibit a full rerun of an IRL method, has received significantly less attention. We introduce the first formal framework for online IRL, called incremental IRL (I2RL), which can serve as a common ground for online IRL methods. We demonstrate the usefulness of this framework by casting existing online IRL techniques into it. Importantly, we present a new method that advances maximum entropy IRL with hidden variables to the online setting. Our analysis shows that the new method has monotonically improving performance with more demonstration data, as well as probabilistically bounded error, under both full and partial observability. Simulated and physical robot experiments in a multi-robot patrolling application situated in worlds of varying sizes, which involves learning under high levels of occlusion, show significantly improved performance of I2RL compared to both batch IRL and an online imitation learning method.
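To make the incremental-session idea concrete, below is a minimal sketch (not the authors' implementation) of maximum-entropy IRL updated session by session in a toy, fully observed chain MDP. All names (step, soft_value_iteration, expected_features, the demonstrations, and the MDP itself) are illustrative assumptions, and the latent-variable handling that I2RL adds for occlusion is omitted.

    import numpy as np

    # Tiny deterministic chain MDP: states 0..N-1, actions 0 (left) and 1 (right).
    N, A, GAMMA, HORIZON = 5, 2, 0.9, 10
    PHI = np.eye(N)                       # one indicator feature per state

    def step(s, a):
        return max(0, s - 1) if a == 0 else min(N - 1, s + 1)

    def soft_value_iteration(theta, iters=60):
        # Soft (MaxEnt) Bellman backups; returns a Boltzmann policy pi[s, a].
        r = PHI @ theta
        V = np.zeros(N)
        for _ in range(iters):
            Q = np.array([[r[s] + GAMMA * V[step(s, a)] for a in range(A)]
                          for s in range(N)])
            m = Q.max(axis=1, keepdims=True)
            V = (m + np.log(np.exp(Q - m).sum(axis=1, keepdims=True))).ravel()
        return np.exp(Q - V[:, None])

    def expected_features(theta, start=0):
        # Discounted feature counts of the learner's soft-optimal policy,
        # computed by propagating the state distribution (no sampling).
        pi = soft_value_iteration(theta)
        d = np.zeros(N); d[start] = 1.0
        mu = np.zeros(N)
        for t in range(HORIZON):
            mu += (GAMMA ** t) * (d @ PHI)
            d_next = np.zeros(N)
            for s in range(N):
                for a in range(A):
                    d_next[step(s, a)] += d[s] * pi[s, a]
            d = d_next
        return mu

    def empirical_features(trajs):
        # Mean discounted feature counts of a batch of demonstrated state sequences.
        F = np.zeros(N)
        for traj in trajs:
            for t, s in enumerate(traj):
                F += (GAMMA ** t) * PHI[s]
        return F / len(trajs)

    # Incremental, session-by-session learning: only the running feature
    # expectation and the previous weight estimate are carried forward, so no
    # session requires re-processing the full accumulated demonstration.
    theta = np.zeros(N)
    feat_sum, n_trajs = np.zeros(N), 0
    sessions = [[[0, 1, 2, 3, 4, 4, 4, 4, 4, 4]] * 3,     # toy demos, session 1
                [[0, 1, 2, 3, 4, 4, 4, 4, 4, 4]] * 2]     # toy demos, session 2

    for demos in sessions:
        feat_sum += empirical_features(demos) * len(demos)
        n_trajs += len(demos)
        target = feat_sum / n_trajs                       # running empirical expectation
        for _ in range(50):                               # warm-started gradient ascent
            theta += 0.1 * (target - expected_features(theta))

    print("learned state-reward weights:", np.round(theta, 2))

In the method presented in the paper, the per-session statistics additionally account for the occluded portions of trajectories (via an expectation step over the hidden variables), but the carry-forward structure sketched here is the same.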

Footnotes
1
For many methods, repeated trajectories in a demonstration can usually be excluded without affecting the learning.
 
2
This assumption holds when each session starts from the same state and the trajectories are produced by the expert's fixed policy. Under occlusion, even though inferring the hidden portion Z of a trajectory \(X \in {\mathscr {X}}_i\) is influenced by the visible portion Y, this does not necessarily make the trajectories dependent on each other.
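To illustrate what this assumption buys (an illustrative restatement, not a display taken from the paper), write each trajectory as \(X = (Y, Z)\) with Y observed and Z occluded, and let \(\theta\) denote the learned reward parameters. Independence lets the likelihood of everything seen through session i factor over trajectories, with occlusion adding only a per-trajectory marginalization over the hidden portion:
\( \Pr({\mathscr {X}}_1, \ldots, {\mathscr {X}}_i \mid \theta) = \prod_{j=1}^{i} \prod_{X \in {\mathscr {X}}_j} \Pr(Y \mid \theta), \qquad \Pr(Y \mid \theta) = \sum_{Z} \Pr(Y, Z \mid \theta). \)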
 
3
As more trajectory data is provided to GAIL, the accuracy of the expert's estimated occupancy measure for the occluded state-action pairs improves. This helps GAIL achieve its objective of minimizing the regularized cost.
 
Literature
1. Abbeel, P., & Ng, A. Y. (2004). Apprenticeship learning via inverse reinforcement learning. In Twenty-first international conference on machine learning (ICML), pp. 1–8.
2. Aghasadeghi, N., & Bretl, T. (2011). Maximum entropy inverse reinforcement learning in continuous state spaces with path integrals. In 2011 IEEE/RSJ international conference on intelligent robots and systems, pp. 1561–1566.
3. Amin, K., Jiang, N., & Singh, S. (2017). Repeated inverse reinforcement learning. In Advances in neural information processing systems, pp. 1815–1824.
4. Argall, B. D., Chernova, S., Veloso, M., & Browning, B. (2009). A survey of robot learning from demonstration. Robotics and Autonomous Systems, 57(5), 469–483.
5.
6. Arora, S., Doshi, P., & Banerjee, B. (2019). Online inverse reinforcement learning under occlusion. In Proceedings of the 18th international conference on autonomous agents and multiagent systems, AAMAS '19, pp. 1170–1178. International Foundation for Autonomous Agents and Multiagent Systems.
7. Auer, P., Cesa-Bianchi, N., Freund, Y., & Schapire, R. E. (2000). Gambling in a rigged casino: The adversarial multi-armed bandit problem. Electronic Colloquium on Computational Complexity (ECCC), 7(68).
8. Babes-Vroman, M., Marivate, V., Subramanian, K., & Littman, M. (2011). Apprenticeship learning about multiple intentions. In 28th international conference on machine learning (ICML), pp. 897–904.
9. Bogert, K., & Doshi, P. (2014). Multi-robot inverse reinforcement learning under occlusion with interactions. In Proceedings of the 2014 international conference on autonomous agents and multi-agent systems, AAMAS '14, pp. 173–180.
10. Bogert, K., & Doshi, P. (2015). Toward estimating others' transition models under occlusion for multi-robot IRL. In 24th international joint conference on artificial intelligence (IJCAI), pp. 1867–1873.
11. Bogert, K., & Doshi, P. (2017). Scaling expectation-maximization for inverse reinforcement learning to multiple robots under occlusion. In Proceedings of the 16th conference on autonomous agents and multiagent systems, AAMAS '17, pp. 522–529.
12. Bogert, K., Lin, J. F. S., Doshi, P., & Kulic, D. (2016). Expectation-maximization for inverse reinforcement learning with hidden data. In 2016 international conference on autonomous agents and multiagent systems, pp. 1034–1042.
13. Boularias, A., Kober, J., & Peters, J. (2011). Relative entropy inverse reinforcement learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics (AISTATS 2011), Fort Lauderdale, USA, April 11–13, 2011, pp. 182–189.
14. Boularias, A., Krömer, O., & Peters, J. (2012). Structured apprenticeship learning. In European conference on machine learning and knowledge discovery in databases, Part II, pp. 227–242.
15. Choi, J., & Kim, K. E. (2011). Inverse reinforcement learning in partially observable environments. Journal of Machine Learning Research, 12, 691–730.
16. Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39, 1–38.
17. Dudík, M., Phillips, S. J., & Schapire, R. E. (2004). Performance guarantees for regularized maximum entropy density estimation. In J. Shawe-Taylor & Y. Singer (Eds.), Learning theory (pp. 472–486). Berlin Heidelberg: Springer.
18. Gerkey, B., Vaughan, R. T., & Howard, A. (2003). The Player/Stage project: Tools for multi-robot and distributed sensor systems. In Proceedings of the 11th international conference on advanced robotics, vol. 1.
19. Herman, M., Fischer, V., Gindele, T., & Burgard, W. (2015). Inverse reinforcement learning of behavioral models for online-adapting navigation strategies. In 2015 IEEE international conference on robotics and automation (ICRA), pp. 3215–3222. IEEE.
20. Ho, J., & Ermon, S. (2016). Generative adversarial imitation learning. Advances in Neural Information Processing Systems (NIPS), 29, 4565–4573.
21. Jun Jin, Z., Qian, H., Yi Chen, S., & Liang Zhu, M. (2010). Convergence analysis of an incremental approach to online inverse reinforcement learning. Journal of Zhejiang University-Science C, 12(1), 17–24.
22. Kamalaruban, P., Devidze, R., Cevher, V., & Singla, A. (2019). Interactive teaching algorithms for inverse reinforcement learning. arXiv preprint arXiv:1905.11867.
23. Kitani, K. M., Ziebart, B. D., Bagnell, J. A., & Hebert, M. (2012). Activity forecasting. In 12th European conference on computer vision, Volume Part IV, pp. 201–214.
24. Kivinen, J., & Warmuth, M. K. (1997). Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1), 1–63.
25. Levine, S., Popović, Z., & Koltun, V. (2010). Feature construction for inverse reinforcement learning. In Proceedings of the 23rd international conference on neural information processing systems, NIPS'10, pp. 1342–1350. Curran Associates Inc., USA.
26. Ng, A., & Russell, S. (2000). Algorithms for inverse reinforcement learning. In Seventeenth international conference on machine learning, pp. 663–670.
27. Osa, T., Pajarinen, J., Neumann, G., Bagnell, J. A., Abbeel, P., & Peters, J. (2018). An algorithmic perspective on imitation learning. Foundations and Trends in Robotics, 7(2), 1–179.
28. Ramachandran, D., & Amir, E. (2007). Bayesian inverse reinforcement learning. In 20th international joint conference on artificial intelligence (IJCAI), pp. 2586–2591.
29. Ratliff, N., Bagnell, J., & Zinkevich, M. (2007). (Online) subgradient methods for structured prediction. Journal of Machine Learning Research - Proceedings Track, 2, 380–387.
30. Ratliff, N. D., Bagnell, J. A., & Zinkevich, M. A. (2006). Maximum margin planning. In 23rd international conference on machine learning, pp. 729–736.
31. Rhinehart, N., & Kitani, K. M. (2017). First-person activity forecasting with online inverse reinforcement learning. In International conference on computer vision (ICCV).
32. Russell, S. (1998). Learning agents for uncertain environments (extended abstract). In Eleventh annual conference on computational learning theory, pp. 101–103.
33. Steinhardt, J., & Liang, P. (2014). Adaptivity and optimism: An improved exponentiated gradient algorithm. In 31st international conference on machine learning, pp. 1593–1601.
34. Trivedi, M., & Doshi, P. (2018). Inverse learning of robot behavior for collaborative planning. In 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp. 1–9.
35. Wang, S., Rosenfeld, R., Zhao, Y., & Schuurmans, D. (2002). The latent maximum entropy principle. In IEEE international symposium on information theory, pp. 131–131.
36. Wang, S., Schuurmans, D., & Zhao, Y. (2012). The latent maximum entropy principle. ACM Transactions on Knowledge Discovery from Data, 6(8).
37. Wulfmeier, M., & Posner, I. (2015). Maximum entropy deep inverse reinforcement learning. arXiv preprint.
38. Ziebart, B. D., Maas, A., Bagnell, J. A., & Dey, A. K. (2008). Maximum entropy inverse reinforcement learning. In 23rd national conference on artificial intelligence, Volume 3, pp. 1433–1438.
39. Ziebart, B. D., Ratliff, N., Gallagher, G., Mertz, C., Peterson, K., Bagnell, J. A., Hebert, M., Dey, A. K., & Srinivasa, S. (2009). Planning-based prediction for pedestrians. In Proceedings of the 2009 IEEE/RSJ international conference on intelligent robots and systems, IROS'09, pp. 3931–3936. IEEE Press, Piscataway, NJ, USA.
Metadata
Title
I2RL: online inverse reinforcement learning under occlusion
Authors
Saurabh Arora
Prashant Doshi
Bikramjit Banerjee
Publication date
01-04-2021
Publisher
Springer US
Published in
Autonomous Agents and Multi-Agent Systems / Issue 1/2021
Print ISSN: 1387-2532
Electronic ISSN: 1573-7454
DOI
https://doi.org/10.1007/s10458-020-09485-4
