Published in: International Journal of Computer Vision, Issue 5/2020

11 October 2019

Model-Based Robot Imitation with Future Image Similarity

Authors: A. Wu, A. J. Piergiovanni, M. S. Ryoo


Abstract

We present a visual imitation learning framework that enables learning robot action policies solely from expert samples, without any robot trials. Robot exploration and on-policy trials in a real-world environment can often be expensive or dangerous. We address this problem by learning a future scene prediction model solely from a collection of expert trajectories, consisting of unlabeled example videos and actions, and by enabling action selection using future image similarity. In this approach, the robot learns to visually imagine the consequences of taking an action, and obtains its policy by evaluating how similar the predicted future image is to an expert sample. We develop an action-conditioned convolutional autoencoder and show how we take advantage of future images for zero-online-trial imitation learning. We conduct experiments in simulated and real-life environments using a ground mobility robot, with and without obstacles, reaching target objects. We explicitly compare our models to multiple baseline methods that require only offline samples. The results confirm that our proposed methods outperform previous approaches, including 1.5× and 2.5× higher success rates than behavioral cloning on two different tasks.
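
To make the approach concrete, the following is a minimal sketch, in PyTorch, of the two components the abstract describes: an action-conditioned convolutional autoencoder that imagines the next frame, and action selection by comparing each imagined frame to an expert frame. This is not the authors' implementation; the discrete action set, the 64×64 input resolution, all layer sizes, and the negative-MSE similarity score are illustrative assumptions (the paper evaluates how similar the predicted future image is to an expert sample, for which a learned or SSIM-style metric could be substituted).

```python
# Illustrative sketch only (not the authors' code): action-conditioned
# future-image prediction with similarity-based action selection.
# Assumed setup: a discrete action set and 64x64 RGB observations.
import torch
import torch.nn as nn


class ActionConditionedAutoencoder(nn.Module):
    """Encodes the current image, fuses an action embedding, decodes the next image."""

    def __init__(self, num_actions: int, latent_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(                      # 3x64x64 -> latent
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, latent_dim),
        )
        self.action_embed = nn.Embedding(num_actions, latent_dim)
        self.decoder = nn.Sequential(                      # latent -> 3x64x64
            nn.Linear(latent_dim, 128 * 8 * 8),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, image: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Fuse the state encoding with the action embedding, then decode
        # the imagined next frame.
        z = self.encoder(image) + self.action_embed(action)
        return self.decoder(z)


def select_action(model: ActionConditionedAutoencoder, image: torch.Tensor,
                  expert_image: torch.Tensor, num_actions: int) -> int:
    """Pick the action whose imagined next frame best matches the expert frame.

    `image` and `expert_image` are (1, 3, 64, 64) tensors in [0, 1].
    """
    with torch.no_grad():
        actions = torch.arange(num_actions)             # every candidate action
        imagined = model(image.expand(num_actions, -1, -1, -1), actions)
        # Negative pixel-wise MSE as a stand-in similarity; a learned or
        # SSIM-style metric would be a natural replacement here.
        scores = -((imagined - expert_image) ** 2).flatten(1).mean(dim=1)
        return int(scores.argmax())
```

Training such a model needs only the offline expert data: minimize a reconstruction loss between the decoded frame and the true next frame over (image, action, next-image) triples drawn from the expert videos, so no on-policy robot trials are required.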

Metadata
Title
Model-Based Robot Imitation with Future Image Similarity
Authors
A. Wu
A. J. Piergiovanni
M. S. Ryoo
Publication date
11 October 2019
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 5/2020
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-019-01238-5
