
2016 | Original Paper | Book Chapter

Beyond Geometric Path Planning: Learning Context-Driven Trajectory Preferences via Sub-optimal Feedback

Written by: Ashesh Jain, Shikhar Sharma, Ashutosh Saxena

Published in: Robotics Research

Publisher: Springer International Publishing


Abstract

We consider the problem of learning preferences over trajectories for mobile manipulators such as personal robots and assembly line robots. The preferences we learn are more intricate than those arising from simple geometric constraints on the robot’s trajectory, such as the distance of the robot from a human. Instead, our preferences are governed by the surrounding context of various objects and human interactions in the environment. Such preferences make the problem challenging because the criterion for a good trajectory now varies with the task, with the environment, and across users. Furthermore, demonstrating optimal trajectories (e.g., learning from an expert’s demonstrations) is often challenging and non-intuitive on manipulators with many degrees of freedom. In this work, we propose an approach that requires a non-expert user only to incrementally improve the trajectory currently proposed by the robot. We implement our algorithm on two high degree-of-freedom robots, PR2 and Baxter, and present three intuitive mechanisms for providing such incremental feedback. In our experimental evaluation we consider two context-rich settings, household chores and grocery store checkout, and show that users are able to train the robot with just a few rounds of feedback (taking only a few minutes). Despite receiving sub-optimal feedback from non-expert users, our algorithm enjoys theoretical bounds on regret that match the asymptotic rates of optimal trajectory algorithms.
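The abstract describes learning from incremental improvements rather than optimal demonstrations. The chapter's actual algorithm and features are not reproduced on this page, so the following is only a minimal sketch of a generic perceptron-style coactive-learning update (in the spirit of reference 41), not the authors' implementation. The feature map `features`, the simulated user `simulated_feedback`, and all other names are hypothetical and chosen purely for illustration.

```python
import numpy as np


def features(trajectory: np.ndarray) -> np.ndarray:
    """Hypothetical feature map (an assumption, not the chapter's features):
    per-coordinate mean and maximum over the waypoints of a (T, d) trajectory."""
    return np.concatenate([trajectory.mean(axis=0), trajectory.max(axis=0)])


def score(w: np.ndarray, trajectory: np.ndarray) -> float:
    """Linear score of a trajectory under the current weight vector."""
    return float(w @ features(trajectory))


def propose(w: np.ndarray, candidates: list) -> int:
    """Index of the candidate trajectory that the current weights rank highest."""
    return int(np.argmax([score(w, t) for t in candidates]))


def coactive_update(w: np.ndarray, proposed: np.ndarray, improved: np.ndarray) -> np.ndarray:
    """Perceptron-style step toward the trajectory the user preferred."""
    return w + features(improved) - features(proposed)


def simulated_feedback(w_true: np.ndarray, candidates: list, proposed_idx: int) -> int:
    """Stand-in for a non-expert user: returns a candidate that is slightly better
    than the proposal under a hidden utility, not necessarily the best one."""
    base = score(w_true, candidates[proposed_idx])
    better = [i for i, t in enumerate(candidates) if score(w_true, t) > base]
    if not better:
        return proposed_idx  # nothing better to offer
    return min(better, key=lambda i: score(w_true, candidates[i]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    candidates = [rng.normal(size=(10, 3)) for _ in range(20)]
    w_true = rng.normal(size=6)  # hidden user preference (simulation only)
    w = np.zeros(6)
    for step in range(15):
        i = propose(w, candidates)
        j = simulated_feedback(w_true, candidates, i)
        if j == i:
            break  # the user has no improvement to offer
        w = coactive_update(w, candidates[i], candidates[j])
    print("hidden utility of final proposal:",
          score(w_true, candidates[propose(w, candidates)]))
```

The point of the sketch is that the user never has to supply the best trajectory: each round only needs a trajectory that is somewhat better than the current proposal, and the weights shift by the feature difference between the two.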


Footnotes
1
A kitchen knife originating in Japan.
 
2
When RRT becomes too slow, we switch to a more efficient bidirectional RRT. The cost function (or its approximation) we learn can be fed to trajectory optimizers like CHOMP [39] or optimal planners like RRT* [23] to produce reasonably good trajectories.
 
3
Consider the following analogy. In search engine results, it is much harder for the user to provide the best web-pages for each query, but it is easier to provide relative ranking on the search results by clicking.
 
4
Similar results were obtained with the nDCG@1 metric; they are not included here due to space constraints.
 
5
The smaller number of users on PR2 is because it requires users with experience in Rviz-ROS. We also observed that users found it harder to correct trajectory waypoints in a simulator than to provide zero-G feedback on the robot. For the same reason, we report training time only on Baxter for the grocery store setting.
 
References
1.
Abbeel, P., Coates, A., Ng, A.Y.: Autonomous helicopter aerobatics through apprenticeship learning. IJRR 29(13) (2010)
2.
Akgun, B., Cakmak, M., Jiang, K., Thomaz, A.L.: Keyframe-based learning from demonstration. IJSR 4(4), 343–355 (2012)
3.
Alterovitz, R., Siméon, T., Goldberg, K.: The stochastic motion roadmap: A sampling framework for planning with Markov motion uncertainty. In: RSS (2007)
4.
Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robot. Autonom. Syst. 57(5), 469–483 (2009)
5.
Berenson, D., Abbeel, P., Goldberg, K.: A robot path planning framework that learns from experience. In: ICRA (2012)
6.
Berg, J.V.D., Abbeel, P., Goldberg, K.: LQG-MP: Optimized path planning for robots with motion uncertainty and imperfect state information. In: RSS (2010)
7.
Bhattacharya, S., Likhachev, M., Kumar, V.: Identification and representation of homotopy classes of trajectories for search-based path planning in 3D. In: RSS (2011)
8.
Bischoff, R., Kazi, A., Seyfarth, M.: The morpha style guide for icon-based programming. In: Proceedings of the 11th IEEE International Workshop on RHIC (2002)
9.
Calinon, S., Guenter, F., Billard, A.: On learning, representing, and generalizing a task in a humanoid robot. In: IEEE Transactions on Systems Man and Cybernetics (2007)
10.
Cohen, B.J., Chitta, S., Likhachev, M.: Search-based planning for manipulation with motion primitives. In: ICRA (2010)
11.
Dey, D., Liu, T.Y., Hebert, M., Bagnell, J.A.: Contextual sequence prediction with application to control library optimization. In: RSS (2012)
12.
Diankov, R.: Automated Construction of Robotic Manipulation Programs. Ph.D. thesis, CMU, RI (2010)
13.
Dragan, A., Srinivasa, S.: Generating legible motion. In: RSS (2013)
14.
Dragan, A., Lee, K., Srinivasa, S.: Legibility and predictability of robot motion. In: HRI (2013)
15.
Erickson, L.H., LaValle, S.M.: Survivability: Measuring and ensuring path diversity. In: ICRA (2009)
16.
Gossow, D., Leeper, A., Hershberger, D., Ciocarlie, M.: Interactive markers: 3-D user interfaces for ROS applications [ROS topics]. IEEE Robot. Autom. Mag. 18(4), 14–15 (2011)
17.
Green, C.J., Kelly, A.: Toward optimal sampling in the space of paths. In: ISRR (2007)
18.
Hovland, G.E., Sikka, P., McCarragher, B.J.: Skill acquisition from human demonstration using a hidden Markov model. In: ICRA (1996)
19.
Jain, A., Wojcik, B., Joachims, T., Saxena, A.: Learning trajectory preferences for manipulators via iterative improvement. In: NIPS (2013)
20.
Jiang, Y., Lim, M., Zheng, C., Saxena, A.: Learning to place new objects in a scene. IJRR 31(9) (2012)
21.
Joachims, T.: Training linear SVMs in linear time. In: KDD (2006)
22.
Joachims, T., Finley, T., Yu, C.: Cutting-plane training of structural SVMs. Mach. Learn. 77(1) (2009)
23.
Karaman, S., Frazzoli, E.: Incremental sampling-based algorithms for optimal motion planning. In: RSS (2010)
24.
Klingbeil, E., Rao, D., Carpenter, B., Ganapathi, V., Ng, A.Y., Khatib, O.: Grasping with application to an autonomous checkout robot. In: ICRA (2011)
25.
Kober, J., Peters, J.: Policy search for motor primitives in robotics. Mach. Learn. 84(1) (2011)
26.
Koppula, H.S., Saxena, A.: Anticipating human activities using object affordances for reactive robotic response. In: RSS (2013)
27.
LaValle, S.M., Kuffner, J.J.: Randomized kinodynamic planning. IJRR 20(5), 378–400 (2001)
28.
Lenz, I., Lee, H., Saxena, A.: Deep learning for detecting robotic grasps. In: RSS (2013)
29.
Levine, S., Koltun, V.: Continuous inverse optimal control with locally optimal examples. In: ICML (2012)
30.
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval, vol. 1. Cambridge University Press, Cambridge (2008)
31.
Nicolescu, M.N., Mataric, M.J.: Natural methods for robot task learning: Instructive demonstrations, generalization and practice. In: Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems (2003)
32.
Nikolaidis, S., Shah, J.: Human-robot teaming using shared mental models. In: HRI, Workshop on Human-Agent-Robot Teamwork (2012)
33.
Nikolaidis, S., Shah, J.: Human-robot cross-training: Computational formulation, modeling and evaluation of a human team training strategy. In: IEEE/ACM ICHRI (2013)
34.
Phillips, M., Cohen, B., Chitta, S., Likhachev, M.: E-graphs: Bootstrapping planning with experience graphs. In: RSS (2012)
35.
Raman, K., Joachims, T.: Learning socially optimal information systems from egoistic users. In: Proceedings of the ECML (2013)
36.
Ratliff, N.: Learning to search: structured prediction techniques for imitation learning. Ph.D. thesis, CMU, RI (2009)
37.
Ratliff, N., Bagnell, J.A., Zinkevich, M.: Maximum margin planning. In: ICML (2006)
38.
Ratliff, N., Silver, D., Bagnell, J.A.: Learning to search: Functional gradient techniques for imitation learning. Autonom. Robot. 27(1), 25–53 (2009a)
39.
Ratliff, N., Zucker, M., Bagnell, J.A., Srinivasa, S.: CHOMP: Gradient optimization techniques for efficient motion planning. In: ICRA (2009b)
40.
Saxena, A., Driemeyer, J., Ng, A.Y.: Robotic grasping of novel objects using vision. IJRR 27(2) (2008)
41.
Shivaswamy, P., Joachims, T.: Online structured prediction via coactive learning. In: ICML (2012)
42.
Shneiderman, B., Plaisant, C.: Designing The User Interface: Strategies for Effective Human-Computer Interaction. Addison-Wesley Publication (2010)
43.
Stopp, A., Horstmann, S., Kristensen, S., Lohnert, F.: Towards interactive learning for manufacturing assistants. In: Proceedings of the 10th IEEE International Workshop on RHIC (2001)
45.
Yamane, K., Revfi, M., Asfour, T.: Synthesizing object receiving motions of humanoid robots with human motion database. In: ICRA (2013)
46.
Vernaza, P., Bagnell, J.A.: Efficient high dimensional maximum entropy modeling via symmetric partition functions. In: NIPS (2012)
47.
Wilson, A., Fern, A., Tadepalli, P.: A Bayesian approach for policy learning from trajectory preference queries. In: NIPS (2012)
48.
Ziebart, B.D., Maas, A., Bagnell, J.A., Dey, A.K.: Maximum entropy inverse reinforcement learning. In: AAAI (2008)
Metadata
Title
Beyond Geometric Path Planning: Learning Context-Driven Trajectory Preferences via Sub-optimal Feedback
Written by
Ashesh Jain
Shikhar Sharma
Ashutosh Saxena
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-28872-7_19
