Published in: Autonomous Robots 2/2019

13.08.2018

Grounding natural language instructions to semantic goal representations for abstraction and generalization

Authors: Dilip Arumugam, Siddharth Karamcheti, Nakul Gopalan, Edward C. Williams, Mina Rhee, Lawson L. S. Wong, Stefanie Tellex


Abstract

Language grounding is broadly defined as the problem of mapping natural language instructions to robot behavior. To be truly effective, language grounding systems must be accurate in their selection of behavior, efficient in the robot's realization of that behavior, and capable of generalizing beyond the commands and environment configurations seen at training time. A decision crucial to the success of a language grounding model is the choice of representation used to capture the objective specified by the input command. Prior work varies in its use of explicit goal representations: some approaches lack a representation altogether, resulting in models that infer whole sequences of robot actions, while others map to carefully constructed logical form representations. While many models in either category are reasonably accurate, they fail to offer either efficient execution or any generalization without a large amount of manual specification. In this work, we take a first step towards language grounding models that excel across accuracy, efficiency, and generalization through the construction of simple, semantic goal representations within Markov decision processes. We propose two related semantic goal representations that take advantage of the hierarchical structure of tasks and the compositional nature of language, respectively, and present multiple grounding models for each. We validate these ideas empirically with results collected from following text instructions within a simulated mobile-manipulator domain, as well as demonstrations of a physical robot responding to spoken instructions in real time. Our grounding models tie abstraction in language commands to a hierarchical planner for the robot's execution, enabling a response-time speed-up of several orders of magnitude over baseline planners within sufficiently large domains.
Concurrently, our grounding models for generalization infer elements of the semantic representation that are subsequently combined to form a complete goal description, enabling the interpretation of commands involving novel combinations never seen during training. Taken together, our results show that the design of semantic goal representation has powerful implications for the accuracy, efficiency, and generalization capabilities of language grounding models.
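The abstract does not describe the models' internals. As a minimal sketch of the general idea of a compositional goal representation, the Python below infers a predicate and its arguments separately and then combines them into one goal whose satisfaction defines a goal-based MDP reward. Everything here is an assumption for illustration: the toy state (a dict from object names to rooms), the hypothetical predicate names `agent_in_room` and `block_in_room`, and the hypothetical helpers `predict_predicate`, `predict_args`, and `ground`. Keyword rules stand in for the learned grounding models, so this shows only the structure, not the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Toy state (assumption): maps each object name to the room it occupies.
State = Dict[str, str]

# Hypothetical goal predicates; each is checked against a state and its arguments.
PREDICATES: Dict[str, Callable[..., bool]] = {
    "agent_in_room": lambda s, room: s["agent"] == room,
    "block_in_room": lambda s, block, room: s[block] == room,
}
ROOMS = ("red", "green", "blue")


@dataclass(frozen=True)
class Goal:
    """A complete goal description: one predicate plus its bound arguments."""
    predicate: str
    args: Tuple[str, ...]

    def satisfied(self, state: State) -> bool:
        return PREDICATES[self.predicate](state, *self.args)

    def reward(self, state: State) -> float:
        # Goal-based MDP reward: 1.0 when the goal condition holds, else 0.0.
        return 1.0 if self.satisfied(state) else 0.0


def predict_predicate(command: str) -> str:
    # Keyword rule standing in for a learned predicate classifier (hypothetical).
    return "block_in_room" if "block" in command.lower() else "agent_in_room"


def predict_args(command: str, predicate: str) -> Tuple[str, ...]:
    # Keyword rule standing in for a learned argument model (hypothetical).
    room = next((w for w in command.lower().split() if w in ROOMS), None)
    if room is None:
        raise ValueError(f"no known room mentioned in: {command!r}")
    return ("block", room) if predicate == "block_in_room" else (room,)


def ground(command: str) -> Goal:
    # Infer the two elements separately, then combine them into one goal.
    predicate = predict_predicate(command)
    return Goal(predicate, predict_args(command, predicate))


goal = ground("push the block into the green room")
print(goal, goal.reward({"agent": "red", "block": "green"}))
```

Because the predicate and its arguments are chosen independently in this toy setup, a pairing such as `block_in_room` with `green` can be produced even if no training command combined them, which is the kind of novel-combination generalization the abstract describes.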

Metadata
Title
Grounding natural language instructions to semantic goal representations for abstraction and generalization
Authors
Dilip Arumugam
Siddharth Karamcheti
Nakul Gopalan
Edward C. Williams
Mina Rhee
Lawson L. S. Wong
Stefanie Tellex
Publication date
13.08.2018
Publisher
Springer US
Published in
Autonomous Robots / Issue 2/2019
Print ISSN: 0929-5593
Electronic ISSN: 1573-7527
DOI
https://doi.org/10.1007/s10514-018-9792-8
