Abstract
This article addresses the problem of scalable optimization for spatially-aware dialogue systems. Such systems must perceive, reason, and act within the spatial environment in which they are embedded. We formulate the problem in terms of Semi-Markov Decision Processes and propose a hierarchical reinforcement learning approach that optimizes subbehaviors rather than full behaviors. Because of the vast number of policies required to control the interaction in a dynamic environment (e.g., a dialogue system assisting a user in navigating a building from one location to another), our learning approach proceeds in two stages: (a) the first stage learns low-level behavior in advance, and (b) the second stage learns high-level behavior in real time. To this end, we extend an existing reinforcement learning algorithm to support reusable policies and thereby enable fast learning. We argue that this learning approach makes the problem feasible, and we report on a novel reinforcement learning dialogue system that performs a joint optimization of dialogue and spatial behaviors. Our experiments, using simulated and real environments, are based on a text-based dialogue system for indoor navigation. Experimental results in a realistic environment showed an overall user satisfaction of 89%, which suggests that the proposed approach is attractive for real interactions, as it combines fast learning with adaptive and reasonable behavior.
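The two-stage idea described above can be illustrated with a toy sketch: low-level policies that navigate to fixed subgoals are trained in advance with flat Q-learning, and a high-level SMDP Q-learner then treats those reusable policies as temporally extended actions, discounting by the duration of each subbehavior. Everything here — the 1-D corridor, the landmark rooms, the rewards and learning parameters — is an illustrative assumption, not the authors' actual system or algorithm.

```python
import random

random.seed(0)

# Toy 1-D corridor with rooms 0..9; primitive actions move one room left/right.
# All of this environment and its parameters are hypothetical illustrations.
N = 10
LANDMARKS = [0, 3, 6, 9]  # subgoal rooms served by reusable low-level policies


def step(s, a):
    """Primitive transition: a = -1 moves left, a = +1 moves right."""
    return max(0, min(N - 1, s + a))


def train_low_level(subgoal, episodes=200, alpha=0.5, gamma=0.95, eps=0.2):
    """Stage (a): learn, in advance, a policy that navigates to one subgoal."""
    Q = {(s, a): 0.0 for s in range(N) for a in (-1, 1)}
    for _ in range(episodes):
        s = random.randrange(N)
        for _ in range(2 * N):
            if s == subgoal:
                break
            if random.random() < eps:
                a = random.choice((-1, 1))
            else:
                a = max((-1, 1), key=lambda x: Q[(s, x)])
            s2 = step(s, a)
            r = 10.0 if s2 == subgoal else -1.0
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, -1)], Q[(s2, 1)]) - Q[(s, a)])
            s = s2
    return Q


def run_option(s, Q, subgoal, cap=2 * N):
    """Execute a pretrained policy greedily until its subgoal (or a step cap)."""
    t = 0
    while s != subgoal and t < cap:
        s = step(s, max((-1, 1), key=lambda x: Q[(s, x)]))
        t += 1
    return s, max(t, 1)  # duration tau >= 1 keeps the SMDP discount below 1


def train_high_level(options, goal, episodes=300, alpha=0.5, gamma=0.95, eps=0.2):
    """Stage (b): SMDP Q-learning over the reusable low-level policies."""
    Q = {(s, g): 0.0 for s in range(N) for g in LANDMARKS}
    for _ in range(episodes):
        s = random.randrange(N)
        for _ in range(10):
            if s == goal:
                break
            if random.random() < eps:
                g = random.choice(LANDMARKS)
            else:
                g = max(LANDMARKS, key=lambda x: Q[(s, x)])
            s2, tau = run_option(s, options[g], g)
            r = 10.0 if s2 == goal else -1.0
            best = max(Q[(s2, x)] for x in LANDMARKS)
            # SMDP update: discount by gamma**tau for a tau-step subbehavior
            Q[(s, g)] += alpha * (r + gamma ** tau * best - Q[(s, g)])
            s = s2
    return Q


options = {g: train_low_level(g) for g in LANDMARKS}  # stage (a), in advance
Qh = train_high_level(options, goal=9)                # stage (b), over options

s = 1
for _ in range(10):  # greedy high-level rollout: navigate from room 1 to room 9
    if s == 9:
        break
    g = max(LANDMARKS, key=lambda x: Qh[(s, x)])
    s, _ = run_option(s, options[g], g)
```

The point of the sketch is the division of labor: the low-level tables are learned once and reused unchanged, so the high-level learner searches only over a handful of subbehaviors rather than the full primitive action space, which is what makes fast (real-time) learning plausible at the top level.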