
Spatially-aware dialogue control using hierarchical reinforcement learning

Published: 06 June 2011

Abstract

This article addresses the problem of scalable optimization for spatially-aware dialogue systems, that is, systems that must perceive, reason about, and act within the spatial environment in which they are embedded. We formulate the problem in terms of Semi-Markov Decision Processes and propose a hierarchical reinforcement learning approach that optimizes subbehaviors rather than full behaviors. Because of the vast number of policies required to control the interaction in a dynamic environment (e.g., a dialogue system assisting a user in navigating a building from one location to another), our learning approach proceeds in two stages: (a) the first stage learns low-level behavior in advance, and (b) the second stage learns high-level behavior in real time. To this end, we extend an existing reinforcement learning algorithm to support reusable policies and thus enable fast learning. We argue that this approach makes the problem feasible, and we report on a novel reinforcement learning dialogue system that jointly optimizes dialogue and spatial behaviors. Our experiments, using simulated and real environments, are based on a text-based dialogue system for indoor navigation. Experiments in a realistic environment yielded an overall user satisfaction of 89%, which suggests that the proposed approach is attractive for real interactions because it combines fast learning with adaptive and reasonable behavior.
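To make the two-stage idea concrete, the following is a minimal Python sketch of hierarchical, SMDP-style Q-learning with reusable subtask policies. It is not the authors' implementation: the class names (SubtaskPolicy, HierarchicalAgent), the tabular representation, and all hyperparameters are illustrative assumptions. Low-level subtask policies are trained in advance and then frozen; a top-level agent learns online which subtask to invoke, discounting each subtask's cumulative reward by its duration tau, as in SMDP Q-learning.

```python
import random
from collections import defaultdict


class SubtaskPolicy:
    """Tabular Q-learning for a low-level subbehavior (e.g., a navigation or
    dialogue subtask). Trained in advance and then reused (frozen) online."""

    def __init__(self, actions, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.q = defaultdict(float)
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.frozen = False  # set True once offline pre-training is finished

    def act(self, state):
        # Epsilon-greedy during training; greedy once frozen.
        if not self.frozen and random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s_next):
        if self.frozen:
            return
        best_next = max(self.q[(s_next, a2)] for a2 in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])


class HierarchicalAgent:
    """Top-level SMDP Q-learning over subtasks: it selects which pre-trained
    subtask policy to execute and updates its own Q-values with the subtask's
    cumulative reward and its duration tau (number of primitive steps)."""

    def __init__(self, subtasks, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.subtasks = subtasks  # dict: name -> SubtaskPolicy
        self.q = defaultdict(float)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose_subtask(self, state):
        names = list(self.subtasks)
        if random.random() < self.epsilon:
            return random.choice(names)
        return max(names, key=lambda n: self.q[(state, n)])

    def update(self, s, subtask, cum_reward, tau, s_next):
        best_next = max(self.q[(s_next, n)] for n in self.subtasks)
        # SMDP update: discount the future value by gamma**tau.
        target = cum_reward + (self.gamma ** tau) * best_next
        self.q[(s, subtask)] += self.alpha * (target - self.q[(s, subtask)])


if __name__ == "__main__":
    # Toy usage with hypothetical state and action names.
    nav = SubtaskPolicy(actions=["forward", "turn_left", "turn_right"])
    dlg = SubtaskPolicy(actions=["ask", "confirm", "instruct"])
    nav.frozen = dlg.frozen = True  # pretend pre-training is done
    agent = HierarchicalAgent({"navigate": nav, "dialogue": dlg})
    choice = agent.choose_subtask(state="at_corridor")
    agent.update("at_corridor", choice, cum_reward=1.0, tau=3, s_next="at_door")
```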



  • Published in

    ACM Transactions on Speech and Language Processing, Volume 7, Issue 3
    May 2011
    155 pages
    ISSN: 1550-4875
    EISSN: 1550-4883
    DOI: 10.1145/1966407

    Copyright © 2011 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 6 June 2011
    • Accepted: 1 December 2010
    • Revised: 1 November 2010
    • Received: 1 July 2010
    Published in tslp Volume 7, Issue 3


    Qualifiers

    • research-article
    • Research
    • Refereed
