Abstract
This article addresses the problem of scalable optimization for spatially-aware dialogue systems. Such systems must perceive, reason, and act within the spatial environment in which they are embedded. We formulate the problem in terms of Semi-Markov Decision Processes and propose a hierarchical reinforcement learning approach that optimizes subbehaviors rather than full behaviors. Because of the vast number of policies required to control the interaction in a dynamic environment (e.g., a dialogue system assisting a user in navigating a building from one location to another), our learning approach proceeds in two stages: (a) the first stage learns low-level behavior in advance, and (b) the second stage learns high-level behavior in real time. To this end, we extend an existing reinforcement learning algorithm to support reusable policies and thereby enable fast learning. We argue that this learning approach makes the problem feasible, and we report on a novel reinforcement learning dialogue system that performs a joint optimization of dialogue and spatial behaviors. Our experiments, using simulated and real environments, are based on a text-based dialogue system for indoor navigation. Experimental results in a realistic environment showed an overall user satisfaction of 89%, which suggests that the proposed approach is attractive for real interactions, as it combines fast learning with adaptive and reasonable behavior.
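The two-stage idea described above can be illustrated with a toy sketch: low-level policies that navigate to fixed subgoals are trained in advance with flat Q-learning, and a high-level SMDP Q-learner then treats those reusable policies as temporally extended actions, discounting by the duration of each subbehavior. Everything here — the 1-D corridor, the landmark rooms, the rewards and learning parameters — is an illustrative assumption, not the authors' actual system or algorithm.

```python
import random

random.seed(0)

# Toy 1-D corridor with rooms 0..9; primitive actions move one room left/right.
# All of this environment and its parameters are hypothetical illustrations.
N = 10
LANDMARKS = [0, 3, 6, 9]  # subgoal rooms served by reusable low-level policies


def step(s, a):
    """Primitive transition: a = -1 moves left, a = +1 moves right."""
    return max(0, min(N - 1, s + a))


def train_low_level(subgoal, episodes=200, alpha=0.5, gamma=0.95, eps=0.2):
    """Stage (a): learn, in advance, a policy that navigates to one subgoal."""
    Q = {(s, a): 0.0 for s in range(N) for a in (-1, 1)}
    for _ in range(episodes):
        s = random.randrange(N)
        for _ in range(2 * N):
            if s == subgoal:
                break
            if random.random() < eps:
                a = random.choice((-1, 1))
            else:
                a = max((-1, 1), key=lambda x: Q[(s, x)])
            s2 = step(s, a)
            r = 10.0 if s2 == subgoal else -1.0
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, -1)], Q[(s2, 1)]) - Q[(s, a)])
            s = s2
    return Q


def run_option(s, Q, subgoal, cap=2 * N):
    """Execute a pretrained policy greedily until its subgoal (or a step cap)."""
    t = 0
    while s != subgoal and t < cap:
        s = step(s, max((-1, 1), key=lambda x: Q[(s, x)]))
        t += 1
    return s, max(t, 1)  # duration tau >= 1 keeps the SMDP discount below 1


def train_high_level(options, goal, episodes=300, alpha=0.5, gamma=0.95, eps=0.2):
    """Stage (b): SMDP Q-learning over the reusable low-level policies."""
    Q = {(s, g): 0.0 for s in range(N) for g in LANDMARKS}
    for _ in range(episodes):
        s = random.randrange(N)
        for _ in range(10):
            if s == goal:
                break
            if random.random() < eps:
                g = random.choice(LANDMARKS)
            else:
                g = max(LANDMARKS, key=lambda x: Q[(s, x)])
            s2, tau = run_option(s, options[g], g)
            r = 10.0 if s2 == goal else -1.0
            best = max(Q[(s2, x)] for x in LANDMARKS)
            # SMDP update: discount by gamma**tau for a tau-step subbehavior
            Q[(s, g)] += alpha * (r + gamma ** tau * best - Q[(s, g)])
            s = s2
    return Q


options = {g: train_low_level(g) for g in LANDMARKS}  # stage (a), in advance
Qh = train_high_level(options, goal=9)                # stage (b), over options

s = 1
for _ in range(10):  # greedy high-level rollout: navigate from room 1 to room 9
    if s == 9:
        break
    g = max(LANDMARKS, key=lambda x: Qh[(s, x)])
    s, _ = run_option(s, options[g], g)
```

The point of the sketch is the division of labor: the low-level tables are learned once and reused unchanged, so the high-level learner searches only over a handful of subbehaviors rather than the full primitive action space, which is what makes fast (real-time) learning plausible at the top level.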