Abstract
Conversational systems and robots that use reinforcement learning for policy optimization in large domains often face the problem of limited scalability. This problem has been addressed either by using function approximation techniques that approximate the true value function of a policy or by using a hierarchical decomposition of a learning task into subtasks. We present a novel approach for dialogue policy optimization that combines the benefits of both hierarchical control and function approximation, and that allows flexible transitions between dialogue subtasks to give human users more control over the dialogue. To this end, each reinforcement learning agent in the hierarchy is extended with a subtask transition function and a dynamic state space to allow flexible switching between subdialogues. In addition, the subtask policies are represented with linear function approximation in order to generalize decision making to situations unseen in training. The proposed approach is evaluated on an interactive conversational robot that learns to play quiz games. Experimental results, using both simulation and real users, provide evidence that our approach can lead to more flexible (natural) interactions than strict hierarchical control and that it is preferred by human users.
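To make the two extensions described above more concrete, the sketch below is a minimal, illustrative Python example (not the authors' implementation): a subtask agent whose Q-function uses linear function approximation over state-action features, plus a separate nonstrict subtask transition function that can switch subdialogues before the current one terminates. The subtask names and the user act `request_role_switch` are hypothetical placeholders, loosely based on the quiz-game setting mentioned in the abstract.

```python
# Illustrative sketch only; assumes an epsilon-greedy Q-learning agent with
# a linear Q-function, Q(s, a) = w_a . phi(s).
import numpy as np


class SubtaskAgent:
    def __init__(self, n_features, n_actions, alpha=0.01, gamma=0.99, epsilon=0.1):
        self.weights = np.zeros((n_actions, n_features))  # one weight vector per action
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def q(self, features, action):
        # Linear function approximation of the action value
        return float(self.weights[action] @ features)

    def act(self, features):
        # Epsilon-greedy selection over the approximate Q-values
        n_actions = len(self.weights)
        if np.random.rand() < self.epsilon:
            return np.random.randint(n_actions)
        return int(np.argmax([self.q(features, a) for a in range(n_actions)]))

    def update(self, features, action, reward, next_features, done):
        # One-step Q-learning update applied to the chosen action's weights
        n_actions = len(self.weights)
        target = reward if done else reward + self.gamma * max(
            self.q(next_features, a) for a in range(n_actions))
        td_error = target - self.q(features, action)
        self.weights[action] += self.alpha * td_error * features


def subtask_transition(current_subtask, user_act):
    """Hypothetical nonstrict transition function: the user's act may trigger a
    switch to another subdialogue (e.g., swapping who asks the quiz questions)
    instead of waiting for the current subtask to finish."""
    if user_act == "request_role_switch":
        return "user_asks" if current_subtask == "robot_asks" else "robot_asks"
    return current_subtask  # otherwise remain in the current subdialogue
```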