
Nonstrict Hierarchical Reinforcement Learning for Interactive Systems and Robots

Published: 14 October 2014

Abstract

Conversational systems and robots that use reinforcement learning for policy optimization in large domains often face the problem of limited scalability. This problem has been addressed either by using function approximation techniques that approximate the true value function of a policy or by hierarchically decomposing a learning task into subtasks. We present a novel approach to dialogue policy optimization that combines the benefits of hierarchical control and function approximation and that allows flexible transitions between dialogue subtasks, giving human users more control over the dialogue. To this end, each reinforcement learning agent in the hierarchy is extended with a subtask transition function and a dynamic state space to allow flexible switching between subdialogues. In addition, the subtask policies are represented with linear function approximation in order to generalize decision making to situations unseen in training. The proposed approach is evaluated with an interactive conversational robot that learns to play quiz games. Experimental results, from both simulations and real users, provide evidence that the approach can lead to more flexible (natural) interactions than strict hierarchical control and that human users prefer it.
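The abstract's two mechanisms can be made concrete with a short sketch. The Python below is purely illustrative and not the authors' implementation: all names (SubtaskAgent, subtask_transition, the feature size, and the action sets) are hypothetical assumptions. It shows (a) a per-subtask policy with a linear Q-function, Q(s, a) = w_a · φ(s), trained by one-step Q-learning, and (b) a nonstrict subtask transition function that can hand control to a sibling subdialogue at any turn, instead of only when the current subtask terminates, as strict hierarchical control would require.

    # Illustrative sketch only; names and sizes are hypothetical assumptions.
    import numpy as np

    class SubtaskAgent:
        """One agent in the hierarchy, with a linear Q-function
        Q(s, a) = w_a . phi(s)."""

        def __init__(self, name, actions, n_features, alpha=0.01, gamma=0.99):
            self.name = name
            self.actions = actions
            # One weight vector per action (feature size is a made-up choice).
            self.w = {a: np.zeros(n_features) for a in actions}
            self.alpha, self.gamma = alpha, gamma

        def q(self, phi, a):
            return float(self.w[a] @ phi)

        def act(self, phi, epsilon=0.1):
            # Epsilon-greedy over the approximated action values.
            if np.random.rand() < epsilon:
                return self.actions[np.random.randint(len(self.actions))]
            return max(self.actions, key=lambda a: self.q(phi, a))

        def update(self, phi, a, r, phi_next):
            # One-step Q-learning update on the linear weights.
            target = r + self.gamma * max(self.q(phi_next, b) for b in self.actions)
            self.w[a] += self.alpha * (target - self.q(phi, a)) * phi

    # Hypothetical hierarchy: a root subdialogue and a quiz subdialogue.
    AGENTS = {
        "root": SubtaskAgent("root", ["greet", "offer_game", "bye"], n_features=8),
        "quiz": SubtaskAgent("quiz", ["ask_question", "give_feedback", "end_quiz"],
                             n_features=8),
    }

    def subtask_transition(current, user_intent):
        """Nonstrict control: a recognized user intent may redirect the dialogue
        to another subtask at any turn, rather than only after the current
        subtask terminates."""
        if user_intent in AGENTS and user_intent != current:
            return user_intent
        return current

At each turn, a system along these lines would featurize the current dialogue state into φ (the dynamic state space means that feature vector can change as subdialogues come and go), let the active agent pick an action, and then call subtask_transition to decide which agent handles the next turn; for example, subtask_transition("root", "quiz") returns "quiz", jumping straight into the quiz subdialogue.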


• Published in

  ACM Transactions on Interactive Intelligent Systems, Volume 4, Issue 3
  Special Issue on Multiple Modalities in Interactive Systems and Robots
  October 2014, 115 pages
  ISSN: 2160-6455
  EISSN: 2160-6463
  DOI: 10.1145/2660857

      Copyright © 2014 ACM


Publisher

Association for Computing Machinery, New York, NY, United States

      Publication History

      • Published: 14 October 2014
      • Accepted: 1 July 2014
      • Revised: 1 June 2014
      • Received: 1 March 2013


