Dyna, an integrated architecture for learning, planning, and reacting

Published: 01 July 1991

Abstract

Dyna is an AI architecture that integrates learning, planning, and reactive execution. Learning methods are used in Dyna both for compiling planning results and for updating a model of the effects of the agent's actions on the world. Planning is incremental and can use the probabilistic and ofttimes incorrect world models generated by learning processes. Execution is fully reactive in the sense that no planning intervenes between perception and action. Dyna relies on machine learning methods for learning from examples---these are among the basic building blocks making up the architecture---yet is not tied to any particular method. This paper briefly introduces Dyna and discusses its strengths and weaknesses with respect to other architectures.
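
The integration the abstract describes can be made concrete. The sketch below is a minimal, Dyna-Q-style rendering of the loop in Python: direct reinforcement learning from real experience, a learned world model, and incremental planning carried out as extra value updates replayed from that model. It is an illustration under stated assumptions, not the paper's specification: the env interface (reset, and step returning (next_state, reward, done)), the tabular value function, the deterministic one-step model, and all constants are hypothetical.

    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # illustrative constants
    PLANNING_STEPS = 10                      # model-based updates per real step

    Q = defaultdict(float)   # Q[(state, action)] -> value estimate
    model = {}               # model[(state, action)] -> (reward, next_state)

    def act(state, actions):
        # Reactive execution: choose from current estimates;
        # no planning intervenes between perception and action.
        if random.random() < EPSILON:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def q_update(s, a, r, s2, actions):
        # One-step value update, applied to both real and simulated experience.
        best_next = max(Q[(s2, a2)] for a2 in actions)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

    def dyna_q(env, actions, episodes=50):
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                a = act(s, actions)
                s2, r, done = env.step(a)        # real experience
                q_update(s, a, r, s2, actions)   # learning from the world
                model[(s, a)] = (r, s2)          # updating the world model
                for _ in range(PLANNING_STEPS):  # incremental planning:
                    ps, pa = random.choice(list(model))  # replay simulated
                    pr, ps2 = model[(ps, pa)]            # experience
                    q_update(ps, pa, pr, ps2, actions)
                s = s2

Because planning here is just more of the same value updates run against the model, it can be interleaved with acting at any granularity, which is what lets execution remain fully reactive.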

References

  1. Barto, A. G., Sutton, R. S., & Watkins, C. J. C. H. (1990) Learning and sequential decision making. In Learning and Computational Neuroscience, M. Gabriel and J.W. Moore (Eds.), 539--602, MIT Press.Google ScholarGoogle Scholar
  2. Bertsekas, D. P. (1987) Dynamic Programming: Deterministic and Stochastic Models, Prentice-Hall. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Bertsekas, D. P. & Tsitsiklis, J. N. (1989) Parallel Distributed Processing: Numerical Methods, Prentice-Hall. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Craik, K. J. W. (1943) The Nature of Explanation. Cambridge University Press, Cambridge, UK.Google ScholarGoogle Scholar
  5. Dennett, D. C. (1978) Why the law of effect will not go away. In Brainstorms, by D. C. Dennett, 71--89, Bradford Books.Google ScholarGoogle Scholar
  6. Grefenstette, J. J., Ramsey, C. L., & Schultz, A. C. (1990) Learning sequential decision rules using simulation models and competition. Machine Learning 5, 355--382. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Holland, J. H. (1986). Escaping brittleness: The possibilities of general-purpose learning algorithms applied to parallel rule-based systems. In R. Michalski, J. Carbonell & T. Mitchell, Eds., Machine learning II, Morgan Kaufmann.Google ScholarGoogle Scholar
  8. Kaelbling, L. P. (1990) Learning in Embedded Systems. Ph.D. thesis, Stanford University. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Korf, R. E. (1990) Real-Time Heuristic Search. Artificial Intelligence 42: 189--211. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Lin, Long-Ji. (1991) Self-improving reactive agents: Case studies of reinforcement learning frameworks. In: Proceedings of the International Conference on the Simulation of Adaptive Behavior, 297--305, MIT Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Mahadevan, S. & Connell, J. (1990) Automatic programming of behavior-based robots using reinforcement learning. IBM technical report.Google ScholarGoogle Scholar
  12. Riolo, R. (1991) Lookahead planning and latent learning in a classifier system. In: Proceedings of the International Conference on the Simulation of Adaptive Behavior, MIT Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Russell, S. J. (1989) Execution architectures and compilation. Proceedings of IJCAI-89, 15--20.Google ScholarGoogle Scholar
  14. Sutton, R. S. (1984) Temporal credit assignment in reinforcement learning. PhD thesis, COINS Dept., Univ. of Mass., Amherst, MA 01003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Sutton, R.S. (1988) Learning to predict by the methods of temporal differences. Machine Learning 3: 9--44. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Sutton, R. S. (1990) Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. Proceedings of the Seventh International Conference on Machine Learning, 216--224. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Sutton, R.S., Barto, A.G. (1981) An adaptive network that constructs and uses an internal model of its environment. Cognition and Brain Theory Quarterly 4: 217--246.Google ScholarGoogle Scholar
  18. Watkins, C. J. C. H. (1989) Learning with Delayed Rewards. PhD thesis, Cambridge University Psychology Department.Google ScholarGoogle Scholar
  19. Werbos, P. J. (1987) Building and understanding adaptive systems: A statistical/numerical approach to factory automation and brain research. IEEE Transactions on Systems, Man, and Cybernetics, SMC-17, No. 1, 7--20. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Whitehead, S. D., Ballard, D.H. (1991) Learning to perceive and act by trial and error. Machine Learning 7:, 45--83. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Whitehead, S. D. (1989) Scaling reinforcement learning systems. Technical Report 304, Dept. of Computer Science, University of Rochester, Rochester, NY 14627.Google ScholarGoogle Scholar


Published in

ACM SIGART Bulletin, Volume 2, Issue 4 (Aug. 1991), 221 pages
ISSN: 0163-5719
DOI: 10.1145/122344

Copyright © 1991 Author

Publisher

Association for Computing Machinery, New York, NY, United States
