Dyna, an integrated architecture for learning, planning, and reacting

Published: 01 July 1991

Abstract

Dyna is an AI architecture that integrates learning, planning, and reactive execution. Learning methods are used in Dyna both for compiling planning results and for updating a model of the effects of the agent's actions on the world. Planning is incremental and can use the probabilistic and ofttimes incorrect world models generated by learning processes. Execution is fully reactive in the sense that no planning intervenes between perception and action. Dyna relies on machine learning methods for learning from examples---these are among the basic building blocks making up the architecture---yet is not tied to any particular method. This paper briefly introduces Dyna and discusses its strengths and weaknesses with respect to other architectures.
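
The integration the abstract describes can be made concrete. The sketch below is a minimal, Dyna-Q-style rendering of the loop in Python: direct reinforcement learning from real experience, a learned world model, and incremental planning carried out as extra value updates replayed from that model. It is an illustration under stated assumptions, not the paper's specification: the env interface (reset, and step returning (next_state, reward, done)), the tabular value function, the deterministic one-step model, and all constants are hypothetical.

    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # illustrative constants
    PLANNING_STEPS = 10                      # model-based updates per real step

    Q = defaultdict(float)   # Q[(state, action)] -> value estimate
    model = {}               # model[(state, action)] -> (reward, next_state)

    def act(state, actions):
        # Reactive execution: choose from current estimates;
        # no planning intervenes between perception and action.
        if random.random() < EPSILON:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def q_update(s, a, r, s2, actions):
        # One-step value update, applied to both real and simulated experience.
        best_next = max(Q[(s2, a2)] for a2 in actions)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

    def dyna_q(env, actions, episodes=50):
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                a = act(s, actions)
                s2, r, done = env.step(a)        # real experience
                q_update(s, a, r, s2, actions)   # learning from the world
                model[(s, a)] = (r, s2)          # updating the world model
                for _ in range(PLANNING_STEPS):  # incremental planning:
                    ps, pa = random.choice(list(model))  # replay simulated
                    pr, ps2 = model[(ps, pa)]            # experience
                    q_update(ps, pa, pr, ps2, actions)
                s = s2

Because planning here is just more of the same value updates run against the model, it can be interleaved with acting at any granularity, which is what lets execution remain fully reactive.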

References

  1. Barto, A. G., Sutton, R. S., & Watkins, C. J. C. H. (1990) Learning and sequential decision making. In Learning and Computational Neuroscience, M. Gabriel and J.W. Moore (Eds.), 539--602, MIT Press.Google ScholarGoogle Scholar
  2. Bertsekas, D. P. (1987) Dynamic Programming: Deterministic and Stochastic Models, Prentice-Hall. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Bertsekas, D. P. & Tsitsiklis, J. N. (1989) Parallel Distributed Processing: Numerical Methods, Prentice-Hall. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Craik, K. J. W. (1943) The Nature of Explanation. Cambridge University Press, Cambridge, UK.Google ScholarGoogle Scholar
  5. Dennett, D. C. (1978) Why the law of effect will not go away. In Brainstorms, by D. C. Dennett, 71--89, Bradford Books.Google ScholarGoogle Scholar
  6. Grefenstette, J. J., Ramsey, C. L., & Schultz, A. C. (1990) Learning sequential decision rules using simulation models and competition. Machine Learning 5, 355--382. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Holland, J. H. (1986). Escaping brittleness: The possibilities of general-purpose learning algorithms applied to parallel rule-based systems. In R. Michalski, J. Carbonell & T. Mitchell, Eds., Machine learning II, Morgan Kaufmann.Google ScholarGoogle Scholar
  8. Kaelbling, L. P. (1990) Learning in Embedded Systems. Ph.D. thesis, Stanford University. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Korf, R. E. (1990) Real-Time Heuristic Search. Artificial Intelligence 42: 189--211. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Lin, Long-Ji. (1991) Self-improving reactive agents: Case studies of reinforcement learning frameworks. In: Proceedings of the International Conference on the Simulation of Adaptive Behavior, 297--305, MIT Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Mahadevan, S. & Connell, J. (1990) Automatic programming of behavior-based robots using reinforcement learning. IBM technical report.Google ScholarGoogle Scholar
  12. Riolo, R. (1991) Lookahead planning and latent learning in a classifier system. In: Proceedings of the International Conference on the Simulation of Adaptive Behavior, MIT Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Russell, S. J. (1989) Execution architectures and compilation. Proceedings of IJCAI-89, 15--20.Google ScholarGoogle Scholar
  14. Sutton, R. S. (1984) Temporal credit assignment in reinforcement learning. PhD thesis, COINS Dept., Univ. of Mass., Amherst, MA 01003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Sutton, R.S. (1988) Learning to predict by the methods of temporal differences. Machine Learning 3: 9--44. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Sutton, R. S. (1990) Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. Proceedings of the Seventh International Conference on Machine Learning, 216--224. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Sutton, R.S., Barto, A.G. (1981) An adaptive network that constructs and uses an internal model of its environment. Cognition and Brain Theory Quarterly 4: 217--246.Google ScholarGoogle Scholar
  18. Watkins, C. J. C. H. (1989) Learning with Delayed Rewards. PhD thesis, Cambridge University Psychology Department.Google ScholarGoogle Scholar
  19. Werbos, P. J. (1987) Building and understanding adaptive systems: A statistical/numerical approach to factory automation and brain research. IEEE Transactions on Systems, Man, and Cybernetics, SMC-17, No. 1, 7--20. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Whitehead, S. D., Ballard, D.H. (1991) Learning to perceive and act by trial and error. Machine Learning 7:, 45--83. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Whitehead, S. D. (1989) Scaling reinforcement learning systems. Technical Report 304, Dept. of Computer Science, University of Rochester, Rochester, NY 14627.Google ScholarGoogle Scholar


Published in

ACM SIGART Bulletin, Volume 2, Issue 4 (Aug. 1991), 221 pages
ISSN: 0163-5719
DOI: 10.1145/122344

Copyright © 1991 Author

Publisher

Association for Computing Machinery, New York, NY, United States
