Abstract
Dyna is an AI architecture that integrates learning, planning, and reactive execution. Learning methods are used in Dyna both for compiling planning results and for updating a model of the effects of the agent's actions on the world. Planning is incremental and can use the probabilistic and ofttimes incorrect world models generated by learning processes. Execution is fully reactive in the sense that no planning intervenes between perception and action. Dyna relies on machine learning methods for learning from examples---these are among the basic building blocks making up the architecture---yet is not tied to any particular method. This paper briefly introduces Dyna and discusses its strengths and weaknesses with respect to other architectures.
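The abstract describes the architecture only at a high level; the loop it outlines (act reactively, update a learned world model from real experience, plan incrementally by replaying that model) is the basis of the well-known Dyna-Q instantiation. The following is a minimal tabular sketch of that loop on a hypothetical 1-D corridor task; the task, function names, and parameter values are illustrative assumptions, not taken from the paper.

```python
import random
from collections import defaultdict

def dyna_q(n_states=6, episodes=30, planning_steps=20,
           alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Dyna-Q on a 1-D corridor: states 0..n_states-1,
    actions -1/+1, reward 1.0 on reaching the rightmost state."""
    rng = random.Random(seed)
    Q = defaultdict(float)   # action-value table Q[(state, action)]
    model = {}               # learned world model: (s, a) -> (reward, next state)

    def step(s, a):          # the real (here deterministic) world
        s2 = max(0, min(n_states - 1, s + a))
        return (1.0 if s2 == n_states - 1 else 0.0), s2

    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # reactive execution: epsilon-greedy on the current Q, no planning
            # intervenes between perception and action
            if rng.random() < epsilon:
                a = rng.choice((-1, 1))
            else:
                a = max((-1, 1), key=lambda x: Q[(s, x)])
            r, s2 = step(s, a)
            # direct reinforcement learning from the real experience
            best = max(Q[(s2, -1)], Q[(s2, 1)])
            Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
            # update the world model, then plan with simulated experience
            model[(s, a)] = (r, s2)
            for _ in range(planning_steps):
                (ps, pa), (pr, ps2) = rng.choice(list(model.items()))
                pbest = max(Q[(ps2, -1)], Q[(ps2, 1)])
                Q[(ps, pa)] += alpha * (pr + gamma * pbest - Q[(ps, pa)])
            s = s2
    return Q

Q = dyna_q()
# greedy policy over the non-terminal states
policy = [max((-1, 1), key=lambda x: Q[(s, x)]) for s in range(5)]
```

Increasing `planning_steps` trades extra computation per real action for faster convergence from the same amount of real experience, which is the central point of interleaving planning with learning and execution.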
- Barto, A. G., Sutton, R. S., & Watkins, C. J. C. H. (1990) Learning and sequential decision making. In M. Gabriel & J. W. Moore (Eds.), Learning and Computational Neuroscience, 539--602, MIT Press.
- Bertsekas, D. P. (1987) Dynamic Programming: Deterministic and Stochastic Models, Prentice-Hall.
- Bertsekas, D. P. & Tsitsiklis, J. N. (1989) Parallel and Distributed Computation: Numerical Methods, Prentice-Hall.
- Craik, K. J. W. (1943) The Nature of Explanation. Cambridge University Press, Cambridge, UK.
- Dennett, D. C. (1978) Why the law of effect will not go away. In Brainstorms, by D. C. Dennett, 71--89, Bradford Books.
- Grefenstette, J. J., Ramsey, C. L., & Schultz, A. C. (1990) Learning sequential decision rules using simulation models and competition. Machine Learning 5: 355--382.
- Holland, J. H. (1986) Escaping brittleness: The possibilities of general-purpose learning algorithms applied to parallel rule-based systems. In R. Michalski, J. Carbonell & T. Mitchell (Eds.), Machine Learning II, Morgan Kaufmann.
- Kaelbling, L. P. (1990) Learning in Embedded Systems. Ph.D. thesis, Stanford University.
- Korf, R. E. (1990) Real-time heuristic search. Artificial Intelligence 42: 189--211.
- Lin, L.-J. (1991) Self-improving reactive agents: Case studies of reinforcement learning frameworks. In Proceedings of the International Conference on the Simulation of Adaptive Behavior, 297--305, MIT Press.
- Mahadevan, S. & Connell, J. (1990) Automatic programming of behavior-based robots using reinforcement learning. IBM technical report.
- Riolo, R. (1991) Lookahead planning and latent learning in a classifier system. In Proceedings of the International Conference on the Simulation of Adaptive Behavior, MIT Press.
- Russell, S. J. (1989) Execution architectures and compilation. Proceedings of IJCAI-89, 15--20.
- Sutton, R. S. (1984) Temporal Credit Assignment in Reinforcement Learning. Ph.D. thesis, COINS Dept., Univ. of Massachusetts, Amherst, MA 01003.
- Sutton, R. S. (1988) Learning to predict by the methods of temporal differences. Machine Learning 3: 9--44.
- Sutton, R. S. (1990) Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. Proceedings of the Seventh International Conference on Machine Learning, 216--224.
- Sutton, R. S. & Barto, A. G. (1981) An adaptive network that constructs and uses an internal model of its environment. Cognition and Brain Theory Quarterly 4: 217--246.
- Watkins, C. J. C. H. (1989) Learning from Delayed Rewards. Ph.D. thesis, Cambridge University Psychology Department.
- Werbos, P. J. (1987) Building and understanding adaptive systems: A statistical/numerical approach to factory automation and brain research. IEEE Transactions on Systems, Man, and Cybernetics SMC-17(1): 7--20.
- Whitehead, S. D. & Ballard, D. H. (1991) Learning to perceive and act by trial and error. Machine Learning 7: 45--83.
- Whitehead, S. D. (1989) Scaling reinforcement learning systems. Technical Report 304, Dept. of Computer Science, University of Rochester, Rochester, NY 14627.
Index Terms
- Dyna, an integrated architecture for learning, planning, and reacting