- Åström, K. J. (1965). Optimal control of Markov decision processes with incomplete state estimation. Journal of Mathematical Analysis and Applications, 10, 174-205.
- Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific, Belmont, Massachusetts.
- Binder, J., Koller, D., Russell, S., & Kanazawa, K. (1997a). Adaptive probabilistic networks with hidden variables. Machine Learning, 29, 213-244.
- Binder, J., Murphy, K., & Russell, S. (1997b). Space-efficient inference in dynamic probabilistic networks. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI-97), Nagoya, Japan. Morgan Kaufmann.
- Boyen, X., & Koller, D. (1998). Tractable inference for complex stochastic processes. In Proceedings of the Fourteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-98). To appear.
- Dean, T., & Kanazawa, K. (1989). A model for reasoning about persistence and causation. Computational Intelligence, 5(3), 142-150.
- Doya, K., & Sejnowski, T. (1995). A novel reinforcement model of birdsong vocalization learning. In Tesauro, G., Touretzky, D., & Leen, T. (Eds.), Advances in Neural Information Processing Systems, Vol. 8, pp. 101-108, Denver, CO. MIT Press.
- Farley, C. T., & Taylor, C. R. (1991). A mechanical trigger for the trot-gallop transition in horses. Science, 253(5017), 306-308.
- Friedman, N., Murphy, K., & Russell, S. (1998). Learning the structure of dynamic probabilistic networks. In Uncertainty in Artificial Intelligence: Proceedings of the Fourteenth Conference, Madison, Wisconsin. Morgan Kaufmann.
- Hoyt, D., & Taylor, C. (1981). Gait and the energetics of locomotion in horses. Nature, 292, 239-240.
- Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237-285.
- Kanazawa, K., Koller, D., & Russell, S. (1995). Stochastic simulation algorithms for dynamic probabilistic networks. In Uncertainty in Artificial Intelligence: Proceedings of the Eleventh Conference, pp. 346-351, Montreal, Canada. Morgan Kaufmann.
- Keeney, R. L., & Raiffa, H. (1976). Decisions with Multiple Objectives: Preferences and Value Tradeoffs. Wiley, New York.
- McCallum, A. R. (1993). Overcoming incomplete perception with utile distinction memory. In Proceedings of the Tenth International Conference on Machine Learning, pp. 190-196, Amherst, Massachusetts. Morgan Kaufmann.
- Montague, P. R., Dayan, P., Person, C., & Sejnowski, T. J. (1995). Bee foraging in uncertain environments using predictive Hebbian learning. Nature, 377, 725-728.
- Parr, R., & Russell, S. (1995). Approximating optimal policies for partially observable stochastic domains. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI-95), Montreal, Canada. Morgan Kaufmann.
- Parr, R., & Russell, S. (1998). Reinforcement learning with hierarchies of machines. In Kearns, M. (Ed.), Advances in Neural Information Processing Systems 10. MIT Press, Cambridge, Massachusetts.
- Russell, S. J., & Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs, New Jersey.
- Rust, J. (1994). Do people behave according to Bellman's principle of optimality? Submitted to Journal of Economic Perspectives.
- Sargent, T. J. (1978). Estimation of dynamic labor demand schedules under rational expectations. Journal of Political Economy, 86(6), 1009-1044.
- Schmajuk, N. A., & Zanutto, B. S. (1997). Escape, avoidance, and imitation: A neural network approach. Adaptive Behavior, 6(1), 63-129.
- Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.
- Touretzky, D. S., & Saksida, L. M. (1997). Operant conditioning in Skinnerbots. Adaptive Behavior, 5(3-4), 219-247.