
Strategies for simulating pedestrian navigation with multiple reinforcement learning agents

Authors: Francisco Martinez-Gil, Miguel Lozano, Fernando Fernández

Published in: Autonomous Agents and Multi-Agent Systems | Issue 1/2015 (01.01.2015)


Abstract

In this paper, a new multi-agent reinforcement learning approach is introduced for the simulation of pedestrian groups. Unlike other solutions, where the behaviors of the pedestrians are coded into the system, in our approach the agents learn by interacting with the environment. The embodied agents must learn to control their velocity, avoiding obstacles and other pedestrians, to reach a goal inside the scenario. The main contribution of this paper is a new methodology that uses different iterative learning strategies, combining vector quantization (state-space generalization) with the Q-learning algorithm (VQQL). Two algorithmic schemas, Iterative VQQL and Incremental, which differ in the way they address the problems, have been designed and used with and without transfer of knowledge. These algorithms are tested and compared with the VQQL algorithm as a baseline in two scenarios where agents need to solve well-known problems in pedestrian modeling. In the first, agents in a closed room need to reach the unique exit, producing and solving a bottleneck. In the second, two groups of agents inside a corridor need to reach their goals, which are placed at opposite ends (they need to solve the crossing). In the first scenario, we focus on scalability, use metrics from the pedestrian modeling field, and compare with Helbing's social force model. The emergence of collective behaviors, namely the shell-shaped clogging in front of the exit in the first scenario and the lane formation that solves the crossing in the second, has been obtained and analyzed. The results demonstrate that the proposed schemas find policies that carry out the tasks, suggesting that they are applicable and generalizable to the simulation of pedestrian groups.
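
The abstract only names the VQQL combination (vector quantization for state-space generalization plus Q-learning). As a hedged, minimal sketch of that idea, and not the authors' implementation, the following Python fragment learns a k-means-style codebook from sampled continuous states and then runs tabular Q-learning over the prototype indices; the environment interface (env_reset, env_step) and all parameter values are illustrative assumptions.

```python
# Sketch of the VQQL idea: a vector quantizer generalizes the continuous
# state space, and tabular Q-learning runs on the discrete prototype indices.
import numpy as np

rng = np.random.default_rng(0)

def learn_codebook(samples, n_prototypes=64, n_iters=20):
    """k-means style vector quantization of raw state samples."""
    centroids = samples[rng.choice(len(samples), n_prototypes, replace=False)]
    for _ in range(n_iters):
        # assign each sample to its nearest prototype
        dists = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_prototypes):
            members = samples[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    return centroids

def quantize(state, centroids):
    """Map a continuous state vector to the index of its nearest prototype."""
    return int(np.linalg.norm(centroids - state, axis=1).argmin())

def q_learning(env_reset, env_step, centroids, n_actions,
               episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning over the quantized state space."""
    Q = np.zeros((len(centroids), n_actions))
    for _ in range(episodes):
        s = quantize(env_reset(), centroids)
        done = False
        while not done:
            a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
            next_state, reward, done = env_step(a)
            s2 = quantize(next_state, centroids)
            Q[s, a] += alpha * (reward + gamma * Q[s2].max() - Q[s, a])
            s = s2
    return Q
```

In the iterative schemas described in the paper, such a codebook and value table would be re-estimated over successive learning iterations; that outer loop, and the multi-agent setting, are omitted from this sketch.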


Footnotes
1
The term ‘trial’ used on the abscissa of the graphs has the same meaning as the term ‘episode’ in the text.
 
2
In machine learning, many different approaches are used to fill in unobserved features. We have informally studied some of them, specifically random imputation and mean imputation, obtaining similar performance (a minimal sketch of both is given below).
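
A minimal illustration of the two imputation strategies mentioned in this footnote, under the assumption that missing features are marked with NaN in a samples-by-features array; this is not the paper's code.

```python
import numpy as np

def mean_impute(X):
    """Replace each missing value with the mean of its feature column."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = np.take(col_means, cols)
    return X

def random_impute(X, rng=None):
    """Replace each missing value with a random observed value of the same feature."""
    rng = np.random.default_rng() if rng is None else rng
    X = X.copy()
    for j in range(X.shape[1]):
        missing = np.isnan(X[:, j])
        observed = X[~missing, j]
        X[missing, j] = rng.choice(observed, size=missing.sum())
    return X
```
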
 
3
In the experiments, we will show that 18 iterations is a value large enough to ensure convergence in all the proposed scenarios.
 
4
Assuming that a small variation in the parameter values produces a small variation in the learning performance (the experiments agree with this assumption), the learning parameters are found by a coarse search over the allowed values, followed by a refinement around the candidate with the best performance (sketched below).
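
A hedged sketch of such a coarse-then-refined search for two learning parameters (e.g., a learning rate and a discount factor); the evaluate function, parameter ranges, and refinement radius are placeholders, not the values used in the paper.

```python
import itertools
import numpy as np

def coarse_to_fine_search(evaluate, coarse_alphas, coarse_gammas,
                          refine_radius=0.05, refine_steps=5):
    """evaluate(alpha, gamma) returns a scalar learning-performance score."""
    # coarse grid search over the allowed parameter values
    a0, g0 = max(itertools.product(coarse_alphas, coarse_gammas),
                 key=lambda p: evaluate(*p))
    # local refinement around the best coarse candidate
    fine_alphas = np.linspace(a0 - refine_radius, a0 + refine_radius, refine_steps)
    fine_gammas = np.linspace(g0 - refine_radius, g0 + refine_radius, refine_steps)
    return max(itertools.product(fine_alphas, fine_gammas),
               key=lambda p: evaluate(*p))
```
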
 
5
Specifically, the policy \(\pi_0\) chooses randomly from the set of actions that turn the agent’s velocity vector towards the right side of the corridor (see the sketch below).
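
An illustrative sketch, not the paper's implementation, of a \(\pi_0\)-style policy that picks uniformly among the actions turning the agent's velocity towards the right side of the corridor; the action representation (additive 2-D velocity changes) and the right_direction vector are assumptions.

```python
import numpy as np

def pi_0(actions, velocity, right_direction, rng=None):
    """actions: candidate 2-D velocity changes; velocity, right_direction: 2-D vectors."""
    rng = np.random.default_rng() if rng is None else rng
    # keep the actions that increase the velocity component towards the right wall
    turning_right = [a for a in actions
                     if np.dot(velocity + a, right_direction) > np.dot(velocity, right_direction)]
    candidates = turning_right if turning_right else actions
    return candidates[rng.integers(len(candidates))]
```
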
 
6
To keep the tables compact, we have abbreviated the schema names in all tables. Thus, IT means ITVQQL and the prefix TF means “with transfer of knowledge”.
 
References
1. Agre, P., & Chapman, D. (1987). Pengi: An implementation of a theory of activity. In Proceedings of the Sixth National Conference on Artificial Intelligence (pp. 268–272). Burlington: Morgan Kaufmann.
2. Banerjee, B., Abukmail, A., & Kraemer, L. (2009). Layered intelligence for agent-based crowd simulation. Simulation, 85, 621–632.
3. van den Berg, J., Lin, M., & Manocha, D. (2008). Reciprocal velocity obstacles for real-time multi-agent navigation. In Proceedings of the IEEE International Conference on Robotics and Automation (pp. 1928–1935).
4. Bierlaire, M., & Robin, T. (2009). Pedestrians choices. In H. Timmermans (Ed.), Pedestrian Behavior (pp. 1–26). Bradford: Emerald.
5. Bosse, T., Hoogendoorn, M., Klein, M. C. A., Treur, J., van der Wal, C. N., & van Wissen, A. (2013). Modelling collective decision making in groups and crowds: Integrating social contagion and interacting emotions, beliefs and intentions. Autonomous Agents and Multi-Agent Systems, 27(1), 52–84.
6. Campanella, M., Hoogendoorn, S., & Daamen, W. (2010). Calibrating walker models: A methodology and applications. In Proceedings of the 12th World Conference on Transport Research WCTR 2010. Lisbon: 12th WCTR Committee.
7. Claus, C., & Boutilier, C. (1998). The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the Fifteenth National Conference on Artificial Intelligence (pp. 746–752). Menlo Park: AAAI Press.
8. Daamen, W., & Hoogendoorn, S. (2003). Experimental research of pedestrian walking behavior. In Transportation Research Board Annual Meeting 2003 (pp. 1–16). Washington: National Academy Press.
9. Fernández, F., & Borrajo, D. (2008). Two steps reinforcement learning. International Journal of Intelligent Systems, 23(2), 213–245.
10. Fernández, F., Borrajo, D., & Parker, L. (2005). A reinforcement learning algorithm in cooperative multi-robot domains. Journal of Intelligent Robotics Systems, 43(2–4), 161–174.
11. Fernández, F., García, J., & Veloso, M. (2010). Probabilistic policy reuse for inter-task transfer learning. Robotics and Autonomous Systems, 58(7), 866–871.
12. Fernández, F., García, J., & Veloso, M. (2010). Probabilistic policy reuse for inter-task transfer learning. Robotics and Autonomous Systems, Special Issue on Advances in Autonomous Robots for Service and Entertainment, 58(7), 866–871.
13. Fruin, J. (1971). Pedestrian planning and design. Tech. rep., Metropolitan Association of Urban Designers and Environmental Planners, New York. Library of Congress catalogue number 70–159312.
14. García, J., López-Bueno, I., Fernández, F., & Borrajo, D. (2010). A comparative study of discretization approaches for state space generalization in the Keepaway soccer task. In Reinforcement Learning: Algorithms, Implementations and Applications. Hauppauge: Nova Science Publishers.
15. Gipps, P., & Marsjo, B. (1985). A microsimulation model for pedestrian flows. Mathematics and Computers in Simulation, 27, 95–105.
16. Gray, R. M. (1984). Vector quantization. IEEE ASSP Magazine, 1(2), 4–29.
17. Helbing, D. (2004). Collective phenomena and states in traffic and self-driven many-particle systems. Computational Materials Science, 30, 180–187.
18. Helbing, D., Buzna, L., Johansson, A., & Werner, T. (2005). Self-organized pedestrian crowd dynamics: Experiments, simulations, and design solutions. Transportation Science, 39(1), 1–24.
19. Helbing, D., Farkas, I., & Vicsek, T. (2000). Simulating dynamical features of escape panic. Nature, 407, 487.
20. Helbing, D., & Johansson, A. (2009). Pedestrian, crowd and evacuation dynamics. In Encyclopedia of Complexity and Systems Science, Part 16 (pp. 6476–6495). New York: Springer.
21. Helbing, D., Johansson, A., & Al-Abideen, H. Z. (2007). Dynamics of crowd disasters: An empirical study. Physical Review E, 75, 046109.
22. Helbing, D., & Molnár, P. (1995). Social force model for pedestrian dynamics. Physical Review E, 51, 4282–4286.
23. Helbing, D., Molnár, P., Farkas, I., & Bolay, K. (2001). Self-organizing pedestrian movement. Environment and Planning B: Planning and Design, 28, 361–383.
24. García, J., Borrajo, F., & Fernández, F. (2012). Reinforcement learning for decision-making in a business simulator. International Journal of Information Technology & Decision Making, 11(5), 935–960.
25. Karamouzas, I., & Overmars, M. (2012). Simulating and evaluating the local behavior of small pedestrian groups. IEEE Transactions on Visualization and Computer Graphics, 18, 394–406.
26. Klein, F., Bourjot, C., & Chevrier, V. (2009). Application of reinforcement learning to control a multiagent system. In International Conference on Agents and Artificial Intelligence. Berlin: Springer.
27. Lane, T., Ridens, M., & Stevens, S. (2007). Reinforcement learning in nonstationary environment navigation tasks. In Advances in Artificial Intelligence (LNCS 4509) (pp. 429–440). Berlin: Springer.
28. Linde, Y., Buzo, A., & Gray, R. (1980). An algorithm for vector quantizer design. IEEE Transactions on Communications, 28(1), 84–95.
29. Littman, M. L. (2005). Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the Eleventh International Conference on Machine Learning (pp. 157–163). New Brunswick: Morgan Kaufmann.
30. Lovas, G. (1994). Modelling and simulation of pedestrian traffic flow. Transportation Research, 28B, 429–443.
31. Martinez-Gil, F., Barber, F., Lozano, M., Grimaldo, F., & Fernández, F. (2010). A reinforcement learning approach for multiagent navigation. In ICAART 2010—Proceedings of the International Conference on Agents and Artificial Intelligence, Volume 1 (pp. 607–610). Valencia, January 22–24, 2010.
32. Martinez-Gil, F., Lozano, M., & Fernández, F. (2012). Calibrating a motion model based on reinforcement learning for pedestrian simulation. In Motion in Games—5th International Conference, MIG 2012, Rennes, France, November 15–17, 2012, Proceedings, Lecture Notes in Computer Science, vol. 7660 (pp. 302–313). Springer.
33. Martinez-Gil, F., Lozano, M., & Fernández, F. (2012). Multi-agent reinforcement learning for simulating pedestrian navigation. In Adaptive and Learning Agents—International Workshop, ALA 2011, Held at AAMAS 2011, Taipei, Taiwan, May 2, 2011, Revised Selected Papers, Lecture Notes in Computer Science, vol. 7113 (pp. 54–69). Springer.
34. Mataric, M. J. (1994). Learning to behave socially. In From Animals to Animats: International Conference on Simulation of Adaptive Behavior (pp. 453–462). Cambridge: MIT Press.
35. Pelechano, N., Allbeck, J., & Badler, N. (2007). Controlling individual agents in high-density crowd simulation. In Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation (pp. 99–108).
36. Pettré, J., Ondrej, J., Olivier, A., Crétual, A., & Donikian, S. (2009). Experiment-based modeling, simulation and validation of interactions between virtual walkers. In Proceedings of the Symposium on Computer Animation SCA’09 (pp. 189–198).
37. Reynolds, C. (2003). Evolution of corridor following behavior in a noisy world. In From Animals to Animats: Proceedings of the Third International Conference on Simulation of Adaptive Behavior. Cambridge: MIT Press.
38. Rindsfüser, G., & Klügl, F. (2007). Agent-based pedestrian simulation: A case study of the Bern railway station. disP, 3, 9–18.
39. Robin, T., Antonioni, G., Bierlaire, M., & Cruz, J. (2009). Specification, estimation and validation of a pedestrian walking behavior model. Transportation Research, 43, 36–56.
40. Schadschneider, A., Klingsch, W., Kluepfel, H., Kretz, T., Rogsch, C., & Seyfried, A. (2008). Evacuation dynamics: Empirical results, modelling and applications. In R. A. Meyers (Ed.), Encyclopedia of Complexity and Systems Science (pp. 3142–3176). Heidelberg: Springer.
41. Schadschneider, A., & Seyfried, A. (2011). Empirical results for pedestrian dynamics and their implications for modeling. Networks and Heterogeneous Media, 6, 545–560.
42. Sen, S., & Sekaran, M. (1996). Multiagent coordination with learning classifier systems. In IJCAI95 Workshop on Adaptation and Learning in Multiagent Systems (pp. 218–233). Berlin: Springer.
43. Seyfried, A., Steffen, B., Klingsch, W., & Boltes, M. (2005). The fundamental diagram of pedestrian movement revisited. Journal of Statistical Mechanics: Theory and Experiment, P10002.
44. Shao, W., & Terzopoulos, D. (2005). Autonomous pedestrians. In Proceedings of the 2005 ACM SIGGRAPH Symposium on Computer Animation. New York: ACM Press.
45. Steiner, A., Philipp, M., & Schmid, A. (2007). Parameter estimation for a pedestrian simulation model. In Proceedings of the 7th Swiss Transport Research Conference (pp. 1–29).
46. Still, K. (2000). Crowd dynamics. Ph.D. thesis, Department of Mathematics, Warwick University, UK.
47. Stone, P., Sutton, R. S., & Kuhlmann, G. (2005). Reinforcement learning for RoboCup-soccer keepaway. Adaptive Behavior, 13(3), 165–188.
48. Sutton, R. S., & Barto, A. G. (1998). Reinforcement Learning: An Introduction. Cambridge: MIT Press.
49. Sakuma, T., & Mukai, S. K. (2005). Psychological model for animating crowded pedestrians: Virtual humans and social agents. Computer Animation and Virtual Worlds, 16, 343–351.
50. Taylor, M., & Stone, P. (2007). Representation transfer in reinforcement learning. In AAAI 2007 Fall Symposium on Computational Approaches to Representation Change during Learning and Development.
51. Taylor, M., & Stone, P. (2009). Transfer learning for reinforcement learning domains: A survey. Journal of Machine Learning Research, 10, 1633–1685.
52. Taylor, M. E., Suay, H. B., & Chernova, S. (2011). Integrating reinforcement learning with human demonstrations of varying ability. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems.
53. Teknomo, K. (2002). Microscopic pedestrian flow characteristics: Development of an image processing data collection and simulation model. Ph.D. thesis, Department of Human Social Information Sciences, Tohoku University, Japan.
54. Tesauro, G., & Kephart, J. (2002). Pricing in agent economies using multi-agent Q-learning. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS’02).
55. Torrey, L. (2010). Crowd simulation via multi-agent reinforcement learning. In Proceedings of the Sixth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment. Menlo Park: AAAI Press.
56. Torrey, L., & Taylor, M. E. (2012). Help an agent out: Student/teacher learning in sequential decision tasks. In Proceedings of the Adaptive and Learning Agents Workshop (at AAMAS-12).
57. Vigueras, G., Lozano, M., Orduña, J. M., & Grimaldo, F. (2010). A comparative study of partitioning methods for crowd simulations. Applied Soft Computing, 10(1), 225–235.
58. Watkins, C., & Dayan, P. (1992). Q-learning. Machine Learning, 8, 279–292.
59. Weidmann, U. (1993). Transporttechnik der Fussgänger - Transporttechnische Eigenschaften des Fussgängerverkehrs (Literaturstudie). Literature Research 90, Institut für Verkehrsplanung, Transporttechnik, Strassen- und Eisenbahnbau IVT an der ETH Zürich, ETH-Hönggerberg, CH-8093 Zürich.
60. Whitehead, S. D., & Ballard, D. H. (1991). Learning to perceive and act by trial and error. Machine Learning, 7, 45–83.
Metadata
Title
Strategies for simulating pedestrian navigation with multiple reinforcement learning agents
Authors
Francisco Martinez-Gil
Miguel Lozano
Fernando Fernández
Publication date
01.01.2015
Publisher
Springer US
Published in
Autonomous Agents and Multi-Agent Systems / Issue 1/2015
Print ISSN: 1387-2532
Electronic ISSN: 1573-7454
DOI
https://doi.org/10.1007/s10458-014-9252-6
