
25.09.2017

A Brain-Inspired Decision Making Model Based on Top-Down Biasing of Prefrontal Cortex to Basal Ganglia and Its Application in Autonomous UAV Explorations

Authors: Feifei Zhao, Yi Zeng, Guixiang Wang, Jun Bai, Bo Xu

Published in: Cognitive Computation | Issue 2/2018


Abstract

Decision making is a fundamental ability for intelligent agents (e.g., humanoid robots and unmanned aerial vehicles). During the decision making process, agents can improve their strategy for interacting with a dynamic environment through reinforcement learning. Many state-of-the-art reinforcement learning models, such as Q-learning and Actor-Critic algorithms, deal with a relatively small number of state-action pairs and work best with discrete states. In practice, however, the states in many scenarios are continuous and hard to discretize properly, so better autonomous decision making methods are needed. Inspired by the mechanism of decision making in the human brain, we propose a general computational model, named the prefrontal cortex-basal ganglia (PFC-BG) algorithm. The proposed model draws on the biological reinforcement learning pathway and its mechanisms from the following perspectives: (1) dopamine signals continuously update reward-relevant information for both the basal ganglia and working memory in the prefrontal cortex; (2) contextual reward information is maintained in working memory, which has a top-down biasing effect on reinforcement learning in the basal ganglia. The proposed model separates continuous states into smaller distinguishable states and introduces a continuous reward function for each state to obtain reward information at different times. To verify the performance of our model, we apply it to several UAV decision making experiments, such as avoiding obstacles and flying through a window or a door, and the experiments support the effectiveness of the model. Compared with traditional Q-learning and Actor-Critic algorithms, the proposed model is more biologically inspired and makes decisions more accurately and faster.
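The abstract only summarizes the mechanism, but its main ingredients (discretizing continuous states, a dopamine-like prediction-error signal, and a working-memory reward context that biases action selection top-down) can be illustrated concretely. The minimal Python sketch below is an assumption-laden illustration of those ingredients only; the class name PFCBGSketch, the parameters alpha_bg, alpha_wm, gamma, and bias_weight, and the specific update rules are hypothetical and are not the authors' published implementation.

import numpy as np

class PFCBGSketch:
    """Toy sketch: fast basal-ganglia action values plus a slower
    working-memory reward context that biases action selection top-down.
    Illustrative only; not the authors' code."""

    def __init__(self, n_states, n_actions,
                 alpha_bg=0.1, alpha_wm=0.02, gamma=0.9, bias_weight=0.5):
        self.q = np.zeros((n_states, n_actions))   # basal-ganglia action values
        self.wm = np.zeros((n_states, n_actions))  # PFC working-memory reward context
        self.alpha_bg = alpha_bg                   # fast learning rate (BG)
        self.alpha_wm = alpha_wm                   # slower, more persistent update (WM)
        self.gamma = gamma                         # discount factor
        self.bias_weight = bias_weight             # strength of the top-down bias

    @staticmethod
    def discretize(observation, bins):
        # Separate a continuous observation (e.g., distance to an obstacle)
        # into one of a small set of distinguishable states by binning;
        # with k bin edges this yields states 0..k.
        return int(np.digitize(observation, bins))

    def select_action(self, state, epsilon=0.1):
        # Epsilon-greedy choice over BG values biased by the WM reward context.
        if np.random.rand() < epsilon:
            return np.random.randint(self.q.shape[1])
        return int(np.argmax(self.q[state] + self.bias_weight * self.wm[state]))

    def update(self, state, action, reward, next_state):
        # A dopamine-like prediction error updates both the BG values and,
        # more slowly, the working-memory context.
        delta = reward + self.gamma * np.max(self.q[next_state]) - self.q[state, action]
        self.q[state, action] += self.alpha_bg * delta
        self.wm[state, action] += self.alpha_wm * delta
        return delta

# Hypothetical usage: 10 discretized states, 3 candidate maneuvers.
agent = PFCBGSketch(n_states=10, n_actions=3)
state = agent.discretize(1.7, bins=np.linspace(0.0, 5.0, 9))  # 9 edges -> states 0..9
action = agent.select_action(state)

The separation of a fast value update from a slower, more persistent working-memory context is one simple way to realize the top-down biasing effect described in the abstract; the paper itself may implement this differently.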


Metadata
Title
A Brain-Inspired Decision Making Model Based on Top-Down Biasing of Prefrontal Cortex to Basal Ganglia and Its Application in Autonomous UAV Explorations
Authors
Feifei Zhao
Yi Zeng
Guixiang Wang
Jun Bai
Bo Xu
Publication date
25.09.2017
Publisher
Springer US
Published in
Cognitive Computation / Issue 2/2018
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-017-9511-3
