Published in: Journal of Computational Neuroscience 2/2014

1 October 2014

Learning and control of exploration primitives

Authors: Goren Gordon, Ehud Fonio, Ehud Ahissar

Abstract

Animals explore novel environments in a cautious manner, exhibiting alternation between curiosity-driven behavior and retreats. We present a detailed formal framework for exploration behavior, which generates behavior that maintains a constant level of novelty. Similar to other types of complex behaviors, the resulting exploratory behavior is composed of exploration motor primitives. These primitives can be learned during a developmental period, wherein the agent experiences repeated interactions with environments that share common traits, thus allowing transference of motor learning to novel environments. The emergence of exploration motor primitives is the result of reinforcement learning in which information gain serves as intrinsic reward. Furthermore, actors and critics are local and ego-centric, thus enabling transference to other environments. Novelty control, i.e. the principle which governs the maintenance of constant novelty, is implemented by a central action-selection mechanism, which switches between the emergent exploration primitives and a retreat policy, based on the currently-experienced novelty. The framework has only a few parameters, wherein time-scales, learning rates and thresholds are adaptive, and can thus be easily applied to many scenarios. We implement it by modeling the rodent’s whisking system and show that it can explain characteristic observed behaviors. A detailed discussion of the framework’s merits and flaws, as compared to other related models, concludes the paper.
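
The novelty-control principle described in the abstract can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `select_policy` and `simulate`, the single fixed threshold, and the simple rule that exploring adds a fixed amount of novelty while retreating halves it. In the paper the thresholds are adaptive and the exploration primitives are learned; this sketch only shows how switching on the currently experienced novelty keeps novelty within a bounded band.

```python
def select_policy(novelty: float, threshold: float) -> str:
    """Hypothetical action selector: explore while novelty is at or
    below the threshold, otherwise retreat."""
    return "explore" if novelty <= threshold else "retreat"


def simulate(steps: int = 20, threshold: float = 1.0,
             gain: float = 0.4, decay: float = 0.5):
    """Toy dynamics (assumed, not from the paper): exploring raises
    novelty by `gain`, retreating multiplies it by `decay`."""
    novelty = 0.0
    trace = []
    for _ in range(steps):
        policy = select_policy(novelty, threshold)
        if policy == "explore":
            novelty += gain      # new sensations increase novelty
        else:
            novelty *= decay     # retreating to the familiar reduces it
        trace.append((policy, novelty))
    return trace


if __name__ == "__main__":
    for policy, novelty in simulate():
        print(f"{policy:8s} novelty={novelty:.3f}")
```

Under these assumed dynamics the agent alternates between exploration bouts and retreats, and novelty never exceeds the threshold by more than one exploration step.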

Metadata
Title
Learning and control of exploration primitives
Authors
Goren Gordon
Ehud Fonio
Ehud Ahissar
Publication date
1 October 2014
Publisher
Springer US
Published in
Journal of Computational Neuroscience / Issue 2/2014
Print ISSN: 0929-5313
Electronic ISSN: 1573-6873
DOI
https://doi.org/10.1007/s10827-014-0500-1
