Biological arm motion through reinforcement learning

Biological Cybernetics

Abstract.

This paper presents an optimal learning control method, based on reinforcement learning, for biological systems with a redundant actuator. Reinforcement learning is difficult to apply to biological control systems because the muscle activation space is redundant. We address this problem as follows. First, we divide the control input space into two subspaces according to a priority order of learning and restrict the search noise for reinforcement learning to the first-priority subspace. The constraint is then relaxed as learning progresses, so that the search space extends into the second-priority subspace. The higher-priority subspace is designed so that the impedance of the arm can be high. A smooth reaching motion is obtained through reinforcement learning without any prior knowledge of the arm’s dynamics.
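The staged search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `scheduled_noise`, the linear relaxation factor, the Gaussian noise model, and the two-muscle basis vectors are all assumptions chosen for illustration, and the assignment of directions to the first- and second-priority subspaces is arbitrary here.

```python
import math
import random


def scheduled_noise(basis_first, basis_second, progress, sigma=0.1, rng=random):
    """Sample exploration noise in a muscle-activation space (illustrative only).

    basis_first / basis_second: direction vectors spanning the hypothetical
    first- and second-priority subspaces.
    progress: learning progress in [0, 1]. At 0 the noise is confined to the
    first-priority subspace; at 1 both subspaces are explored equally.
    """
    dim = len(basis_first[0])
    relax = min(1.0, max(0.0, progress))  # assumed linear constraint relaxation
    noise = [0.0] * dim
    # Full-strength noise along every first-priority direction.
    for direction in basis_first:
        coeff = rng.gauss(0.0, sigma)
        for i in range(dim):
            noise[i] += coeff * direction[i]
    # Noise along second-priority directions, scaled by the relaxation factor.
    for direction in basis_second:
        coeff = relax * rng.gauss(0.0, sigma)
        for i in range(dim):
            noise[i] += coeff * direction[i]
    return noise


# Hypothetical two-muscle example with an orthonormal pair of directions.
s = 1.0 / math.sqrt(2.0)
first_priority = [[s, s]]
second_priority = [[s, -s]]

rng = random.Random(0)
early = scheduled_noise(first_priority, second_priority, progress=0.0, rng=rng)
late = scheduled_noise(first_priority, second_priority, progress=1.0, rng=rng)
```

With `progress=0.0` the sampled vector lies entirely along the first-priority direction; as `progress` approaches 1 the component along the second-priority direction grows to the same scale. How the constraint is actually relaxed in the paper is not stated in the abstract, so the linear schedule should be read as a placeholder.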

Author information

Corresponding author

Correspondence to Jun Izawa.

About this article

Cite this article

Izawa, J., Kondo, T. & Ito, K. Biological arm motion through reinforcement learning. Biol. Cybern. 91, 10–22 (2004). https://doi.org/10.1007/s00422-004-0485-3

  • DOI: https://doi.org/10.1007/s00422-004-0485-3
