
2023 | Original Paper | Book Chapter

Value Cores for Inner and Outer Alignment: Simulating Personality Formation via Iterated Policy Selection and Preference Learning with Self-World Modeling Active Inference Agents

Authors: Adam Safron, Zahra Sheikhbahaee, Nick Hay, Jeff Orchard, Jesse Hoey

Published in: Active Inference

Publisher: Springer Nature Switzerland


Abstract

Humanity faces multiple existential risks in the coming decades due to technological advances in AI and the possibility of unintended behaviors emerging from such systems. We believe that better outcomes may be possible through rigorously exploring frameworks for intelligent (goal-oriented) behavior inspired by computational neuroscience. Here, we explore how the Free Energy Principle and Active Inference (FEP-AI) framework may provide solutions to these challenges by enabling control systems that operate according to principles of hierarchical Bayesian modeling and prediction-error (i.e., surprisal) minimization. Such FEP-AI agents are equipped with hierarchically organized world models capable of counterfactual planning, realized through the kinds of reciprocal message passing performed by mammalian nervous systems, thereby allowing the flexible construction of representations of self-world dynamics with varying degrees of temporal depth. We will describe how such systems can not only infer the abstract causal structure of their environment, but also develop capacities for “theory of mind” and collaborative (human-aligned) decision making. Such architectures could help to sidestep potentially dangerous combinations of high intelligence and human-incompatible values, since these mental processes are entangled (rather than orthogonal) in FEP-AI agents. We will further describe how (meta-)learned deep goal hierarchies may also describe biological systems well, suggesting that potential risks from “mesa-optimizers” may actually point toward one of the most promising approaches to AI safety: minimizing prediction error relative to causal self-world models can be used to cultivate modes of policy selection and agent personalities that robustly optimize for goals consistently aligned with both individual and shared values. Finally, we will describe how iterated policy selection and preference learning can give rise to “value cores”: self-reinforcing, relatively stable attracting states that agents will seek to return to through their goal-oriented imaginings and actions.
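The policy-selection and preference-learning loop described above can be made concrete with a small discrete active inference example. The sketch below is an illustrative assumption on our part rather than the authors' implementation: a toy agent scores one-step policies by expected free energy (risk plus ambiguity) and iteratively shifts its prior preferences toward the outcomes its policies reliably realize, so that a self-reinforcing attractor, a simple analogue of a “value core”, emerges. The matrices, the precision value, and the count-based preference update are placeholder choices made for clarity.

```python
"""
Minimal sketch (an illustrative assumption, not the authors' implementation):
a discrete active inference agent that selects actions by minimizing expected
free energy and iteratively updates its prior preferences toward the outcomes
it reliably realizes. The self-reinforcing fixed point of this loop is a toy
analogue of a "value core". All matrices and parameters are placeholders.
"""
import numpy as np

rng = np.random.default_rng(0)
n_states = n_obs = 4

# Generative model (assumed): A = P(o|s), B[a] = P(s'|s,a), C = log-preferences over o.
A = np.eye(n_obs, n_states)                           # fully observable toy world
B = np.stack([np.roll(np.eye(n_states), 1, axis=0),   # action 0: shift "right"
              np.roll(np.eye(n_states), -1, axis=0),  # action 1: shift "left"
              np.eye(n_states)])                      # action 2: stay
C = np.log(np.full(n_obs, 1.0 / n_obs))               # initially flat preferences
n_actions = B.shape[0]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def expected_free_energy(q_s, action):
    """G for a one-step policy: risk (KL from preferred outcomes) plus ambiguity."""
    q_s_next = B[action] @ q_s                        # predicted next-state distribution
    q_o = A @ q_s_next                                # predicted outcome distribution
    risk = np.sum(q_o * (np.log(q_o + 1e-16) - C))
    ambiguity = (-np.sum(A * np.log(A + 1e-16), axis=0)) @ q_s_next
    return risk + ambiguity

q_s = np.full(n_states, 1.0 / n_states)               # belief over current state
counts = np.ones(n_obs)                               # pseudo-counts over realized outcomes

for t in range(300):
    # Policy selection: softmax over negative expected free energy (precision 4, assumed).
    G = np.array([expected_free_energy(q_s, a) for a in range(n_actions)])
    action = rng.choice(n_actions, p=softmax(-4.0 * G))

    # Environment step; here the generative process is assumed to match the model.
    q_s = B[action] @ q_s
    obs = rng.choice(n_obs, p=A @ q_s)

    # Perceptual inference: Bayesian posterior over hidden states given the observation.
    q_s = A[obs] * q_s
    q_s /= q_s.sum()

    # Iterated preference learning: outcomes that are reliably realized become preferred,
    # which makes the policies producing them more probable, which reinforces the
    # preferences again -- a self-stabilizing "value core".
    counts[obs] += 1.0
    C = np.log(counts / counts.sum())

print("Learned preferences (value core):", np.exp(C).round(3))
```

Run under these assumptions, the preference distribution collapses onto the state(s) the agent keeps revisiting, illustrating the point of the chapter title: value cores are not hand-specified reward functions but attracting states that emerge from the coupling of iterated policy selection and preference learning.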


Metadata
Title
Value Cores for Inner and Outer Alignment: Simulating Personality Formation via Iterated Policy Selection and Preference Learning with Self-World Modeling Active Inference Agents
Authors
Adam Safron
Zahra Sheikhbahaee
Nick Hay
Jeff Orchard
Jesse Hoey
Copyright Year
2023
DOI
https://doi.org/10.1007/978-3-031-28719-0_24
