
2023 | Original Paper | Book Chapter

Value Cores for Inner and Outer Alignment: Simulating Personality Formation via Iterated Policy Selection and Preference Learning with Self-World Modeling Active Inference Agents

Authors: Adam Safron, Zahra Sheikhbahaee, Nick Hay, Jeff Orchard, Jesse Hoey

Published in: Active Inference

Publisher: Springer Nature Switzerland


Abstract

Humanity faces multiple existential risks in the coming decades due to technological advances in AI and the possibility of unintended behaviors emerging from such systems. We believe that better outcomes may be possible through rigorously exploring frameworks for intelligent (goal-oriented) behavior inspired by computational neuroscience. Here, we explore how the Free Energy Principle and Active Inference (FEP-AI) framework may provide solutions to these challenges by enabling control systems that operate according to principles of hierarchical Bayesian modeling and prediction-error (i.e., surprisal) minimization. Such FEP-AI agents are equipped with hierarchically organized world models capable of counterfactual planning, realized through the kinds of reciprocal message passing performed by mammalian nervous systems, thereby allowing the flexible construction of representations of self-world dynamics with varying degrees of temporal depth. We will describe how such systems can not only infer the abstract causal structure of their environment, but also develop capacities for “theory of mind” and collaborative (human-aligned) decision making. Such architectures could help to sidestep potentially dangerous combinations of high intelligence and human-incompatible values, since these mental processes are entangled (rather than orthogonal) in FEP-AI agents. We will further describe how (meta-)learned deep goal hierarchies may also describe biological systems well, suggesting that potential risks from “mesa-optimizers” may actually point toward one of the most promising approaches to AI safety: minimizing prediction error relative to causal self-world models can be used to cultivate modes of policy selection and agent personalities that robustly optimize for goals consistently aligned with both individual and shared values. Finally, we will describe how iterated policy selection and preference learning can give rise to “value cores”: self-reinforcing, relatively stable attracting states that agents will seek to return to through their goal-oriented imaginings and actions.
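The policy-selection and preference-learning loop described above can be made concrete with a small discrete active inference example. The sketch below is an illustrative assumption on our part rather than the authors' implementation: a toy agent scores one-step policies by expected free energy (risk plus ambiguity) and iteratively shifts its prior preferences toward the outcomes its policies reliably realize, so that a self-reinforcing attractor, a simple analogue of a “value core”, emerges. The matrices, the precision value, and the count-based preference update are placeholder choices made for clarity.

```python
"""
Minimal sketch (an illustrative assumption, not the authors' implementation):
a discrete active inference agent that selects actions by minimizing expected
free energy and iteratively updates its prior preferences toward the outcomes
it reliably realizes. The self-reinforcing fixed point of this loop is a toy
analogue of a "value core". All matrices and parameters are placeholders.
"""
import numpy as np

rng = np.random.default_rng(0)
n_states = n_obs = 4

# Generative model (assumed): A = P(o|s), B[a] = P(s'|s,a), C = log-preferences over o.
A = np.eye(n_obs, n_states)                           # fully observable toy world
B = np.stack([np.roll(np.eye(n_states), 1, axis=0),   # action 0: shift "right"
              np.roll(np.eye(n_states), -1, axis=0),  # action 1: shift "left"
              np.eye(n_states)])                      # action 2: stay
C = np.log(np.full(n_obs, 1.0 / n_obs))               # initially flat preferences
n_actions = B.shape[0]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def expected_free_energy(q_s, action):
    """G for a one-step policy: risk (KL from preferred outcomes) plus ambiguity."""
    q_s_next = B[action] @ q_s                        # predicted next-state distribution
    q_o = A @ q_s_next                                # predicted outcome distribution
    risk = np.sum(q_o * (np.log(q_o + 1e-16) - C))
    ambiguity = (-np.sum(A * np.log(A + 1e-16), axis=0)) @ q_s_next
    return risk + ambiguity

q_s = np.full(n_states, 1.0 / n_states)               # belief over current state
counts = np.ones(n_obs)                               # pseudo-counts over realized outcomes

for t in range(300):
    # Policy selection: softmax over negative expected free energy (precision 4, assumed).
    G = np.array([expected_free_energy(q_s, a) for a in range(n_actions)])
    action = rng.choice(n_actions, p=softmax(-4.0 * G))

    # Environment step; here the generative process is assumed to match the model.
    q_s = B[action] @ q_s
    obs = rng.choice(n_obs, p=A @ q_s)

    # Perceptual inference: Bayesian posterior over hidden states given the observation.
    q_s = A[obs] * q_s
    q_s /= q_s.sum()

    # Iterated preference learning: outcomes that are reliably realized become preferred,
    # which makes the policies producing them more probable, which reinforces the
    # preferences again -- a self-stabilizing "value core".
    counts[obs] += 1.0
    C = np.log(counts / counts.sum())

print("Learned preferences (value core):", np.exp(C).round(3))
```

Run under these assumptions, the preference distribution collapses onto the state(s) the agent keeps revisiting, illustrating the point of the chapter title: value cores are not hand-specified reward functions but attracting states that emerge from the coupling of iterated policy selection and preference learning.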


Metadata
Title
Value Cores for Inner and Outer Alignment: Simulating Personality Formation via Iterated Policy Selection and Preference Learning with Self-World Modeling Active Inference Agents
Authors
Adam Safron
Zahra Sheikhbahaee
Nick Hay
Jeff Orchard
Jesse Hoey
Copyright Year
2023
DOI
https://doi.org/10.1007/978-3-031-28719-0_24
