The relevant paragraph that I quoted refutes exactly this. In the bolded sentence, “value function” is used as a synonym for “utility function”. You simply cannot represent an agent that always seeks to maximise “empowerment” (as defined in the paper for Self-preserving agents), or that always seeks to minimise free energy (as Active Inference agents do), as maximising some fixed quantity over its lifetime: if you integrate empowerment or free energy over time, you don’t get a sensible information quantity that you could label “utility”.
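To spell out what I mean (this is my own sketch using the standard definitions, not the paper’s notation): empowerment is a channel capacity conditioned on the current state,

$$\mathfrak{E}(s_t) \;=\; \max_{p(a_{t:t+n})} I\!\left(A_{t:t+n};\, S_{t+n} \,\middle|\, s_t\right),$$

and variational free energy is a functional of the agent’s current beliefs $q_t$,

$$F_t[q_t] \;=\; \mathbb{E}_{q_t(x)}\!\left[\ln q_t(x) - \ln p(o_t, x)\right] \;=\; D_{\mathrm{KL}}\!\left[q_t(x)\,\|\,p(x \mid o_t)\right] - \ln p(o_t).$$

A lifetime sum such as $\sum_t \mathfrak{E}(s_t)$ or $\sum_t F_t[q_t]$ has no obvious interpretation as the capacity of a single channel or the free energy of a single model, and in the free-energy case the summand depends on the agent’s own belief trajectory rather than only on world states. That, as I see it, is why calling such a sum “utility” doesn’t make sense.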
This is an uncontroversial idea and is not a contribution of the paper. What the paper contributes is a formal demonstration that such agents are “stable”, i.e. “self-preserving”; previously, this hadn’t been shown formally for arbitrary Active Inference agents.
Note that the fact that these agents are not utility maximisers doesn’t mean they don’t instrumentally converge; cf. https://www.lesswrong.com/posts/ostLZyhnBPndno2zP/active-inference-as-a-formalisation-of-instrumental. I haven’t read the full paper yet; maybe I will see how the framework there could admit mild optimisation, but so far I don’t.