Rewards and Utilities are different concepts. Rejecting the claim that reward is necessary to get/build agency is not the same thing as rejecting EU maximization as a basin of idealized agency.
The relevant paragraph that I quoted refutes exactly this. In the bolded sentence, “value function” is used as a synonym for “utility function”. You simply cannot represent an agent that always seeks to maximise “empowerment” (as defined in the paper for Self-preserving agents), for example, or that always seeks to minimise free energy (as in Active Inference agents), as maximising some quantity over its lifetime: if you integrate empowerment or free energy over time, you don’t get a sensible information quantity that you can label “utility”.
This is an uncontroversial idea, and it is not a contribution of the paper. The paper’s contribution is a formal demonstration that such agents are “stable” and “self-preserving”; previously, this hadn’t been shown formally for arbitrary Active Inference agents.
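To make the point about integration concrete, here is a minimal toy sketch (my own construction, not taken from the paper, using a hypothetical deterministic gridworld): an empowerment-maximising agent re-evaluates an information quantity from the local dynamics at every state, rather than accumulating a per-step reward that could be summed into a lifetime “utility”.

```python
import numpy as np
from itertools import product

# Toy deterministic gridworld (illustrative only): states are (x, y) cells,
# actions move one step N/S/E/W, and walls clamp the position.
SIZE = 4
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def step(state, action):
    x, y = state
    dx, dy = action
    return (min(max(x + dx, 0), SIZE - 1), min(max(y + dy, 0), SIZE - 1))

def empowerment(state, horizon=2):
    """n-step empowerment: for deterministic dynamics, the channel capacity
    between action sequences and resulting states reduces to log2 of the
    number of distinct reachable states."""
    reachable = set()
    for seq in product(ACTIONS, repeat=horizon):
        s = state
        for a in seq:
            s = step(s, a)
        reachable.add(s)
    return np.log2(len(reachable))

def empowerment_policy(state):
    """Greedy empowerment maximiser: pick the action whose successor state has
    the highest empowerment. Nothing here is a per-step reward being summed
    into a return; the objective is recomputed from the local dynamics at every
    state, so "integrating it over the lifetime" has no obvious meaning."""
    return max(ACTIONS, key=lambda a: empowerment(step(state, a)))

# A corner cell has lower empowerment than an interior cell, so the greedy
# policy tends to move away from corners (a simple self-preservation-like drive).
print(empowerment((0, 0)), empowerment((1, 1)))  # ~2.58 vs ~3.17
```

Contrast this with a reward-maximising agent, whose objective is literally a sum of per-step rewards over the trajectory; the empowerment objective above has no such additive decomposition.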
Note that the fact that these agents are not utility maximisers doesn’t mean they don’t instrumentally converge. Cf. https://www.lesswrong.com/posts/ostLZyhnBPndno2zP/active-inference-as-a-formalisation-of-instrumental. I haven’t read the full paper yet; maybe I will see how the framework in there could admit mild optimisation, but so far I don’t see how.