Actually, no matter what the policy is, we can view the agent as an EU maximizer. The construction is simple: the agent can be thought of as optimizing the utility function U, where U(h, a) = 1 if the policy would take action a given history h, and 0 otherwise. Here I’m assuming that U is defined over histories composed of states/observations and actions.
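To make the construction concrete, here is a minimal sketch in Python. This is my own illustration rather than code from the post, and the names (`policy`, `history`, `action`) are just placeholders:

```python
# Given any deterministic policy, define U(h, a) = 1 if the policy would take
# action a at history h, else 0. The policy then trivially maximizes U.

def make_trivial_utility(policy):
    """Return U with U(h, a) = 1 iff `policy` would take action a given history h."""
    def U(history, action):
        return 1.0 if policy(history) == action else 0.0
    return U

# Example: a policy that always picks "a0". Whatever the policy does, it scores
# the maximum utility of 1 at every step, so it counts as an EU maximizer for this U.
policy = lambda history: "a0"
U = make_trivial_utility(policy)

assert U((), "a0") == 1.0   # the action the policy takes gets utility 1
assert U((), "a1") == 0.0   # any other action gets utility 0
```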
This is not the type signature for a utility function that matters for the coherence arguments (by which I don’t mean VNM—see this comment). It does often fit the type signature in the way those arguments are formulated/formalised, but intuitively it’s not getting at the point of the theorems. I suggest you consider utility functions defined as functions of the state of the world only, not including the action taken. (Yes, I know actions could be logged in the world state, the agent is embedded in the state, etc.; this is all irrelevant to the point I’m trying to make. I’m suggesting you consider the setup where there’s a Cartesian boundary, an unknown transition function, and environment states that don’t contain a log of actions.) I don’t think the above kind of construction works in that setting, and I think that setting is the better one to focus on.
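To spell out the distinction between the two type signatures, here is a minimal sketch. The type names are my own and purely illustrative, not anything from the post:

```python
# In the state-only setup there is a Cartesian boundary, an unknown transition
# function, and environment states that carry no log of the agent's actions,
# so U cannot reference the action directly.

from typing import Any, Callable, Tuple

History = Tuple[Any, ...]   # observations and actions so far
Action = Any
State = Any                 # environment state, with no action log

# Type signature the trivial construction relies on: U can "see" the action.
U_history_action = Callable[[History, Action], float]

# Type signature suggested here: U depends only on the environment state reached
# via an unknown transition function T(state, action) -> next state.
U_state_only = Callable[[State], float]
Transition = Callable[[State, Action], State]
```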
Have you seen this post, which looks at the setting you mentioned?
From my perspective, I want to know why it makes sense to assume that the AI system will have preferences over world states, before I start reasoning about that scenario. And there are reasons to expect something along these lines! I talk about some of them in the next post in this sequence! But I think once you’ve incorporated some additional reason like “humans will want goal-directed agents” or “agents optimized to do some tasks we write down will hit upon a core of general intelligence”, then I’m already on board that you get goal-directed behavior, and I’m not interested in the construction in this post any more. The only point of the construction in this post is to demonstrate that you need this additional reason.