Actually, no matter what the policy is, we can view the agent as an EU maximizer.
There is an even broader argument to be made. Any agent that is represented by a program, no matter what its preferences are, even if they are inconsistent, can be viewed as an EU maximizer that always chooses the output it is programmed to take. (If the program is randomized, take its preferences to be indifferent among the possible outputs, so any weighting between them still maximizes EU.)
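Concretely, here is a minimal sketch of that construction (the notation is mine: write \(\pi(a \mid h)\) for the probability that the program outputs action \(a\) given input history \(h\)):

$$
U_h(a) =
\begin{cases}
1 & \text{if } \pi(a \mid h) > 0, \\
0 & \text{otherwise.}
\end{cases}
$$

Every action the program can actually output attains the maximum of \(U_h\), so the program's behavior, whether deterministic or randomized, maximizes expected utility with respect to \(U_h\).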
I suspect there are other constructions that are at least slightly less trivial. This trivial one has utilities over only the “outcomes” of which action the agent takes, which is a deontological goal, rather than over states of the external world, which would allow more typically consequentialist goals. Still, it is consistent with definitions of EU maximization.