Say w2a is the world where the agent starts in w2 and w2b is the world that results after the agent moves from w1 to w2.
If the agent’s memory is not counted as part of the world, the problem seems worse: the only thing that distinguishes w2a from w2b is the agent’s memory of past events, so leaving that memory out of the utility function seems to require U(w2a) = U(w2b).
U could depend on the entire history of states (rather than on the agent’s memory of that history).
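To make the distinction concrete, here is a rough sketch of the two options. The state names follow the thread, but the numeric utilities and the "penalty for having moved from w1" rule are made up purely for illustration; nothing here is claimed to be the utility function anyone actually proposed.

```python
from typing import Sequence

State = str
History = Sequence[State]

def state_utility(state: State) -> float:
    # Depends only on the current state (hypothetical values),
    # so it cannot tell w2a from w2b.
    return {"w1": 0.0, "w2": 1.0}[state]

def history_utility(history: History) -> float:
    # Depends on the whole sequence of states, not on the agent's memory.
    # Hypothetical rule: arriving in w2 from w1 costs 0.5.
    total = state_utility(history[-1])
    if len(history) >= 2 and history[-2] == "w1" and history[-1] == "w2":
        total -= 0.5
    return total

w2a = ["w2"]        # the agent starts in w2
w2b = ["w1", "w2"]  # the agent moved from w1 to w2

same_under_state = state_utility(w2a[-1]) == state_utility(w2b[-1])
differ_under_history = history_utility(w2a) != history_utility(w2b)
```

Under the state-only version the two worlds are forced to get the same value, while the history version can separate them even if the agent remembers nothing.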
Ah, misunderstood that, thanks.