I am starting to see what you mean. Let’s stick with utility functions over histories of length m_k (whole sequences), as you proposed, and denote them with a capital U to distinguish them from the prefix utilities. I think your Agent 4 runs into the following problem: modeled_action(n,m) depends on the actions and observations yx_{k:m-1} and has to be calculated for each combination of them, so y_m is actually
which clutters up the notation so much that I don’t want to write it down anymore.
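To make the "each combination" point concrete, here is a rough Python sketch. It is only an illustration under my own assumptions: `toy_policy` is a hypothetical stand-in for the modeled future agent (not the actual modeled_action from your definition), and actions and observations are taken to be binary. The point is just that the modeled action at step m is a function of the whole intermediate history yx_{k:m-1}, so it has to be tabulated once per history.

```python
from itertools import product


def intermediate_histories(actions, observations, k, m):
    """Return an iterator over every history yx_{k:m-1} as a tuple of (y, x) pairs."""
    steps = list(product(actions, observations))
    return product(steps, repeat=m - k)


def modeled_action_table(policy, actions, observations, k, m):
    """Map each intermediate history to the action the modeled agent picks at step m."""
    return {h: policy(h) for h in intermediate_histories(actions, observations, k, m)}


def toy_policy(history):
    # Hypothetical placeholder policy: repeat the most recent observation as the action.
    return history[-1][1] if history else 0


table = modeled_action_table(toy_policy, actions=(0, 1), observations=(0, 1), k=1, m=4)
print(len(table))  # (2 * 2) ** (4 - 1) = 64 separate histories to evaluate
```

The table has (|Y|·|X|)^(m-k) entries, which is why writing y_m out explicitly gets unwieldy so quickly.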
We also get into trouble when taking the expectation: the observations x_{k+1:n} are only considered when modeling the actions of the future agents, but not now. What is M(yx_{<k}, yx_{k:n}) even supposed to mean? Where do the x’s come from?
So let’s torture some indices:
where n>=k and
This is not really AIXI anymore and I am not sure what to do with it, but I like it.
There is also a more detailed paper by Lattimore and Hutter (2011) on discounting and time consistency that is interesting in that context.