Comparing the three proposed agents, we notice that Agent 1 is dynamically inconsistent: it will optimize for future opportunities that it predictably will not take later.
This seems so flawed as to be pretty much useless. Specification for an agent that optimizes for its current utility function under the knowledge that its utility function will change:
First, replace the action-perception sequence with an action-perception-utility sequence u_1, y_1, x_1, u_2, y_2, x_2, etc. Let the action-generating function be represented by action(k), where k is the step. It will make use of a recursive helper function modeled_action(n, k), representing what the agent thinks it will do at step n, where n-k is the number of steps it looks ahead.
action(k) = modeled_action(m_k, k).
modeled_action(k, k) = argmax(y_k) u_k(yx_<k, yx_k)*M(uyx_<k, uyx_k)
for n>k: modeled_action(n, k) = argmax(y_k) u_k(yx_k.
Apologies for the lack of LaTeX.
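For readability, here is a LaTeX transcription of the first two lines above (the transcription is mine; the third line is taken up in the replies below):

\text{action}(k) = \text{modeled\_action}(m_k, k)

\text{modeled\_action}(k, k) = \arg\max_{y_k} u_k(yx_{<k}\, yx_k)\, M(uyx_{<k}\, uyx_k)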
This seems unnecessary. The information u_i is already contained in x_i.
This completely breaks the expectimax principle. I assume you actually mean something like
\text{modeled\_action}(n, k) = \arg\max_{y_k}\sum_{x_k} u_k(\dot{y}\dot{x}_{<k}\, y\underline{x}_{k:n})\, M(\dot{y}\dot{x}_{<k}\, y\underline{x}_{k:n}),

which is just Agent 2 in disguise.
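(A gloss that is mine rather than the commenters': "breaks the expectimax principle" refers to the missing sum over observations. A candidate action y_k has to be scored by its expected utility, \sum_{x} u_k(\cdot)\,M(\cdot), whereas the specification above applies u_k and M to a single continuation without summing over the unseen x's.)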
Oops. Yes, that’s what I meant. But it is not the same as Agent 2, because this (Agent 4?) uses its current utility function to evaluate the desirability of future observations and actions, even though it knows that it will use a different utility function to choose between them later. For example, Agent 4 will not take the Simpleton’s Gambit because it cares about its current utility function getting satisfied in the future, not about its future utility function getting satisfied in the future.
Agent 4 can be seen as a set of agents, one for each possible utility function, that are using game theory with each other.
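A toy illustration of that refusal, with numbers that are mine and not from the thread: suppose accepting the Simpleton's Gambit leads to a history h_acc with u_k(h_acc) = 0 but u_{k+1}(h_acc) = 100, while refusing leads to h_ref with u_k(h_ref) = u_{k+1}(h_ref) = 10. Agent 4 evaluates both futures with its current u_k and refuses, since 0 < 10; an agent that instead scored the post-change future with the utility function it would have by then would compare 100 with 10 and accept.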
I second the general sentiment that it would be good for an agent to have these traits, but if I follow your equations I end up with Agent 2.
No, you don’t. If you tried to represent Agent 2 in that notation, you would get
modeled_action(n, k) = argmax(y_k) sum(x_k) [u_k(yx_k.
You were using u_k to represent the utility of the last step of its input, so that total utility is the sum of the utilities of its prefixes, while I was using u_k to represent the utility of the whole sequence. If I adapt Agent 4 to your use of u_k, I get
modeled_action(n, k) = argmax(y_k) sum(x_k) [u_k(yx_k.
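Side by side, the two conventions just contrasted (the transcription is mine): under the first, u_i scores only the step its argument ends on, so the value of a length-m history is the sum over its prefixes,

U_{\text{total}}(yx_{1:m}) = \sum_{i=1}^{m} u_i(yx_{1:i}),

while under the second a single u_k is a function of the entire sequence yx_{1:m}.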
I am starting to see what you mean. Let's stick with utility functions over histories of length m_k (whole sequences), as you proposed, and denote them with a capital U to distinguish them from the prefix utilities. I think your Agent 4 runs into the following problem: modeled_action(n, m) actually depends on the actions and observations yx_{k:m-1} and needs to be calculated for each such combination, so y_m is actually a function of yx_{k:m-1}, which clutters up the notation so much that I don't want to write it down anymore.
We also get into trouble with taking the expectation: the observations x_{k+1:n} are only considered in modeling the actions of the future agents, but not in the expectation taken now. What is M(yx_{<k}, yx_{k:n}) even supposed to mean? Where do the x's come from?
So let’s torture some indices:
\hat{y}_{n,k}(yx_{1:n-1}) = \arg\max_{y_n}\sum_{x_{n:m_k}} U_n(yx_{1:n}\, \hat{y}_{n+1,k}(yx_{1:n})\, x_{n+1}\dots x_{m_k})\, M(\dot{y}\dot{x}_{<k}\, yx_{k:n-1}\, \hat{y}\underline{x}_{n:m_k}),

where n >= k and
This is not really AIXI anymore and I am not sure what to do with it, but I like it.
"so y_m is actually a function of yx_{k:m-1}, which clutters up the notation so much that I don't want to write it down anymore."

Yes.

"We also get into trouble with taking the expectation: the observations x_{k+1:n} are only considered in modeling the actions of the future agents, but not in the expectation taken now. What is M(yx_{<k}, yx_{k:n}) even supposed to mean? Where do the x's come from?"

Oops, you are right. The sum should have been over x_{k:n}, not just over x_k.

"So let's torture some indices: [...]"

Yes, that is a cleaner and actually correct version of what I was trying to describe. Thanks.
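To make the final construction concrete, here is a minimal Python sketch. It is mine, not the commenters', and it makes simplifying assumptions throughout: a fixed finite horizon in place of m_k, finite action and observation sets, a toy coin-flip model env_prob standing in for the mixture M, and a known sequence of utility functions U_1, ..., U_m over complete histories. What it illustrates is the agreed point above: every step's action is chosen with that step's own utility function, while the expectation the current step maximizes is still taken with its current one.

# Illustrative sketch of the hat-y_{n,k} construction from the thread, NOT the
# commenters' code.  Assumptions (mine): fixed finite horizon, finite action and
# observation sets, a toy environment model in place of the mixture M, and a
# known sequence of utility functions over complete histories.

HORIZON = 2                      # total number of steps (fixed stand-in for m_k)
ACTIONS = ["a", "b"]             # finite action set (y)
OBSERVATIONS = [0, 1]            # finite observation set (x)

def env_prob(history, y, x):
    """Toy stand-in for M: probability of the next observation x given the
    history so far and the current action y.  Here: unbiased coin flips."""
    return 0.5

def U1(history):
    """Toy utility of the step-1 agent over COMPLETE histories: counts 'a' actions."""
    return sum(1 for (y, _) in history if y == "a")

def U2(history):
    """Toy utility of the step-2 agent: counts 'b' actions (the utility changes)."""
    return sum(1 for (y, _) in history if y == "b")

UTILITIES = [U1, U2]             # U_k used by the step-k agent

def expected_utility(U, history, y):
    """Expected value of U over complete histories, given the completed
    `history` and the current action `y`, assuming every LATER action is
    chosen by that step's own utility function (via modeled_action)."""
    total = 0.0
    for x in OBSERVATIONS:
        p = env_prob(history, y, x)
        if p == 0.0:
            continue
        extended = history + [(y, x)]
        if len(extended) == HORIZON:
            total += p * U(extended)
        else:
            # The future self picks its action with ITS utility, but we keep
            # evaluating the resulting complete history with OUR utility U.
            y_next = modeled_action(extended)
            total += p * expected_utility(U, extended, y_next)
    return total

def modeled_action(history):
    """hat-y_n: the action the step-n agent (n = len(history) + 1) is predicted
    to take, i.e. the argmax of ITS OWN expected utility U_n.  It is recomputed
    for every hypothetical history, which is the combinatorial blow-up the
    thread complains about."""
    n = len(history) + 1
    U_n = UTILITIES[n - 1]
    return max(ACTIONS, key=lambda y: expected_utility(U_n, history, y))

# The action actually taken at step k is hat-y_{k,k}: modeled_action applied to
# the real history, so the step-1 agent maximizes U_1 while correctly predicting
# that its step-2 self will maximize U_2.
if __name__ == "__main__":
    print("step-1 action:", modeled_action([]))                    # 'a' (per U_1)
    print("predicted step-2 action:", modeled_action([("a", 0)]))  # 'b' (per U_2)

Running it, the step-1 agent (which values "a") picks "a" while correctly predicting that its step-2 self (which values "b") will pick "b", which is the picture from earlier in the thread of one agent per utility function interacting game-theoretically.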