Agent 1 will have trouble modeling how its decision to change its utility function now will influence its own decisions later.
Ah, right, that abstraction thing. I’m still fairly confused by it. Maybe a simple game will help us see what’s going on.
The simple game can be something like a two-step choice. At time T1, the agent can send either A or B. Then at time T2, the agent can send A or B again, but its utility function might have changed in between.
For the original utility function, our payoff matrix looks like AA: 10, AB: −1, BA: 0, BB: 1. So if the utility function didn’t change, the agent would just send A at time T1 and A at time T2, and get a reward of 10.
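To make that baseline concrete, here is a minimal sketch of the unchanged-utility case; the numbers are just the matrix above, while the dictionary keys and the helper name are my own:

```python
# Payoffs under the original, unchanged utility function.
# Each key is the T1 action followed by the T2 action.
ORIGINAL_PAYOFF = {"AA": 10, "AB": -1, "BA": 0, "BB": 1}

def best_fixed_plan(payoff):
    """With a fixed utility function the agent simply picks the
    action pair with the highest payoff."""
    return max(payoff, key=payoff.get)

print(best_fixed_plan(ORIGINAL_PAYOFF))  # -> "AA", worth 10
```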
But suppose in between T1 and T2, a program predictably changes the agent’s payoff matrix, as stored in memory, to AA: −1, AB: 10, BA: 0, BB: 1. Now if the agent sent A at time T1, it will send B at time T2 to claim the new payoff of 10 for AB, even though AB is lowest in the preference ordering of the agent at T1. So if our agent is clever, it sends B at time T1 rather than A, knowing that the future program will also pick B, leading to an outcome (BB, for a reward of 1) that the agent at T1 prefers to AB.
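That reasoning is just backward induction, and a small sketch may make it clearer; it assumes the T1 agent knows the modified payoffs exactly, and the helper names are my own:

```python
# Payoffs before and after the predictable rewrite of the agent's
# stored utility function between T1 and T2.
ORIGINAL_PAYOFF = {"AA": 10, "AB": -1, "BA": 0, "BB": 1}
MODIFIED_PAYOFF = {"AA": -1, "AB": 10, "BA": 0, "BB": 1}

def t2_choice(first, payoff_at_t2):
    """The agent at T2 maximizes whatever payoffs it holds at T2."""
    return max("AB", key=lambda second: payoff_at_t2[first + second])

def t1_choice(payoff_at_t1, payoff_at_t2):
    """A clever T1 agent predicts the T2 choice for each possible first action
    and evaluates the resulting outcome with its current (T1) payoffs."""
    return max(
        "AB",
        key=lambda first: payoff_at_t1[first + t2_choice(first, payoff_at_t2)],
    )

first = t1_choice(ORIGINAL_PAYOFF, MODIFIED_PAYOFF)
second = t2_choice(first, MODIFIED_PAYOFF)
print(first + second)  # -> "BB": T1 settles for 1 instead of ending up at AB (-1 to it)
```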
So, is our AIXI Agent 1 clever enough to do that?
I would assume that it is not smart enough to foresee its own future actions and is therefore dynamically inconsistent. The original AIXI does not allow the agent to be part of the environment. If we tried to relax that dualism, then your question would depend strongly on the approximation to AIXI we used to make it computable. If that approximation can be scaled down in such a way that it remains a good predictor of the agent’s future actions, then maybe an environment containing a scaled-down, more abstract AIXI model will, after a lot of observations, become one of the lowest-complexity consistent programs. Maybe. That is about the only way I can imagine right now that we would not run into this problem.
Thanks, that helps.