Could you explain more about why you’re down on agent 1, and think agent 2 won’t wirehead?
My first impression is that agent 1 will take its expected changes into account when trying to maximize the time-summed (current) utility function, and so it won’t just purchase options it will never use, or similar “dumb stuff.” On the other topic, the only way agent 2 can’t wirehead is if there’s no possible way for it to influence its likely future utility functions—otherwise it’ll act to increase the probability that it chooses big, easy utility functions, then it will choose those same big, easy utility functions, and then it’s wireheaded.
I am pretty sure that Agent 2 will wirehead on the Simpleton Gambit, though this depends heavily on the number of time cycles that follow, the comparative advantage that can be gained from wireheading, and the negative utility the current utility function assigns to the change.
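For a rough sense of that trade-off, here is a back-of-envelope sketch; the linear form and all the variable names are my own illustrative assumptions, not anything from the Simpleton Gambit write-up itself.

```python
# Illustrative sketch only: wirehead iff the summed per-cycle advantage
# outweighs the one-off negative utility the current utility function
# assigns to being changed. Names and the linear form are assumptions.

def accepts_gambit(cycles_left, advantage_per_cycle, change_penalty):
    """Accept the gambit iff the total comparative advantage exceeds the penalty."""
    return cycles_left * advantage_per_cycle > change_penalty

print(accepts_gambit(cycles_left=10,   advantage_per_cycle=1.0, change_penalty=50.0))  # False
print(accepts_gambit(cycles_left=1000, advantage_per_cycle=1.0, change_penalty=50.0))  # True
```

So with enough cycles remaining, almost any positive comparative advantage eventually swamps the penalty for the change.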
Agent 1 will have trouble modeling how its decision to change its utility function now will influence its own decisions later, as described in AIXI and existential despair. So basically the two futures look very similar to the agent, except for the part where the screen says something different, and then it all comes down to whether the utility function has preferences over that particular fact.
Ah, right, that abstraction thing. I’m still fairly confused by it. Maybe a simple game will help show what’s going on.
The simple game can be something like a two-step choice. At time T1, the agent can send either A or B. Then at time T2, the agent can send A or B again, but its utility function might have changed in between.
For the original utility function, our payoff matrix looks like AA: 10, AB: −1, BA: 0, BB: 1. So if the utility function didn’t change, the agent would just send A at time T1 and A at time T2, and get a reward of 10.
But suppose that in between T1 and T2, a program predictably changes the agent’s payoff matrix, as stored in memory, to AA: −1, AB: 10, BA: 0, BB: 1. Now if the agent sent A at time T1, it will send B at time T2 to claim the new payoff of 10 for AB, even though AB is lowest in the preference ordering of the agent at T1. So if our agent is clever, it sends B at time T1 rather than A, knowing that its future self will also pick B, leading to an outcome (BB, for a reward of 1) that the agent at T1 prefers to AB.
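If it helps to see the bookkeeping, here is a minimal Python sketch of just this two-step game: plain payoff-table lookups, nothing AIXI-like, and the “naive”/“clever” labels are my own. The naive planner assumes its T2 self will still use the T1 payoffs and walks into AB, while the clever planner models the predictable rewrite and settles for BB.

```python
# Payoff tables from the example above.
U1 = {"AA": 10, "AB": -1, "BA": 0, "BB": 1}   # the agent's values at T1
U2 = {"AA": -1, "AB": 10, "BA": 0, "BB": 1}   # values after the predictable rewrite

def second_move(first, payoffs):
    """At T2 the agent greedily maximizes whatever payoff table it holds by then."""
    return max("AB", key=lambda second: payoffs[first + second])

def naive_first_move():
    # Assumes the T2 self still uses U1, so it expects AA and plays A.
    return max("AB", key=lambda a: U1[a + second_move(a, U1)])

def clever_first_move():
    # Predicts the T2 self will use U2, but scores outcomes with U1 (its current values).
    return max("AB", key=lambda a: U1[a + second_move(a, U2)])

for label, first in [("naive", naive_first_move()), ("clever", clever_first_move())]:
    outcome = first + second_move(first, U2)   # the rewrite actually happens
    print(label, "plays", first, "-> outcome", outcome, "worth", U1[outcome], "to the T1 agent")
# naive plays A -> outcome AB, worth -1 to the T1 agent
# clever plays B -> outcome BB, worth 1 to the T1 agent
```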
So, is our AIXI Agent 1 clever enough to do that?
I would assume that it is not smart enough to foresee its own future actions, and is therefore dynamically inconsistent. The original AIXI does not allow the agent to be part of the environment. If we tried to relax that dualism, then your question depends strongly on the approximation to AIXI we use to make it computable. If this approximation can be scaled down in such a way that it remains a good estimator of the agent’s future actions, then maybe an environment containing a scaled-down, more abstract AIXI model will, after a lot of observations, become one of the consistent programs with lowest complexity. Maybe. That is about the only way I can imagine right now that we would not run into this problem.
Thanks, that helps.
Be warned that that post (AIXI and existential despair) made practically no sense—and surely isn’t a good reference.