It’s closer to #1. AIXI’s hypothesis space does not contain any possible worlds at all in which the agent is not separated from the environment by a Cartesian barrier.
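(For reference, this structure is explicit in the definition. Roughly following Hutter's formulation, AIXI picks actions by

$$a_t \;=\; \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m} \big(r_t + \cdots + r_m\big) \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)},$$

where every hypothesis is an environment program $q$ that receives the agent's actions across a fixed interface and emits the percepts; the agent's own computation is never part of $q$.)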
Not really.
The hypothesis space contains worlds where the agent's actions past a certain time point are ignored, and whatever part of the world the agent's output channel used to drive becomes driven by another part of the world: the “successor”. Technically, the original agent still exists in the model, but since its output has been disconnected it doesn't directly affect the world after that specific time. The only way it can continue to exert an influence is by affecting the “successor” before the disconnection occurs.
These world models are clearly part of the hypothesis space, so the problem is whether the induction process can assign them sufficient probability mass. This is non-trivial because scenarios which allow self-modification are usually non-ergodic (e.g. they allow you to drop an anvil on your head), and AIXI, like most reinforcement learning paradigms, is provably optimal only under an assumption of ergodicity. But if you can get around the non-ergodicity issues (e.g. by bootstrapping the agent with self-preservation instincts and/or giving it a guardian which prevents it from harming itself until it has learned enough), there is in principle no reason why an AIXI-like agent should be incapable of reasoning about self-modification.
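To make this concrete, here is a minimal toy sketch (in Python, with all names and details made up for illustration, not anything from Hutter's formalism): a computable environment program in which the agent's action channel is ignored after a cutoff step and an internal “successor” policy drives the output channel instead.

```python
# Toy computable world model (illustrative only; all names are made up).
# Before CUTOFF the agent's action drives the world; afterwards the action
# channel is disconnected and a "successor" policy inside the environment
# drives it instead.

CUTOFF = 5

def successor_policy(history):
    """Hypothetical successor: here it simply repeats the last action the
    original agent took before the cutoff."""
    pre_cutoff = [a for (t, a) in history if t < CUTOFF]
    return pre_cutoff[-1] if pre_cutoff else 0

def environment_step(t, agent_action, history):
    """One step of the environment program."""
    effective_action = agent_action if t < CUTOFF else successor_policy(history)
    history.append((t, effective_action))
    observation = effective_action            # the world echoes whatever drove it
    reward = 1.0 if effective_action == 1 else 0.0
    return observation, reward

history = []
for t in range(10):
    obs, rew = environment_step(t, agent_action=1, history=history)
```

The point is just that such a world is an ordinary computable program, so it sits in the hypothesis space; after the cutoff, the original agent's only lever is whatever it did to shape the successor beforehand.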
The hypothesis space contains worlds where the agent's actions past a certain time point are ignored, and whatever part of the world the agent's output channel used to drive becomes driven by another part of the world: the “successor”.
That’s true, but in all such hypotheses, the successor is computable, and AIXI is not computable, so the successor agent cannot itself be AIXI.
Edit: Come to think of it, your response would probably start with something like “I know that, but...”, but I may have addressed some of what your further objections would have been in my reply to Cyan.
...
Counting that as “identifying itself with the approximation” seems overly generous, but if we grant that, I still don’t see any reason that AIXI would end up considering such a hypothesis likely, or that it would be likely to understand any mechanism for it to self-modify correctly in terms of its model of modifying an external approximation to itself.
The simple answer is: because these models accurately predict the observations after self-modification actions are performed. Of course, the problem is not that simple, because of the non-ergodicity issues I've discussed before: when you fiddle with your source code without knowing what you are doing, it is easy to accidentally 'drop an anvil' on yourself. But this is a hard problem without any simple solution IMHO.
The simple answer is: because these models accurately predict the observations after self-modification actions are performed.
For that to be true, the environment has to keep sending AIXI the same signals that it sends the approximation even after it stops paying attention to AIXI’s output. Even in that case, the fact that this model correctly predicts future observations doesn’t help at all. Prior to self-modifying, AIXI does not have access to information about what it will observe after self-modifying.
I agree with you that the non-ergodicity issues don't have any simple solution. I haven't been making a big deal about non-ergodicity because there don't exist any agents that perform optimally in all non-ergodic environments (one computable environment can permanently screw you for doing one thing, and another computable environment can permanently screw you for doing anything else), so it's not a problem specific to AIXI-like agents. Moreover, AIXI actually seems like it should act fairly reasonably in non-ergodic computable environments separated from the agent by a Cartesian barrier, given the information available to it.
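A toy illustration of that parenthetical (hypothetical environments, not anything from the AIXI literature): two computable environments with opposite absorbing failure states, so that no single policy is optimal in both.

```python
# Two toy non-ergodic environments (illustrative; names made up).
# env_a permanently zeroes reward once the agent ever takes action 0;
# env_b permanently zeroes reward once the agent ever takes action 1.

def make_env(forbidden_action):
    dead = False
    def step(action):
        nonlocal dead
        if action == forbidden_action:
            dead = True                # absorbing state: no recovery (non-ergodic)
        return 0.0 if dead else 1.0    # reward
    return step

env_a = make_env(forbidden_action=0)
env_b = make_env(forbidden_action=1)

# With actions restricted to {0, 1}, whatever an agent does on its first step is
# irreversibly punished by one of the two environments, so no agent is optimal in both.
```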
Then I don't think we actually disagree. I mean, it has been well known since Hutter's original paper that the AIXI proof of optimality requires ergodicity.