Quite frankly, it seems that you have completely misunderstood what AIXI is. AIXI (and its computable variants) is a reinforcement learning agent. You can’t expect it to perform well in a fixed-duration, one-shot problem.
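For concreteness, recall Hutter’s definition: AIXI picks the action maximizing expected future reward under a length-weighted mixture over all environment programs, roughly

$$
a_k \;:=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[r_k + \cdots + r_m\big] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)},
$$

where $U$ is a universal monotone Turing machine, $q$ ranges over environment programs, $\ell(q)$ is the length of $q$, and $m$ is the horizon. The inner sum over $q$ is the Solomonoff mixture: programs inconsistent with the interaction history drop out of it, which is exactly the learning at issue here.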
The thing that you describe as AIXI in your comment doesn’t do any learning and therefore is not AIXI.
I’m not sure what you have in mind, but you seem to be describing some sort of expected-utility-maximizing agent which operates on an explicit model of the world, iterating over Turing machines rather than actions for some (possibly erroneous) reason. (AIXI iterates over Turing machines in order to perform Solomonoff induction; the thing you describe doesn’t perform any induction, so why bother with Turing machines? Maybe you are thinking of something like UDT, but it is not clear.)
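To make the contrast concrete, here is a toy sketch of what “iterating over Turing machines to perform Solomonoff induction” actually buys you. Everything here is assumed scaffolding: `run_program` is a hypothetical bounded interpreter for program encodings, and true Solomonoff induction is incomputable, so a real approximation would have to dovetail over all programs rather than run each for a fixed budget.

```python
def solomonoff_predict(history, programs, run_program):
    """Toy mixture predictor: P(next bit = '1' | history) under a
    2^(-length) prior over candidate programs.

    `programs` is an iterable of program encodings (e.g. bitstrings);
    `run_program(prog, steps)` is a hypothetical helper returning the
    program's output string after a bounded run, or None if it output
    nothing.
    """
    survivors = []  # (weight, predicted next bit) for programs that fit history
    for prog in programs:
        out = run_program(prog, steps=1000)  # bounded run in place of dovetailing
        if out is not None and len(out) > len(history) and out.startswith(history):
            survivors.append((2.0 ** -len(prog), out[len(history)]))
    total = sum(w for w, _ in survivors)
    if total == 0:
        return None  # no candidate model fits the history yet
    return sum(w for w, nxt in survivors if nxt == "1") / total
```

The point is that the sum over machines is doing inductive work: each new observation prunes the mixture. An agent that iterates over Turing machines but never conditions on observations gets none of this benefit.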
But in any case, your model is broken: if the agent simulates a Turing machine which performs the action “Rewrite self into smallest Turing machine which does nothing ever,” outputting the content of the simulated machine’s output tape on the agent’s output channel, then the rewrite is carried out not in the simulation inside the agent but in the real world. The agent therefore gets rewritten, and the player reaps their reward.
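A minimal sketch of that point, with all class and method names assumed for illustration: the machine is simulated *inside* the agent, but its output is forwarded to the agent’s real output channel, so it is the environment that carries out the rewrite.

```python
class NullMachine:
    """Stands in for the 'smallest Turing machine which does nothing ever'."""
    def step(self, observation):
        return None  # never outputs an action

class Agent:
    def __init__(self, machine):
        self.machine = machine  # the Turing machine the agent simulates

    def act(self, observation):
        # The simulation happens inside the agent...
        action = self.machine.step(observation)
        # ...but whatever lands on the simulated output tape is emitted
        # verbatim on the agent's real output channel.
        return action

class Environment:
    """The real world: it, not the agent's internal simulation,
    executes whatever action comes out of the agent."""
    def execute(self, action, agent):
        if action == "REWRITE_SELF":
            # The rewrite takes effect out here, in the real world:
            agent = Agent(NullMachine())
        return "next-observation", 0.0, agent  # observation, reward, (possibly rewritten) agent

def run(agent, env, steps):
    obs = "start"
    for _ in range(steps):
        obs, reward, agent = env.execute(agent.act(obs), agent)
    return agent
```

If the simulated machine ever emits `"REWRITE_SELF"`, the swap happens in `Environment.execute`, not inside `Agent.act`: after that step you are holding an `Agent(NullMachine())`, which is the whole point.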
Yes, yes, I was implicitly assuming that the AIXI has already been trained up on the game. The technical argument (which allows for training and explores possibilities like “allow the AIXI to choose the agent machine”) is somewhat more nuanced, and will be explored in depth in an upcoming post. (I was hoping that readers could see the problem from the sketch above, but I suppose that if you can’t see AIXI’s problems from Robby’s posts, you’ll probably want to wait for the fully explicit argument.)
Ok, I’ll wait.