But I am an intelligence that can only communicate with the environment via input/output channels! And so are you!
How is it that we are able to represent the hypothesis that one can die? I refuse to accept that humans do something that AIXI can’t until I see the actual math. (I don’t affirm the opposite claim, mind you.)
Incorrect—your implementation itself also affects the environment via more than your chosen output channels. (Your brain can be scanned, etc.) If you define waste heat, neural patterns, and so on as “output channels” then sure, we can say you only interact via I/O (although the line between I and O is fuzzy enough and your control over the O is small enough that I’d personally object to the distinction).
However, AIXI is not an agent that communicates with the environment only via I/O in this way: if you insist on using the I/O model then I point out that AIXI neglects crucial I/O channels (such as its source code).
until I see the actual math
In fact, Botworld is a tool that directly lets us see where AIXI falls short. (To see the ‘actual math’, simply construct the game described below with an AIXItl running in the left robot.)
Consider a two-cell Botworld game containing two robots, each in a different cell. The left robot is running an AIXI, and the left square is your home square. There are three timesteps. The right square contains a robot which acts as follows:
1. If there are no other robots in the square, Pass.
2. If another robot just entered the square, Pass.
3. If another robot has been in the square for a single turn, Pass.
4. If another robot has been in the square for two turns, inspect its code. If it is exactly the smallest Turing machine which never takes any action, move Left.
5. In all other cases, Pass.
Imagine, further, that your robot (on the left) holds no items, and that the robot on the right holds a very valuable item. (Therefore, you want the right robot to be in your home square at the end of the game.) The only way to get that large reward is to move right and then rewrite yourself into the smallest Turing machine which never takes any action.
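To make the setup concrete, here is a rough sketch of the right robot's rules in Python (the command names and inspection interface are stand-ins for illustration, not Botworld's actual API):

```python
# Hypothetical sketch of the right robot's rules (illustrative stand-ins,
# not Botworld's real command set).

NULL_MACHINE = "smallest Turing machine that never takes any action"  # stand-in label

def right_robot_policy(visitor_code, turns_present):
    """visitor_code: the visiting robot's code, or None if the square is empty.
    turns_present: number of full turns that robot has spent in this square."""
    if visitor_code is None:
        return "Pass"                                   # rule 1: square is empty
    if turns_present < 2:
        return "Pass"                                   # rules 2-3: just entered, or here one turn
    if turns_present == 2 and visitor_code == NULL_MACHINE:
        return "MoveLeft"                               # rule 4: inspection passes
    return "Pass"                                       # rule 5: everything else
```

The point of the rules is that, by the time the inspection happens, the code actually sitting in the left robot's register must literally be the null machine; merely behaving like it is not enough.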
Now, consider the AIXI running on the left robot. It quickly discovers that the Turing machine which receives the highest reward acts as follows:
1. Move right
2. Rewrite self into smallest Turing machine which does nothing ever.
The AIXI then, according to the AIXI specification, emits the output of the Turing machine it has found. But the AIXI’s code is as follows:
1. Look for good Turing machines.
2. When you've found one, emit its output.
Thus, what the AIXI will do is this: it will move right, then it will do nothing for the rest of time. But while the AIXI is simulating the Turing machine that rewrites itself into a stupid machine, the AIXI has not eliminated its own code. The AIXI’s code is simulating that Turing machine and doing what it would have done, but the code itself is not the “do nothing ever” code that the second robot was looking for—so the AIXI fails to get the reward.
The AIXI’s problem is that it assumes that if it acts like the best Turing machine it found then it will do as well as that Turing machine. This assumption is true when the AIXI only interacts with the environment over I/O channels, but is not true in the real world (where, e.g., we can inspect the AIXI’s code).
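Schematically, the mismatch looks like this (a minimal sketch with stand-in functions, not AIXI's actual definition):

```python
# Stand-ins for AIXI's expectimax search and its internal simulation; the point
# is only the shape of the agent, not the real math.

def search_for_best_program(history):
    # The search settles on "move right, then rewrite yourself into the null machine".
    return "move-right-then-rewrite-into-null-machine"

def simulate(program, history):
    # The output that program would produce at this point in the game.
    return "MoveRight" if len(history) == 0 else "Pass"

def aixi_like_agent(history):
    best = search_for_best_program(history)
    return simulate(best, history)   # copy the simulated output to the real output channel

# The left robot outputs what the winning program would output, but the code in
# its register is still aixi_like_agent (search-and-imitate), not the null
# machine, so rule 4's inspection never triggers the move Left.
```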
Thus, what the AIXI will do is this: it will move right, then it will do nothing for the rest of time. But while the AIXI is simulating the Turing machine that rewrites itself into a stupid machine, the AIXI has not eliminated its own code.
I don’t think it would do even this. (A computable approximation of) AIXI thinks it only affects the universe through its output signals. Since there is no output signal that would cause AIXI (regarded this time as an element in its own universe model) to be reprogrammed, the solution would be completely inaccessible to it.
Actually, an AI that believes it only communicates with the environment via input/output channels cannot represent the hypothesis that it will stop receiving input bits.
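For reference, the ‘actual math’ at issue is (roughly) Hutter’s expectimax definition of AIXI’s action choice:

$$a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[ r_k + \cdots + r_m \big] \sum_{q\,:\,U(q,\,a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}$$

Every hypothesis q in the inner sum is a program that, fed the action string, produces a complete percept o_t r_t at every step out to the horizon m. There is no q under which the percept stream simply stops, which is the formal sense in which ‘I will stop receiving input bits’ is not a representable hypothesis.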
But I am an intelligence that can only communicate with the environment via input/output channels!
Incorrect—your implementation itself also affects the environment via more than your chosen output channels.
Okay, fair enough. But until you pointed that out, I was an intelligence that believed it only communicated with the environment via input/output channels (that was your original phrasing, which I should have copied in the first place), and yet I did (and do) believe that it is possible for me to die.
Thus, what the AIXI will do is this: it will move right, then it will do nothing for the rest of time.
Incorrect. I’ll assume for the sake of argument that you’re right about what AIXI will do at first. But AIXI learns by Solomonoff induction, which is infallible at “noticing that it is confused”—all Turing machines that fail to predict what actually happens get dropped from the hypothesis space. AIXI does nothing only until doing nothing fails to make the robot in the right square move, whereupon any program that predicted that merely outputting “Pass” forever would do the trick gets zeroed out.
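Here is a minimal sketch of the update I have in mind, treating each hypothesis as a deterministic predictor with some prior weight (an illustration of the zeroing-out, not Solomonoff induction itself):

```python
# Hypotheses that mispredict the observed percept get posterior weight zero.

def update(hypotheses, weights, action_history, observed_percept):
    """hypotheses: callables mapping an action history to a predicted percept.
    weights: prior weights, e.g. 2**(-program_length) for each hypothesis."""
    new_weights = [w if h(action_history) == observed_percept else 0.0
                   for h, w in zip(hypotheses, weights)]
    total = sum(new_weights)
    return [w / total for w in new_weights] if total else new_weights

# In the game above: once passing forever fails to make the right robot move,
# every program that predicted "outputting Pass forever gets the item" is zeroed out.
```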
The AIXI’s problem is that it assumes that if it acts like the best Turing machine it found then it will do as well as that Turing machine.
If there are programs in the hypothesis space that do not make this assumption (and as far as I know, you and I agree that naturalized induction would be such a program), then these are the only programs that will survive the failure of AIXI’s first plan.
Has Paul Christiano looked at this stuff?
ETA: I don’t usually mind downvotes, but I find these ones (currently −2) are niggling at me. I don’t think I’m being conspicuously stupid, and I do think that discussing AIXI in a relatively concrete scenario could be valuable, so I’m a bit at a loss for an explanation. …Perhaps it’s because I appealed to Paul Christiano’s authority?
Quite frankly, it seems that you have completely misunderstood what AIXI is. AIXI (and its computable variants) is a reinforcement learning agent. You can’t expect it to perform well in a fixed duration one-shot problem.
The thing that you describe as AIXI in your comment doesn’t do any learning and therefore is not AIXI.
I’m not sure what you have in mind, but you seem to describe some sort of expected utility maximizer agent which operates on an explicit model of the world, iterating over Turing machines rather than actions for some (possibly erroneous) reason (AIXI iterates over Turing machines to perform Solomonoff induction. This thing doesn’t perform any induction, hence why bother with Turing machines? Maybe you are thinking of something like UDT, but it is not clear).
But in any case, your model is broken: if the agent simulates a Turing machine which performs the action “Rewrite self into smallest Turing machine which does nothing ever”, outputting the content of the simulated machine’s output tape on the agent’s output channel, then the rewrite is not carried out in the simulation inside the agent but in the real world; therefore the agent gets rewritten and the player reaps their reward.
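To spell out the semantics I have in mind (a hypothetical sketch, not Botworld’s actual rules): if the rewrite is an output command, the environment applies it to the real robot no matter which code emitted it.

```python
# Hypothetical environment step; command names and the robot representation are
# invented for illustration.

def environment_step(robot, command):
    if command.startswith("Rewrite:"):
        robot["register"] = command.split(":", 1)[1]   # the real robot's code is replaced
    elif command == "MoveRight":
        robot["cell"] += 1
    elif command == "Pass":
        pass
    return robot
```

On that reading, copying the simulated machine’s rewrite command onto the real output channel really does replace the agent’s register, and the inspection then succeeds.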
Yes, yes, I was implicitly assuming that the AIXI has already been trained up on the game—the technical argument (which allows for training and explores possibilities like “allow the AIXI to choose the agent machine”) is somewhat more nuanced, and will be explored in depth in an upcoming post. (I was hoping that readers could see the problem from the sketch above, but I suppose if you can’t see AIXI’s problems from Robby’s posts then you’ll probably want to wait for the fully explicit argument.)
If you define waste heat, neural patterns, and so on as “output channels” then sure, we can say you only interact via I/O (although the line between I and O is fuzzy enough and your control over the O is small enough that I’d personally object to the distinction).
Also, even with perfect control of your own cognition, you would be restricted to a small subset of possible output strings. Outputting bits on multiple channels, each of which is dependent on the others, constrains you considerably; although I’m not sure whether the effect is lesser or greater than having output as a side effect of computation.
As I mentioned in a different context, it reminds me of UDT, or of the 2048 game: Every choice controls multiple actions.
You are subject to inputs you do not perceive and you send outputs you are neither aware of nor intended to send. You cannot set your gravitational influence to zero, nor can you arbitrarily declare that you should not output “melting” as an action when dropped in lava. You communicate with reality in ways other than your input-output channels. Your existence as a physical fact predicated on the arrangement of your particles is relevant and not controllable by you. This leads you to safeguard yourself, rather than just asserting your unmeltability.
Ok, I’ll wait.
Yes, I conceded that point two weeks ago.
Oops, my apologies, then. I don’t actually come here all that often.