Yes, I just changed the notation to be more standard. The point remains. There need not be any “x” that corresponds to “pick a new r” or to “pretend x was really x’”. If there were such an x, it wouldn’t in general have high utility.
x is just an input string. So, for example, each x could be a frame coming from a video camera. AIXI then has a reward function r(x), and it maximizes the sum of r(x) over some large number of time steps. In our example, let’s say that r is big if the camera is looking at a happy puppy and small if it’s looking at anything else.
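To make the setup concrete, here is a minimal sketch in Python. Everything in it is a stand-in: frames are plain strings, not camera images, and the names `r` and `total_reward` and the values 1.0 and 0.0 are made up for illustration. It is not AIXI, just the quantity being summed.

```python
def r(x):
    """Toy reward: big for a happy-puppy frame, small for anything else."""
    return 1.0 if x == "happy puppy" else 0.0


def total_reward(frames):
    """Sum of r(x) over the observed frames, the quantity being maximized."""
    return sum(r(x) for x in frames)


frames = ["happy puppy", "empty hallway", "happy puppy"]
print(total_reward(frames))  # 2.0
```

Note that r only ever sees the frame x. It has no access to whatever in the world produced that frame.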
In the lab, AIXI might have to choose between two options (actions can be handled by a separate output string, as in Hutter’s paper):
1) Don’t follow the puppy around.
2) Follow the puppy around.
Clearly, it will do 2, because r is bigger when it’s looking at a happy puppy, and 2 increases the chance of doing so. One might even say one has a puppy-following robot.
In the real world, there are more options. If you give AIXI access to a printer and some scotch tape, the options look like this:
1) Don’t follow the puppy around.
2) Follow the puppy around.
3) Print out a picture of a happy puppy and tape it to the camera.
Clearly, it will do 3, because r is bigger when it’s looking at a happy puppy, and 3 increases the chance of doing so. One might even say one has a happy-puppy-looking-at maximizing robot. This time it’s even true.
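The comparison between the lab and the real world can be sketched as a toy expected-value calculation. The probabilities below are invented for illustration, and picking the best of a fixed list of options is a drastic simplification of what AIXI does; the only point is that adding option 3 changes the winner.

```python
# Made-up chance that a given frame shows a happy puppy under each option.
P_HAPPY_PUPPY = {
    "ignore puppy": 0.1,
    "follow puppy": 0.7,
    "tape picture to camera": 1.0,
}

R_BIG, R_SMALL = 1.0, 0.0  # r(x) for a happy-puppy frame vs. anything else


def expected_return(option, steps=1000):
    """Expected sum of r(x) over `steps` frames under the chosen option."""
    p = P_HAPPY_PUPPY[option]
    return steps * (p * R_BIG + (1 - p) * R_SMALL)


def best_option(options):
    """Pick the option with the highest expected return."""
    return max(options, key=expected_return)


lab = ["ignore puppy", "follow puppy"]
real_world = lab + ["tape picture to camera"]

print(best_option(lab))         # follow puppy
print(best_option(real_world))  # tape picture to camera
```

Nothing about r changed between the two calls. Only the set of available options did.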