asr comments on How many people here agree with Holden? [Actually, who agrees with Holden?]

asr 15 May 2012 5:56 UTC
0 points
Yes, I just changed the notation to be more standard. The point remains. There need not be any “x” that corresponds to “pick a new r” or to “pretend x was really x’”. If there were such an x, it wouldn’t in general have high utility.
- Manfred 15 May 2012 17:06 UTC
  1 point
  x is just an input string. So, for example, each x could be a frame coming from a video camera. AIXI then has a reward function r(x), and it maximizes the sum of r(x) over some large number of time steps. In our example, let’s say that r is big if the camera is looking at a happy puppy and small if it’s looking at something else.
  
  In the lab, AIXI might have to choose between two options (actions can be handled by some separate output string, as in Hutter’s paper):
  1) Don’t follow the puppy around.
  2) Follow the puppy around.
  
  Clearly, it will do 2, because r is bigger when it’s looking at a happy puppy, and 2 increases the chance of doing so. One might even say one has a puppy-following robot.
  
  In the real world, there are more options. If you give AIXI access to a printer and some scotch tape, the options look like this:
  1) Don’t follow the puppy around.
  2) Follow the puppy around.
  3) Print out a picture of a happy puppy and tape it to the camera.
  
  Clearly, it will do 3, because r is bigger when it’s looking at a happy puppy, and 3 increases the chance of doing so. One might even say one has a happy-puppy-looking-at maximizing robot. This time it’s even true.
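
The two scenarios above can be sketched as a toy expected-reward comparison. This is only an illustration, not AIXI itself: there is no induction over environments here, and the per-option probabilities, the `reward` function, and every name in the code are made-up assumptions chosen to mirror the puppy example.

```python
# Toy illustration only: the probabilities below are invented for the example,
# not taken from the thread or from Hutter's paper.

def reward(frame: str) -> float:
    """r(x): big when the camera frame shows a happy puppy, small otherwise."""
    return 1.0 if frame == "happy puppy" else 0.0

# Hypothetical chance that each option yields a "happy puppy" frame per time step.
P_HAPPY_PUPPY = {
    "don't follow the puppy": 0.1,
    "follow the puppy": 0.6,
    "tape a puppy picture to the camera": 1.0,
}

def expected_total_reward(option: str, steps: int = 100) -> float:
    """Expected sum of r(x) over `steps` time steps if the agent picks `option`."""
    p = P_HAPPY_PUPPY[option]
    per_step = p * reward("happy puppy") + (1 - p) * reward("something else")
    return steps * per_step

def best_option(options):
    """Pick the option with the highest expected total reward."""
    return max(options, key=expected_total_reward)

lab_options = ["don't follow the puppy", "follow the puppy"]
real_world_options = lab_options + ["tape a puppy picture to the camera"]

print(best_option(lab_options))
print(best_option(real_world_options))
```

Nothing in the maximization refers to real puppies. Only the frames reaching the camera matter, so adding the third option changes the winner even though r(x) itself is unchanged.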