Just look at the AIXI equation itself:
$$
a_k \;:=\; \arg\max_{a_k} \sum_{o_k r_k} \;\cdots\; \max_{a_m} \sum_{o_m r_m} \big[\, r_k + \cdots + r_m \,\big] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
$$

$o_i$ (observations) and $r_i$ (rewards) are the signals sent from the environment to AIXI, and $a_i$ (actions) are AIXI's outputs. Notice that future $a_i$ are predicted by picking whichever action would maximize expected reward through timestep $m$, exactly as AIXI itself does; unlike for $o_i$ and $r_i$, there is no summation over possible ways the environment could make AIXI output actions computed some other way.
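To make the asymmetry concrete, here is a minimal, purely illustrative sketch of the expectimax recursion the equation describes, assuming a finite action set, a finite percept set, and a hypothetical `predict` function standing in for the (uncomputable) Solomonoff mixture over environment programs; the names `expectimax`, `predict`, and `toy_predict` are illustrative, not Hutter's. The point to notice is structural: future actions appear only under a max, while future observations and rewards appear only under a probability-weighted sum.

```python
from typing import Callable, Sequence, Tuple

Percept = Tuple[int, float]                     # (observation, reward)
History = Tuple[Tuple[int, Percept], ...]       # ((action, (obs, reward)), ...)

def expectimax(history: History, t: int, m: int,
               actions: Sequence[int], percepts: Sequence[Percept],
               predict: Callable[[History, int, Percept], float]) -> float:
    """Value of acting optimally from timestep t through the horizon m."""
    if t > m:
        return 0.0
    best = float("-inf")
    for a in actions:                            # our own future action: take the max
        value = 0.0
        for (o, r) in percepts:                  # the environment's reply: weighted sum
            p = predict(history, a, (o, r))      # mixture probability of this percept
            if p > 0.0:
                value += p * (r + expectimax(history + ((a, (o, r)),), t + 1, m,
                                             actions, percepts, predict))
        best = max(best, value)
    return best

# Tiny usage example: two actions, two percepts; action 1 yields reward 1 with prob. 0.7.
if __name__ == "__main__":
    percepts = [(0, 0.0), (0, 1.0)]
    def toy_predict(history, action, percept):
        good = 0.7 if action == 1 else 0.5
        return good if percept == (0, 1.0) else 1.0 - good
    print(expectimax((), 1, 3, [0, 1], percepts, toy_predict))  # 3 steps at 0.7 -> 2.1
```

In this sketch there is simply no place where the planner weighs hypotheses about "what action the agent will end up emitting"; its own future choices are fixed by the max, which is exactly the feature the equation above builds in.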