Just look at the AIXI equation itself:
$$
a_k \;:=\; \arg\max_{a_k} \sum_{o_k r_k} \;\cdots\; \max_{a_m} \sum_{o_m r_m} \big[\, r_k + \cdots + r_m \,\big] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
$$

$o_i$ (observations) and $r_i$ (rewards) are the signals sent from the environment to AIXI, and $a_i$ (actions) are AIXI's outputs. Notice that future $a_i$ are predicted by picking whichever action would maximize expected reward through timestep $m$, exactly as AIXI itself does; unlike for $o_i$ and $r_i$, there is no summation over possible ways the environment could make AIXI output actions computed some other way.
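To make the asymmetry concrete, here is a minimal, purely illustrative sketch of the expectimax recursion the equation describes, assuming a finite action set, a finite percept set, and a hypothetical `predict` function standing in for the (uncomputable) Solomonoff mixture over environment programs; the names `expectimax`, `predict`, and `toy_predict` are illustrative, not Hutter's. The point to notice is structural: future actions appear only under a max, while future observations and rewards appear only under a probability-weighted sum.

```python
from typing import Callable, Sequence, Tuple

Percept = Tuple[int, float]                     # (observation, reward)
History = Tuple[Tuple[int, Percept], ...]       # ((action, (obs, reward)), ...)

def expectimax(history: History, t: int, m: int,
               actions: Sequence[int], percepts: Sequence[Percept],
               predict: Callable[[History, int, Percept], float]) -> float:
    """Value of acting optimally from timestep t through the horizon m."""
    if t > m:
        return 0.0
    best = float("-inf")
    for a in actions:                            # our own future action: take the max
        value = 0.0
        for (o, r) in percepts:                  # the environment's reply: weighted sum
            p = predict(history, a, (o, r))      # mixture probability of this percept
            if p > 0.0:
                value += p * (r + expectimax(history + ((a, (o, r)),), t + 1, m,
                                             actions, percepts, predict))
        best = max(best, value)
    return best

# Tiny usage example: two actions, two percepts; action 1 yields reward 1 with prob. 0.7.
if __name__ == "__main__":
    percepts = [(0, 0.0), (0, 1.0)]
    def toy_predict(history, action, percept):
        good = 0.7 if action == 1 else 0.5
        return good if percept == (0, 1.0) else 1.0 - good
    print(expectimax((), 1, 3, [0, 1], percepts, toy_predict))  # 3 steps at 0.7 -> 2.1
```

In this sketch there is simply no place where the planner weighs hypotheses about "what action the agent will end up emitting"; its own future choices are fixed by the max, which is exactly the feature the equation above builds in.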