The inaction rollouts effectively transform the stepwise inaction baseline into an inaction baseline (starting from the moment the subagent is created; thus the agent has a bit more control than in a true inaction baseline).
Therefore the results on the inaction baseline apply (https://www.lesswrong.com/s/iRwYCpcAXuFD24tHh/p/M9aoMixFLf8JFLRaP).
This means that restrictions on increased power for the agent (“make sure you never have the power to increase the rewards”) become restrictions on the actual policy followed by the subagent (“make sure you never increase these rewards”).
Roughly, attainable utility becomes twenty billion questions.
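As a rough sketch of that shift (the notation here is informal and not part of the original setup: $Q_R$ is the attainable utility of an auxiliary reward $R$, $\varnothing$ is the noop action, and $t_0$ is the turn on which the subagent is built), the stepwise inaction penalty bounds the agent's power over each $R$ at every step:

$$\text{Penalty}_t \;\propto\; \sum_R \bigl| Q_R(s_t, a_t) - Q_R(s_t, \varnothing) \bigr|,$$

whereas the inaction rollouts from $t_0$ onwards compare the actual trajectory $s_t$ with the inaction trajectory $s'_t$ (the one in which the subagent is never built), so what gets bounded is the realised value of each $R$:

$$\Bigl| \sum_{t \ge t_0} \gamma^{t} R(s_t) \;-\; \sum_{t \ge t_0} \gamma^{t} R(s'_t) \Bigr| \;\approx\; 0.$$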
For the original example, this means that the agent can neither press the red button nor gain the ability to teleport. But while the subagent cannot press the red button, it can gain the ability to teleport.