The inaction rollouts effectively transform the stepwise inaction baseline into an inaction baseline (starting from the moment the subagent is created; thus the agent has a bit more control than in a true inaction baseline).
Therefore the results on the inaction baseline apply (https://www.lesswrong.com/s/iRwYCpcAXuFD24tHh/p/M9aoMixFLf8JFLRaP).
This means that restrictions on increased power for the agent (“make sure you never have the power to increase the rewards”) become restrictions on the actual policy followed by the subagent (“make sure you never increase these rewards”).
Roughly, attainable utility becomes twenty billion questions.
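As a rough sketch of that shift (the notation here is informal and not part of the original setup: $Q_R$ is the attainable utility of an auxiliary reward $R$, $\varnothing$ is the noop action, and $t_0$ is the turn on which the subagent is built), the stepwise inaction penalty bounds the agent's power over each $R$ at every step:

$$\text{Penalty}_t \;\propto\; \sum_R \bigl| Q_R(s_t, a_t) - Q_R(s_t, \varnothing) \bigr|,$$

whereas the inaction rollouts from $t_0$ onwards compare the actual trajectory $s_t$ with the inaction trajectory $s'_t$ (the one in which the subagent is never built), so what gets bounded is the realised value of each $R$:

$$\Bigl| \sum_{t \ge t_0} \gamma^{t} R(s_t) \;-\; \sum_{t \ge t_0} \gamma^{t} R(s'_t) \Bigr| \;\approx\; 0.$$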
For the original example, this means that the agent can neither press the red button nor gain the ability to teleport. But while the subagent cannot press the red button, it can gain the ability to teleport.