If the agent flips the first pixel, it’s locked into a single trajectory. None of its actions matter anymore.

But if the agent flips the second pixel – this may be suboptimal for a given utility function, but the agent still has lots of choices remaining. In fact, it can still induce (n×n)^(T−1) observation histories. If n=100 and T=50, then that’s (100×100)^49 = 10^196 observation histories. Probably at least one of these yields greater utility than the shutdown-history utility.
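As a quick sanity check on that arithmetic (a minimal sketch; the values of n and T and the assumption of n×n inducible observations per remaining timestep are taken from the example above):

```python
import math

# Assumed setup from the example: at each of the remaining T-1 timesteps,
# the agent can induce any of n*n distinct observations, so the number of
# reachable observation histories is (n*n)**(T-1).
n, T = 100, 50
num_histories = (n * n) ** (T - 1)

# (100*100)**49 == (10**4)**49 == 10**196
print(math.log10(num_histories))  # -> 196.0
```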
And indeed, we can apply the scaling law for instrumental convergence to conclude that for every u-OH, at least 10^196/(10^196 + 1) of its permuted variants (weakly) prefer flipping the second pixel at t=1 over flipping the first pixel at t=1.
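Put differently, the bound says that at most 1/(10^196 + 1) of the permuted variants can prefer shutdown. A small check with exact rationals (a sketch of the arithmetic only, not of the scaling law itself):

```python
from fractions import Fraction

# The bound from the text: at least this fraction of permuted variants
# (weakly) prefers flipping the second pixel at t=1.
lower_bound = Fraction(10**196, 10**196 + 1)

# Complement: the fraction that can prefer "dying" is at most 1/(10^196 + 1).
shutdown_fraction = 1 - lower_bound
assert shutdown_fraction == Fraction(1, 10**196 + 1)
print(float(shutdown_fraction))  # on the order of 1e-196
```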
10^196/(10^196 + 1).
Choose any atom in the universe. Uniformly randomly select another atom in the universe. It’s about 10^117 times more likely that these atoms are the same, than that a utility function incentivizes “dying” instead of flipping pixel 2 at t=1.
(For objectives over the agent’s full observation history, instrumental convergence strength scales exponentially with the complexity of the underlying environment—the environment in question was extremely simple in this case! For different objective classes, the scaling will be linear, but that’s still going to get you far more than 100:1 difficulty, and I don’t think we should privilege such small numbers.)
That part does seem wrong to me, but because 10^50 is possibly too small. See my post Seeking Power is Convergently Instrumental in a Broad Class of Environments:
> (For objectives over the agent’s full observation history, instrumental convergence strength scales exponentially with the complexity of the underlying environment—the environment in question was extremely simple in this case! For different objective classes, the scaling will be linear, but that’s still going to get you far more than 100:1 difficulty, and I don’t think we should privilege such small numbers.)