Why does it no longer hold in a low path-dependence world?
Not sure, but here’s how I understand it:
If we are in a low path-dependence world, the fact that SGD takes a certain path doesn’t say much about what type of model it will eventually converge to.
In a low path-dependence world, if these steps occurred to produce a deceptive model, SGD could still “find a path” to the corrigibly aligned version. The questions of whether it would find these other models depends on things like “how big is the space of models with this property?”, which corresponds to a complexity bias.
Not sure, but here’s how I understand it:
If we are in a low path-dependence world, the fact that SGD takes a certain path doesn’t say much about what type of model it will eventually converge to.
In a low path-dependence world, if these steps occurred to produce a deceptive model, SGD could still “find a path” to the corrigibly aligned version. The questions of whether it would find these other models depends on things like “how big is the space of models with this property?”, which corresponds to a complexity bias.