There are quite a few interesting dynamics in the space of possible values that become extremely relevant in worlds where ‘perfect inner alignment’ is impossible, incoherent, or unstable.
In those worlds, it’s important to develop forms of weak alignment, where successive systems might not be unboundedly corrigible but do still have semi-cooperative interactions (and transitions of power).