Generallyer comments on Alignment Might Never Be Solved, By Humans or AI

Generallyer Oct 6, 2022, 11:22 PM
6 points
There are quite a few interesting dynamics in the space of possible values that become extremely relevant in worlds where ‘perfect inner alignment’ is impossible, incoherent, or unstable.

In those worlds, it’s important to develop forms of weak alignment, where successive systems might not be unboundedly corrigible but do still have semi-cooperative interactions (and transitions of power).