[Question] Simple question about corrigibility and values in AI.
This is mostly inspired by the comments on the post about plans not being optimization.
When alignment researchers think about corrigibility and values, do they think of values more in terms of states, and of corrigibility more as a process?