Mateusz Bagiński comments on Value systematization: how values become coherent (and misaligned)

Mateusz Bagiński 17 Nov 2023 12:31 UTC
3 points
0
The model’s internal representations of aligned behavior need not change very much during this shift; the only difference might be that aligned behavior shifts from being a terminal goal to an instrumental goal.
Relevant post, in case you haven’t seen it: https://www.lesswrong.com/posts/8qCKZj24FJotm3EKd/ultimate-ends-may-be-easily-hidable-behind-convergent