Vika comments on Refining the Sharp Left Turn threat model, part 2: applying alignment techniques

Vika 25 Nov 2022 17:40 UTC
LW: 6 AF: 4
I would consider goal generalization to be a component of goal preservation, and I agree this is a significant challenge for this plan. If the model is sufficiently aligned to the goal of being helpful to humans, then I would expect it to want feedback on how to generalize its goals correctly when it encounters ontological shifts.