Vika comments on [Linkpost] Some high-level thoughts on the DeepMind alignment team’s strategy

Vika 10 Mar 2023 11:33 UTC
LW: 2 AF: 1
We expect that an aligned (blue-cloud) model would have an incentive to preserve its goals, though it would need some help from us to generalize them correctly to avoid becoming a misaligned (red-cloud) model. We talk about this in more detail in Refining the Sharp Left Turn (part 2).