But what stops a blue-cloud model from transitioning into a red-cloud model if the blue-cloud model is an AGI like the one hinted at on your slides (self-aware, goal-directed, highly competent)?
We expect that an aligned (blue-cloud) model would have an incentive to preserve its goals, though it would need some help from us to generalize them correctly to avoid becoming a misaligned (red-cloud) model. We talk about this in more detail in Refining the Sharp Left Turn (part 2).