Treacherous turns don’t necessarily happen all at once. An AI system can start covertly recruiting resources outside its intended purview in preparation for a more overt power grab.
This can happen during training, without a deliberate “deployment” event. Once the AI has started recruiting resources, it can outperform, on-distribution, AI systems that have not done so, and still have resources left over to devote to its true objective or to instrumental goals.