So, if I’m understanding you correctly:
- if it’s possible to build a single AI system that executes a catastrophic takeover (via self-bootstrap or whatever), it’s probably also possible to build a single aligned sovereign, and so in this situation winning once is sufficient
- if it is not possible to build a single aligned sovereign, then it’s probably also not possible to build a single system that executes a catastrophic takeover, and so the proposition that the model only has to win once is not true in any straightforward way
- in this case, we might still be able to think of “composite AI systems” that can catastrophically take over or end the acute risk period, and for reasons similar to those in the first scenario, winning once with a composite system is sufficient, but such systems are not built from single acts
- and you think the second scenario is more likely than the first.
Yes, that’s right, though I’d say “probable” rather than “possible” (most things are “possible”).