I view that analysis as applying to all of the scenarios you outline, not just WFLL 2. (Though it’s arguable whether it applies to the multipolar ones.)
I think this is the intent of the report, not just my interpretation of it, since it is aiming to estimate the probability of x-catastrophe via misalignment.
On reflection, I think you're right: the report does apply to a wider range of scenarios, probably all of the ones we discuss except the brain-in-a-box scenarios.
However, I think the report's understanding of power-seeking AI does assume a takeoff that is not extremely fast, one in which we deliberately decide to deploy the potentially dangerous AI on a large scale, rather than a system exploding in capability almost immediately.
Given the assumptions of the brain-in-a-box scenario, many of the corrective mechanisms the report discusses wouldn't have time to come into play.
I believe the report says it's not focused on very fast takeoff or the sudden emergence of very capable systems.
Similarly, you're right that multiagent risks don't quite fit in with the report's discussion (in this post we discuss multipolar scenarios but don't really go over multiagent dynamics, like conflict/cooperation between TAIs). Unique multiagent risks (for example, risks of conflict between AIs) generally require us to first have an outcome with a lot of misaligned AIs embedded in society, with further problems developing after that; this is something we plan to discuss in a follow-up post.
So, many of the early steps in scenarios like AAFS will be shared with risks from multiagent systems, but eventually there will be differences.
Nitpick:
“I won’t assume any of them” is distinct from “I will assume the negations of them”.
I’m fairly confident the analysis is also meant to apply to situations in which things like (1)-(5) do hold.
(Certainly I personally am willing to apply the analysis to situations in which (1)-(5) hold.)
Rohin is correct. In general, I meant for the report's analysis to apply to basically all of these situations (e.g., both inner- and outer-misaligned, both multipolar and unipolar, both fast takeoff and slow takeoff), provided that the misaligned AI systems in question ultimately end up power-seeking, and that this power-seeking leads to existential catastrophe.
It’s true, though, that some of my discussion was specifically meant to address the idea that absent a brain-in-a-box-like scenario, we’re fine. Hence the interest in e.g. deployment decisions, warning shots, and corrective mechanisms.