Two different meanings of “misuse”
The term “AI misuse” encompasses two fundamentally different threat models that deserve separate analysis and different mitigation strategies:
Democratization of offense-dominant capabilities
This involves currently weak actors gaining access to capabilities that dramatically amplify their ability to cause harm. That amplification is only a huge problem if access to AI doesn’t also dramatically amplify the ability of others to defend against harm, which is why I refer to “offense-dominant” capabilities; this dynamic is discussed in The Vulnerable World Hypothesis.
The canonical example is terrorists using AI to design bioweapons that would be beyond their current technical capacity (cf. Aum Shinrikyo, which failed to produce bioweapons despite making a serious effort).
Power concentration risk
This involves AI systems giving already-powerful actors dramatically more power over others.
Examples could include:
Government leaders using AI to stage a self-coup and install a permanent totalitarian regime, maintained through currently impossible levels of AI-enabled surveillance.
An AI company CEO using advanced AI systems to become world dictator.
The key risk here is that particular already-powerful people end up with potentially unassailable advantages.
These threats require different solutions:
Misuse that involves offense-dominant capabilities can be addressed by preventing users of your AIs from doing catastrophically bad things, e.g. by training the models to robustly refuse requests that could lead to these catastrophic outcomes (which might require improvements in adversarial robustness), or by removing dangerous knowledge from the AI training data (a rough sketch of such filtering appears after this list).
Power concentration risks require different solutions. Technical measures to prevent users from using the AI for particular tasks don’t help against the threat of the lab CEO trying to use the AI for those harmful tasks, or the threat of the US government expropriating the AI system and using it for its own purposes. To resist these threats, interventions include:
Transparency interventions: making it so that more people know about the situation, so it’s less likely a tiny conspiracy can grab lots of power. E.g. see 4 Ways to Advance Transparency in Frontier AI Development.
And then there are some technical interventions, but all of these suffer from the problem that, as jbash put it in a comment, “By far the most important risk isn’t that they’ll steal them. It’s that they will be fully authorized to misuse them. No security measure can prevent that.”
Improved computer security. This guards against the risks of third parties stealing the models.
Security against insider threats, to make it harder for the AI to be misused internally. This is a classic insider threat problem; addressing it will require both technical interventions and workflow changes inside AI companies.
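As a rough illustration of the “removing dangerous knowledge from the AI training data” intervention from the first bullet above, here is a minimal sketch of a pretraining-data filter. The function names, the keyword list, and the overall heuristic are invented for illustration only; real filtering pipelines would presumably rely on trained classifiers and expert review rather than keyword matching.

```python
# Illustrative sketch only (hypothetical names and placeholder phrases): drop
# documents that look like they contain hazardous technical knowledge before
# they enter a pretraining corpus. A real pipeline would use trained
# classifiers and expert review, not keyword matching.
from typing import Iterable, Iterator

HAZARD_PHRASES = {
    "enhance transmissibility",      # placeholder phrases, not a real blocklist
    "aerosolized delivery of",
    "nerve agent synthesis route",
}

def looks_hazardous(document: str) -> bool:
    """Return True if the document contains any placeholder hazard phrase."""
    text = document.lower()
    return any(phrase in text for phrase in HAZARD_PHRASES)

def filter_corpus(documents: Iterable[str]) -> Iterator[str]:
    """Yield only the documents that pass the (crude) hazard check."""
    for doc in documents:
        if not looks_hazardous(doc):
            yield doc

# Usage: cleaned = list(filter_corpus(raw_documents))
```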
Many discussions of “AI misuse” focus primarily on interventions that only help with the first category, while using rhetoric that suggests they’re addressing both. This creates a motte-and-bailey situation where:
The “motte” (easily defensible position) is “we need to prevent terrorists from using AI for mass harm”
The “bailey” (broader claim) is “our work on AI misuse prevention will solve the major misuse risks from AI, therefore we aren’t causing huge risks through our work”
This conflation is dangerous because it may lead us to overinvest in technical solutions that only address the less concerning risk, and underinvest in countermeasures for power concentration risks.
By far the most important risk isn’t that they’ll steal them. It’s that they will be fully authorized to misuse them. No security measure can prevent that.
That’s a great way of saying it. I edited this into my original comment.
Actually, it is not that clear to me. I think adversarial robustness is helpful (in conjunction with other things) to prevent CEOs from misusing models.
If at some point a CEO trying to take over wants to use an HHH (helpful, honest, and harmless) model to help them with the takeover, that model will likely refuse to do egregiously bad things. So the CEO might need to use helpful-only models. But there might be processes in place to access helpful-only models—which might make it harder for the CEO to take over. So while I agree that you need good security and governance to prevent a CEO from using helpful-only models to take over, I think that without good adversarial robustness, it is much harder to build adequate security/governance measures without destroying an AI-assisted CEO’s productivity.
There is a lot of power concentration risk that just comes from people in power doing normal people-in-power things, such as increasing surveillance on dissidents—for which I agree that adversarial robustness is ~useless. But security against insider threats is quite useless too.
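To make the “processes in place to access helpful-only models” point above a bit more concrete, here is a minimal sketch of a two-person-rule gate on a hypothetical internal endpoint. The class, the quorum, and the approval flow are all invented for illustration and are not a description of any lab’s actual controls.

```python
# Illustrative sketch only: a two-person rule gating access to a hypothetical
# helpful-only model endpoint. All names and the approval flow are invented.
from dataclasses import dataclass, field

@dataclass
class HelpfulOnlyAccessRequest:
    requester: str
    justification: str
    approvals: set = field(default_factory=set)

    def approve(self, approver: str) -> None:
        # Requesters cannot approve their own request, so no single insider
        # (including an executive) can unilaterally unlock the model.
        if approver == self.requester:
            raise PermissionError("self-approval is not allowed")
        self.approvals.add(approver)

    def is_authorized(self, quorum: int = 2) -> bool:
        """Access is granted only after `quorum` distinct approvers sign off."""
        return len(self.approvals) >= quorum

# Usage:
# req = HelpfulOnlyAccessRequest("ceo", "red-teaming exercise")
# req.approve("security_lead"); req.approve("oversight_board_member")
# assert req.is_authorized()
```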
Maybe somewhat of a tangent, but I think this might be a much more legible/better reason to ask for international coordination than the more speculative-seeming (and sometimes, honestly, wildly overconfident IMO) arguments about the x-risks coming from the difficulty of (technically) aligning superintelligence.
I think this is a valuable distinction.
I note that the solutions you mention for the second, less-addressed class of misuse only prevent people who aren’t officially in charge of AGI from misusing it; they don’t address government appropriation.
Governments have a monopoly on the use of force, and their self-perceived mandate includes all issues critical to national security. AGI is surely such an issue.
I expect that governments will assume control of AGI if they see it coming before it’s smart enough to help its creators evade that control. And evading that control would be very difficult in most foreseeable scenarios.
You can hop borders, but you’re just moving to another government’s jurisdiction.
I don’t have any better solutions to government misuse of AGI for a self-coup and permanent dictatorship. Any such solutions are probably political, not technical, and I know nothing about politics.
But it seems like we need to get some politically savvy people on board before we have powerful AI aligned to its creators’ intent. Technical alignment is only a partial solution.