Hmm, I agree that ARA is not that compelling on its own (as a threat model). However, it seems to me like ruling out ARA is a relatively naturally way to mostly rule out relatively direct danger. And, once you do have ARA ability, you just need some moderately potent self-improvement ability (including training successor models) for the situation to look reasonably scary. Further, it seems somewhat hard to do capabilities evaluations that rule out this self-improvement if models are ARA capable given that there are so possible routes.
Edit: TBC, I think it’s scary mostly because we might want to slow down and rogue AIs might then out pace AI labs.
So, I think I basically agree with where you’re at overall, but I’d go further than “it’s something that roughly correlates with other threat models, but is easier and more concrete to measure” and say “it’s a reasonable threshold to use to (somewhat) bound danger ” which seems worth noting.
While doing all that, in order to stay relevant, they’ll need to recursively self-improve at the same rate at which leading AI labs are making progress, but with far fewer computational resources
I agree this is probably an issue for the rogue AIs. But, we might want to retain the ability to slow down if misalignment seems to be a huge risk and rogue AIs could make this considerably harder. (The existance of serious rogue AIs is surely also correlated with misalignment being a big risk.)
While it’s hard to coordinate to slow down human AI development even if huge risks are clearly demonstrated, there are ways in which it could be particularly hard to prevent mostly autonomous AIs from self-improving. In particular, other AI projects require human employees which could make them easier to track and shutdown. Further, AI projects are generally limited by not having a vast supply of intellectual labor which would change in a regime where there rogue AIs with reasonable ML abilities.
This is mostly an argument that we should be very careful with AIs which have a reasonable chance of being capable of substantial self-improvement, but ARA feels quite related to me.
However, it seems to me like ruling out ARA is a relatively naturally way to mostly rule out relatively direct danger.
This is what I meant by “ARA as a benchmark”; maybe I should have described it as a proxy instead. Though while I agree that ARA rules out most danger, I think that’s because it’s just quite a low bar. The sort of tasks involved in buying compute etc are ones most humans could do. Meanwhile more plausible threat models involve expert-level or superhuman hacking. So I expect a significant gap between ARA and those threat models.
once you do have ARA ability, you just need some moderately potent self-improvement ability (including training successor models) for the situation to look reasonably scary
You’d need either really good ARA or really good self-improvement ability for an ARA agent to keep up with labs given the huge compute penalty they’ll face, unless there’s a big slowdown. And if we can coordinate on such a big slowdown, I expect we can also coordinate on massively throttling potential ARA agents.
Yep, I was imagining big slow down or uncertainty in how explosive AI R&D will be.
And if we can coordinate on such a big slowdown, I expect we can also coordinate on massively throttling potential ARA agents.
It agree that this cuts the risk, but I don’t think it fully eliminates it. I think it might be considerably harder to fully cut off ARA agents than to do a big slowdown.
One way to put this is that quite powerful AIs in the wild might make slowing down 20% less likely to work (conditional on a serious slow down effort) which seems like a decent cost to me.
Part of my view here is that ARA agents will have unique affordances that no human organization will have had before (like having truely vast, vast amounts of pretty high skill labor).
Of course, AI labs will also have access to vast amounts of high skill labor (from possibly misaligned AIs). I don’t know how the offense defense balance goes here. I think full defense might require doing really crazy things that organizations are unwilling to do. (E.g. unleashing a huge number of controlled AI agents which may commit crimes in order to take free energy.)
The sort of tasks involved in buying compute etc are ones most humans could do.
My guess is that you need to be a decent but not amazing software engineer to ARA. So, I wouldn’t describe this as “tasks most humans can do”.
Part of my view here is that ARA agents will have unique affordances that no human organization will have had before (like having truly vast, vast amounts of pretty high skill labor).
The more labor they have, the more detectable they are, and the easier they are to shut down. Also, are you picturing them gaining money from crimes, then buying compute legitimately? I think the “crimes” part is hard to stop but the “paying for compute” part is relatively easy to stop.
My guess is that you need to be a decent but not amazing software engineer to ARA.
Yeah, you’re probably right. I still stand by the overall point though.
I think in our current situation shutting down all rogue AI operations might be quite tough (though limiting their scale is certainly possible, and I agree with the critique that absent slowdowns or alignment problems or regulatory obstacles etc. it would be surprising if these rogue agents could compete with legitimate actors with many more GPUs).
Assuming the AI agents have money there are maybe three remaining constraints the AI agents need to deal with:
purchasing/renting the GPUs,
(if not rented through a cloud provider) setting them up and maintaining them, and
evading law enforcement or other groups trying to locate them and shut them down
For acquiring the GPUs, there are currently a ton of untracked GPUs spread out around the world, both gaming GPUs like 4090s and datacenter GPUs like H100s. I can buy or rent them with extremely minimal KYC from tons of different places. If we assume ARA agents are able to do all of the following:
Recruit people to physically buy GPUs from retail stores
Take delivery of online ordered GPUs in anonymized ways using intermediaries
Recruit people to set up fake AI startups and buy GPUs through them
then the effort required to prevent them from acquiring any GPUs in a world anything like today seems crazy. And even if the US + allies started confiscating or buying up 4090s en masse, it’s enough that some other countries adopt laxer standards (perhaps to attract AI talent that is frustrated by the new draconian GPU-control regime).
As for setting up acquired GPUs, I think the AI agents will probably be able to find colocation operators that don’t ask too many questions in many parts of the world, but even if the world was to coordinate on extremely stringent controls, the AI agents could set up datacenters of their own in random office buildings or similar—each inference setup wouldn’t need that much power and I think it would be very hard to track it down.
As for not being shut down by law enforcement, I think this might take some skill in cybersecurity and opsec etc. but if it operates as dozens or hundreds of separate “cells”, and each has enough labor to figure out decorrelated security and operations etc. then it seems quite plausible that they wouldn’t be shut down. Historically “insurgency groups” or hacker networks etc. seem to be very difficult to fully eliminate, even when a ton of resources are thrown at the problem.
I don’t think any of the above would require superhuman abilities, though many parts are challenging, which is part of why evals targeting these skills could provide a useful safety case—e.g. if it’s clear that the AI could not pull off the cybersecurity operation required to not be easily shut down then this is a fairly strong argument that this AI agent wouldn’t be able to pose a threat [Edit: from rogue agents operating independently, not including other threat models like sabotaging things from within the lab etc.].
Though again I am not defending any very strong claim here, e.g. I’m not saying:
that rogue AIs will be able to claim 5+% of all GPUs or an amount competitive with a well-resourced legitimate actor (I think the world could shut them out of most of the GPU supply, and that over time the situation would worsen for the AI agents as the production of more/better GPUs is handled with increasing care),
that these skills alone mean it poses a risk of takeover or that it could cause significant damage (I agree that this would likely require significant further capabilities, or many more resources, or already being deployed aggressively in key areas of the military etc.)
that “somewhat dumb AI agents self-replicate their way to a massive disaster” is a key threat model we should be focusing our energy on
I’m just defending the claim that ~human-level rogue AIs in a world similar to the world of today might be difficult to fully shut down, even if the world made a concerted effort to do so.
e.g. if it’s clear that the AI could not pull off the cybersecurity operation required to not be easily shut down then this is a fairly strong argument that this AI agent wouldn’t be able to pose a threat
I agree that this is a fairly strong argument that this AI agent wouldn’t be able be able cause problems while rogue. However, I think there is also a concern that this AI will be able to cause serious problems via the affordances it is granted through the AI lab.
In particular, AIs might be given huge affordances internally with minimal controls by default. And this might pose a substantial risk even if AIs aren’t quite capable enough to pull off the cybersecurity operation. (Though it seems not that bad of a risk.)
Yeah that’s right, I made too broad a claim and only meant to say it was an argument against their ability to pose a threat as rogue independent agents.
Also, are you picturing them gaining money from crimes, then buying compute legitimately? I think the “crimes” part is hard to stop but the “paying for compute” part is relatively easy to stop.
Both legitimately and illegitimately acquiring compute could plausibly be the best route. I’m uncertain.
It doesn’t seem that easy to lock down legitimate compute to me? I think you need to shutdown people buying/renting 8xH100 style boxes. This seems quite difficult potentially.
The model weights probably don’t fit in VRAM in a single 8xH100 device (the model weights might be 5 TB when 4 bit quantized), but you can maybe fit with 8-12 of these. And, you might not need amazing interconnect (e.g. normal 1 GB/s datacenter internet is fine) for somewhat high latency (but decently good throughput) pipeline parallel inference. (You only need to send the residual stream per token.) I might do a BOTEC on this later. Unless you’re aware of a BOTEC on this?
I did some BOTECs on this and think 1 GB/s is sort of borderline, probably works but not obviously.
E.g. I assumed a ~10TB at fp8 MoE model with a sparsity factor of 4 with 32768 hidden size.
With 32kB per token you could send at most 30k tokens/second over a 1GB/s interconnect. Not quite sure what a realistic utilization would be, but maybe we halve that to 15k?
If the model was split across 20 8xH100 boxes, then each box might do ~250 GFLOP/token (2 * 10T parameters / (4*20)), so each box would do at most 3.75 PFLOP/second, which might be about ~20-25% utilization.
This is not bad, but for a model with much more sparsity or GPUs with a different FLOP/s : VRAM ratio or spottier connection etc. the bandwidth constraint might become quite harsh.
(the above is somewhat hastily reconstructed from some old sheets, might have messed something up)
Hmm, I agree that ARA is not that compelling on its own (as a threat model). However, it seems to me like ruling out ARA is a relatively naturally way to mostly rule out relatively direct danger. And, once you do have ARA ability, you just need some moderately potent self-improvement ability (including training successor models) for the situation to look reasonably scary. Further, it seems somewhat hard to do capabilities evaluations that rule out this self-improvement if models are ARA capable given that there are so possible routes.
Edit: TBC, I think it’s scary mostly because we might want to slow down and rogue AIs might then out pace AI labs.
So, I think I basically agree with where you’re at overall, but I’d go further than “it’s something that roughly correlates with other threat models, but is easier and more concrete to measure” and say “it’s a reasonable threshold to use to (somewhat) bound danger ” which seems worth noting.
I agree this is probably an issue for the rogue AIs. But, we might want to retain the ability to slow down if misalignment seems to be a huge risk and rogue AIs could make this considerably harder. (The existance of serious rogue AIs is surely also correlated with misalignment being a big risk.)
While it’s hard to coordinate to slow down human AI development even if huge risks are clearly demonstrated, there are ways in which it could be particularly hard to prevent mostly autonomous AIs from self-improving. In particular, other AI projects require human employees which could make them easier to track and shutdown. Further, AI projects are generally limited by not having a vast supply of intellectual labor which would change in a regime where there rogue AIs with reasonable ML abilities.
This is mostly an argument that we should be very careful with AIs which have a reasonable chance of being capable of substantial self-improvement, but ARA feels quite related to me.
This is what I meant by “ARA as a benchmark”; maybe I should have described it as a proxy instead. Though while I agree that ARA rules out most danger, I think that’s because it’s just quite a low bar. The sort of tasks involved in buying compute etc are ones most humans could do. Meanwhile more plausible threat models involve expert-level or superhuman hacking. So I expect a significant gap between ARA and those threat models.
You’d need either really good ARA or really good self-improvement ability for an ARA agent to keep up with labs given the huge compute penalty they’ll face, unless there’s a big slowdown. And if we can coordinate on such a big slowdown, I expect we can also coordinate on massively throttling potential ARA agents.
Yep, I was imagining big slow down or uncertainty in how explosive AI R&D will be.
It agree that this cuts the risk, but I don’t think it fully eliminates it. I think it might be considerably harder to fully cut off ARA agents than to do a big slowdown.
One way to put this is that quite powerful AIs in the wild might make slowing down 20% less likely to work (conditional on a serious slow down effort) which seems like a decent cost to me.
Part of my view here is that ARA agents will have unique affordances that no human organization will have had before (like having truely vast, vast amounts of pretty high skill labor).
Of course, AI labs will also have access to vast amounts of high skill labor (from possibly misaligned AIs). I don’t know how the offense defense balance goes here. I think full defense might require doing really crazy things that organizations are unwilling to do. (E.g. unleashing a huge number of controlled AI agents which may commit crimes in order to take free energy.)
My guess is that you need to be a decent but not amazing software engineer to ARA. So, I wouldn’t describe this as “tasks most humans can do”.
The more labor they have, the more detectable they are, and the easier they are to shut down. Also, are you picturing them gaining money from crimes, then buying compute legitimately? I think the “crimes” part is hard to stop but the “paying for compute” part is relatively easy to stop.
Yeah, you’re probably right. I still stand by the overall point though.
I think in our current situation shutting down all rogue AI operations might be quite tough (though limiting their scale is certainly possible, and I agree with the critique that absent slowdowns or alignment problems or regulatory obstacles etc. it would be surprising if these rogue agents could compete with legitimate actors with many more GPUs).
Assuming the AI agents have money there are maybe three remaining constraints the AI agents need to deal with:
purchasing/renting the GPUs,
(if not rented through a cloud provider) setting them up and maintaining them, and
evading law enforcement or other groups trying to locate them and shut them down
For acquiring the GPUs, there are currently a ton of untracked GPUs spread out around the world, both gaming GPUs like 4090s and datacenter GPUs like H100s. I can buy or rent them with extremely minimal KYC from tons of different places. If we assume ARA agents are able to do all of the following:
Recruit people to physically buy GPUs from retail stores
Take delivery of online ordered GPUs in anonymized ways using intermediaries
Recruit people to set up fake AI startups and buy GPUs through them
then the effort required to prevent them from acquiring any GPUs in a world anything like today seems crazy. And even if the US + allies started confiscating or buying up 4090s en masse, it’s enough that some other countries adopt laxer standards (perhaps to attract AI talent that is frustrated by the new draconian GPU-control regime).
As for setting up acquired GPUs, I think the AI agents will probably be able to find colocation operators that don’t ask too many questions in many parts of the world, but even if the world was to coordinate on extremely stringent controls, the AI agents could set up datacenters of their own in random office buildings or similar—each inference setup wouldn’t need that much power and I think it would be very hard to track it down.
As for not being shut down by law enforcement, I think this might take some skill in cybersecurity and opsec etc. but if it operates as dozens or hundreds of separate “cells”, and each has enough labor to figure out decorrelated security and operations etc. then it seems quite plausible that they wouldn’t be shut down. Historically “insurgency groups” or hacker networks etc. seem to be very difficult to fully eliminate, even when a ton of resources are thrown at the problem.
I don’t think any of the above would require superhuman abilities, though many parts are challenging, which is part of why evals targeting these skills could provide a useful safety case—e.g. if it’s clear that the AI could not pull off the cybersecurity operation required to not be easily shut down then this is a fairly strong argument that this AI agent wouldn’t be able to pose a threat [Edit: from rogue agents operating independently, not including other threat models like sabotaging things from within the lab etc.].
Though again I am not defending any very strong claim here, e.g. I’m not saying:
that rogue AIs will be able to claim 5+% of all GPUs or an amount competitive with a well-resourced legitimate actor (I think the world could shut them out of most of the GPU supply, and that over time the situation would worsen for the AI agents as the production of more/better GPUs is handled with increasing care),
that these skills alone mean it poses a risk of takeover or that it could cause significant damage (I agree that this would likely require significant further capabilities, or many more resources, or already being deployed aggressively in key areas of the military etc.)
that “somewhat dumb AI agents self-replicate their way to a massive disaster” is a key threat model we should be focusing our energy on
I’m just defending the claim that ~human-level rogue AIs in a world similar to the world of today might be difficult to fully shut down, even if the world made a concerted effort to do so.
I agree that this is a fairly strong argument that this AI agent wouldn’t be able be able cause problems while rogue. However, I think there is also a concern that this AI will be able to cause serious problems via the affordances it is granted through the AI lab.
In particular, AIs might be given huge affordances internally with minimal controls by default. And this might pose a substantial risk even if AIs aren’t quite capable enough to pull off the cybersecurity operation. (Though it seems not that bad of a risk.)
Yeah that’s right, I made too broad a claim and only meant to say it was an argument against their ability to pose a threat as rogue independent agents.
Both legitimately and illegitimately acquiring compute could plausibly be the best route. I’m uncertain.
It doesn’t seem that easy to lock down legitimate compute to me? I think you need to shutdown people buying/renting 8xH100 style boxes. This seems quite difficult potentially.
The model weights probably don’t fit in VRAM in a single 8xH100 device (the model weights might be 5 TB when 4 bit quantized), but you can maybe fit with 8-12 of these. And, you might not need amazing interconnect (e.g. normal 1 GB/s datacenter internet is fine) for somewhat high latency (but decently good throughput) pipeline parallel inference. (You only need to send the residual stream per token.) I might do a BOTEC on this later. Unless you’re aware of a BOTEC on this?
I did some BOTECs on this and think 1 GB/s is sort of borderline, probably works but not obviously.
E.g. I assumed a ~10TB at fp8 MoE model with a sparsity factor of 4 with 32768 hidden size.
With 32kB per token you could send at most 30k tokens/second over a 1GB/s interconnect. Not quite sure what a realistic utilization would be, but maybe we halve that to 15k?
If the model was split across 20 8xH100 boxes, then each box might do ~250 GFLOP/token (2 * 10T parameters / (4*20)), so each box would do at most 3.75 PFLOP/second, which might be about ~20-25% utilization.
This is not bad, but for a model with much more sparsity or GPUs with a different FLOP/s : VRAM ratio or spottier connection etc. the bandwidth constraint might become quite harsh.
(the above is somewhat hastily reconstructed from some old sheets, might have messed something up)
for reference, just last week i rented 3 8xh100 boxes without any KYC