I agree with pretty much all of this and appreciate your clear framing of the issues at hand.
It seems like where our concerns differ is around these two issues:
I believe that the offense-defense balance for AI-enabled biorisk is such that a bad actor with a fine-tuned open-weights model could kill billions of people for less than $100k.
I don’t think that the fine-tuning or inference would require more than a single server with 8 GPUs (potentially even just 8x 4090s). So unless the compute regulations monitor individual 4090 GPUs, you aren’t blocking inference or fine-tuning.
Training takes a bunch of servers (often millions or billions of dollars’ worth of hardware), and thus seems more plausible to monitor.
I haven’t heard any concrete proposals for compute monitoring at the level of 8x 4090 GPUs, have you?
I think compute monitoring for literally 8x 4090 is very likely to be way too hard against a reasonably committed adversary. (The hardware is already way too easy to access and too broadly distributed. Also, I think if 8x 4090 causes problems, you might also have issues with people’s MacBooks, which would be super costly to restrict.)
My overall guess is that for realistically sized models (0.1-100 trillion parameters), we can’t prevent an ML-competent adversary from doing a small amount of fine-tuning and inference.
But, maybe there is some hope for models which are more like 100 trillion parameters? (Minimally, I think 8x 4090 isn’t going to work well for models of this size.)
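A rough back-of-the-envelope sketch of the memory arithmetic behind that parenthetical, assuming 24 GB of VRAM per 4090 and counting only the model weights (no activations, KV cache, or optimizer state). The function names and the precision choices (fp16 at 2 bytes per parameter, 4-bit at 0.5 bytes per parameter) are illustrative assumptions, not anything proposed in this thread:

```python
# Back-of-the-envelope check: do a model's weights fit in the combined
# VRAM of an 8x RTX 4090 server? Weights only; this is a lower bound on
# real memory needs, since activations and KV cache are ignored.

GPU_VRAM_GB = 24   # memory of one RTX 4090
NUM_GPUS = 8       # single-server setup discussed above


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def fits(params: float, bytes_per_param: float,
         num_gpus: int = NUM_GPUS, vram_gb: float = GPU_VRAM_GB) -> bool:
    """True if the weights alone fit in the pooled VRAM."""
    return weight_memory_gb(params, bytes_per_param) <= num_gpus * vram_gb


# Assumed precisions: fp16 (2 bytes/param) and 4-bit quantized (0.5 bytes/param).
for params in (0.1e12, 1e12, 100e12):
    for label, bpp in (("fp16", 2.0), ("4-bit", 0.5)):
        need = weight_memory_gb(params, bpp)
        verdict = "fits" if fits(params, bpp) else "does not fit"
        print(f"{params / 1e12:g}T params @ {label}: {need:,.0f} GB, "
              f"{verdict} in {NUM_GPUS * GPU_VRAM_GB} GB")
```

Weights that exceed VRAM can sometimes be offloaded to host RAM or disk at a large speed cost, so this bounds comfortable use rather than what a determined adversary could do at all. At 100 trillion parameters, though, the gap is hundreds of times the pooled VRAM even at 4-bit precision.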
Yeah, the question of where the threshold for dangerous capabilities lies seems very important to whether we can hope to make compute governance part of restricting bad actors from using fine-tuning and inference to do harm.
The reason this is important to predict is that we really don’t want to release powerfully dangerous open-source models which bad actors could misuse. Once those models have been released, there’s no taking them back. So if there’s a risk that a given model would be dangerous if released openly, because bad actors could then have a private copy to fine-tune, the intervention point is to disallow the release of the model.