Yeah, the question of where the threshold of dangerous capabilities lies seems very important to whether we can hope to make compute governance part of restricting bad actors from using fine-tuning & inference for harmful purposes.
The reason this is important to try to predict is that we really don’t want to release powerfully dangerous open-source models that bad actors could misuse. Once those models have been released, there’s no taking them back. So if there’s a risk that a given model would be dangerous if released openly, such that bad actors could have a private copy to fine-tune, then the intervention point is to disallow the release of the model.