There’s a countervailing effect of democratizing safety research, which one might think outweighs the capabilities effect because safety is so much more neglected than capabilities, with more low-hanging fruit.
I take this argument very seriously. It does in fact seem that much of the safety research I’m excited about happens on open source models. Perhaps I’m just more plugged into the AI safety research landscape than the capabilities research landscape? Nonetheless, even setting aside low-hanging-fruit effects, I think there’s a big reason to believe open sourcing your model will have disproportionate safety gains:
Capabilities research is about how to train your model to be better, but the overarching sub-goal of safety research right now seems to be how to verify properties of your model.
Certainly, framed like this, releasing the end-states of training (or possibly even training checkpoints) seems better suited to the safety research strategy than to the capabilities research strategy.