I think I’ve switched positions on open source models. Before I felt that we must not release them because they can be easily fine-tuned to remove safety measures and represent a tech donation to adversaries. But now I feel the harm posed by these open source models seems pretty small and that because Alibaba is releasing them at an exceptionally rapid pace, western forbearance will not affect their proliferation.
The stakes with open weights for current models are much lower than for hypothetical long-horizon capable models, where removal of safety tuning becomes a stronger argument. The major effects with current models are wide availability for post-training and interpretability research, and feeding the norm of publishing weights that might persist with future dangerous models.
I think I’ve switched positions on open source models. Before I felt that we must not release them because they can be easily fine-tuned to remove safety measures and represent a tech donation to adversaries. But now I feel the harm posed by these open source models seems pretty small and that because Alibaba is releasing them at an exceptionally rapid pace, western forbearance will not affect their proliferation.
The stakes with open weights for current models are much lower than for hypothetical long-horizon capable models, where removal of safety tuning becomes a stronger argument. The major effects with current models are wide availability for post-training and interpretability research, and feeding the norm of publishing weights that might persist with future dangerous models.